Access v-table at run-time - c++

Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined? This might be more of a theoretical question, but could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?

Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined?
Not in a portable way. The standard does not even have the concept of virtual table, it is more of an implementation detail than a requirement, even if all implementations I know use vtables. In the general case there will not even be enough information available at runtime (i.e. the compiler does not need to store the number of entries in the vtable, as it sees the type and can count)
Could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?
Again no, but since this shows a misconception, it might be worth treating it apart. When a base class has any virtual functions the compiler (in all implementations that use vtables) will create the vtable and that table will have exactly 1 entry per virtual function in the base class (plus some additional data --typeinfo or pointer to it, offset to the beginning of the object or other implementation details). When a class extends that base class, it will not add new elements to that vtable, but rather create a separate vtable (or more, depending on the type hierarchy). If the derived function does not add any new virtual function, the vtable for the derived object will contain the exact number of elements that the original vtable had. That is, you can have a huge hierarchy of inheritances without that affecting the vtable layout at all. What will change are the typeinfo data stored and the pointers to each virtual function, that will refer to the final overrider

an meta-information such as the number of different function versions be determined?
No C++ doesn't support reflection. What you are trying to achieve is not possible in C++ AFAIK

Theoretically, yes, because it's stored in memory and you have access to it. In practice, there is no sane, portable way to do it, because the compiler is free to implement virtual functions in any way it wants, so you would have to dig through your compiler's source code to find out how/where to access the desired information and how to interpret it.

The only glimmer of hope I could imagine for your effort, is the handling of dynamic_cast. Each compiler, with corresponding library support, has some concept of traversing a hierarchy to achieve a dynamic cast. If you could hook into that traversal, you might then know something about how many levels of inheritance you are dealing with. That said, even if you made this work, it would be compiler-specific (as others have said) since such implementation is proprietary.

You can use the Debug Interface Access SDK or other debug support interfaces (gdb) for this sort of thing.
RTT data is more portable but may not have sufficient details for your project.
For your specific question on limiting the v-table and preventing it from being extended too far, you can try this method;
IDiaSymbol::get_classParente
Retrieves a reference to the class parent of the symbol.
HRESULT get_classParent (IDiaSymbol** pRetVal);
You can investigate all of the class related symbol types here, what you might want to do is enumerate all class types loaded, get_classParent recursively and keep a tally of all of the classes which extend your base.
Your class could also require symbols to be available on startup to help with enforcement.

Related

Does 'final' specifier add any overhead?

Does using specifier final on a class or on a function add any memory or cpu overhead, or is it used at compile time only?
And how does std::is_final recognise what is final?
It actually can reduce overhead. And in rare cases, increase it.
If you have a pointer to a final class A, any virtual method calls can be de-virtualized and called directly. Similarly, a call to a virtual final method can be de-virtualized. In addition, the inheritance tree of a final class is fixed, even if it contains virtual parent classes, so you can de-virtualize some parent access.
Each of these de-virtualizations reduce or eliminate the requirement that a run-time structure (the vtable) be queried.
There can be a slight downside. Some coding techniques rely on vtable access to avoid direct access to a symbol, and then do not export the symbol. Accessing a vtable can be done via convention (without symbols from a library, just the header file for the classes in question), while accessing a method directly involves linking against that symbol.
This breaks one form of dynamic C++ library linking (where you avoid linking more than a dll loading symbol and/or C linkage functions that return pointers, and classes are exported via their vtables).
It is also possible that if you link against a symbol in a dynamic library, the dynamic library symbol load could be more expensive than the vtable lookup. I have not experienced or profiled this, but I have seen it claimed. The benefits should, in general, outweigh such costs. Any such cost is a quality of implementation issue, as the cost is not mandated to occur because the method is final.
Finally, final inhibits the empty base optimization trick on classes, where someone knows your class has no state, and inherits from it to reduce the overhead of "storing" an instance of your class from 1 byte to 0 bytes. If your class is empty and contains no virtual methods/inheritance, don't use final to avoid this being blocked. There is no equivalent for final functions.
Other than the EBO optimization issue (which only occurs with empty types), any overhead from final comes from how other code interacts with it, and will be rare. Far more often it will make other code faster, as directly interacting with a method enables both a more direct call of the method, and can lead to knock-on optimizations (because the call can be more fully understood by the compiler).
Marking anything except an empty type as final when it is final is almost certainly harmless at run time. Doing so on classes with virtual functions and inheritance is likely to be beneficial at run time.
std::is_final and similar traits are almost all implemented via compiler built-in magic. A good number of the traits in std require such magic. See How to detect if a class is final in C++11? (thanks to #Csq for finding that)
No, it's only used at compile-time
Magic (see here for further info - thanks Csq for link)

Checking if a class has expected attributes

I am going to give some classes about C++ and data structures, and to check students' progress I'd like them to develop the structures I talk about. This is the common approach for data structures classes, I guess. But I want more, I want the students to have a quick feedback on what they are missing, so I developed several unit tests for the classes that check the behavior and give them instant results on what is wrong.
This has been working properly for the past two semesters, but I want a step further on that automatized correction. I've been studying how to check what are the internal components of a class, so I can know if someone has implemented correctly a tree with a node* root and size_t size and hasn't used additional not-necessary attributes, for instance.
I know that I can have a rough approximation of an object size with sizeof, but the results are not that precise. It frequently is different from what I expect, for example: I tested creating a class with a pointer (8 bytes) and an int (4 bytes), but the sizeof was 28. From what I learnt, probably this has something to do with virtual function table and other alignment stuff.
So, how far and further can I go analyzing if someone has coded a data structure the proper and expected manner? How can I check that someone just didn't #include <list> and created an adaptor (for this I know I can just strip the includes but anyway)?
Let's break this answer into two parts, we'll split on the return from is_standard_layout.
1. Virtual Classes
is_standard_layout will return false, meaning the class is virtual. virtual classes will contain all the members from their parents aside from just a virtual function pointer. You can find more info here. Basically, your best bet for finding the size of the members here is to do sizeof the class in question reduced by sizeof(void*) And that's the size of your virtual class's members.
2. Non-Virtual Classes
is_standard_layout will return true meaning this is not a virtual class. In this case we can use offsetof to find the first member variable past the header information. Finding the end of the object with the pointer to the object and sizeof will you to measure the distance to the point returned by offsetof.
Both of these methods should yield the size of the members in the classes. determining an allowable range for class size is a matter of preference. But placing the evaluation in a static_assert will allow you to also provide a compile time message indicating the reason for the assert.

How to correctly invoke member methods using the virtual table?

Short Version
I'm using the entries from the vtable of a specific object to invoke virtual methods inherited from an interface.
In the end I'm searching for a way to get the exact offset each address to a virtual method has in the vtable of a specific object.
Detailed Version
Disclaimer
I know that this topic is implementation dependant and that one should not try to do this manually because the compiler does the (correct) work and a vtable is not considered to be a standard (let alone the dataformat).
I hereby testify that I already read dozens of "Don't do it... just, don't do it!"'s and am clear about the outrageous consequences my actions could have.
Therefore (and to favor a contructive discussion) I'll be using g++ (4.x.x) for the compiler on a Linux x64 Platform as my reference. Any software compiled using the below presented code will use the same setup, so it should be platform-indepandant as far as this is concerned.
Okay, this whole issue of mine is completely experimental, I don't want to use it in production code but to educate myself and my fellow students (my professor asked me if I could write a quick paper about this topic).
What I'm trying to do is basically the automatic invocation of methods using offsets to determine which method to invoke.
The basic outline of the classes is as follows (this is a simplification but shows the composition of my current attempts):
class IMethods
{
virtual double action1(double) = 0;
virtual double action2(double) = 0;
};
As you can see just a class with pure virtual methods which share the same signature.
enum Actions
{
actionID1,
actionID2
};
The enum-items are used to invoke the appropriate method.
class MethodProcessor : public IMethods
{
public:
double action1(double);
double action2(double);
};
Purposely omitted ctor/dtor of above class.
We can safely assume that these are the only virtual methods inherited from the interface and that polymorphy is of no concern.
This concludes the basic outline. Now to the real topic:
Is there a safely way to get the mapping of addresses in the vtable to the inherited virtual methods?
What I'm trying to do is something like this:
MethodProcessor proc;
size_t * vTable = *(size_t**) &proc;
double ret = ((double(*)(MethodProcessor*,double))vTable[actionID2])(&proc, 3.14159265359);
This is working out fine and is invoking action2, but I'm assuming that the address pointing to action2 is equal to index 1 and this part bugs me: What if there is some sort of offset added into the vtable before the address of action1 is defined?
In a book about the data object model of C++ I read that normally the first address in the vtable is leading to the RTTI (runtime type information), which in return I couldn't confirm because vTable[0] is legitimately pointing to action1.
The compiler knows the exact index of every virtual method pointer because, yeah, the compiler is building them and replacing every member invocation of the virtual methods with augmented code which equals the one I used above - but knows the index to use. I, for once, am taking the educated guess that there is no offset before my defined virtual methods.
Can't I use some C++-Hack to let the compiler compute the correct index on compile (or even run)-time? I could then use this information to add some offset to my enum-items and woulnd't have to worry about casting wrong addresses...
The ABI used by Linux is known: you should look at the section on virtual function table layout of the Itanium ABI. The document also specifies the object layout to find the vtable. I don't know if the approach you outlined works, however.
Note, that despite pointing at the document, I do not recommend to use the information to play tricks based on it! You unfortunately didn't explain what your actual goal is but it seems that using a pointer to member of the irtual functions is a more reliable approach to run-time dispatch to a virtual function:
double (IMethods::*method)(double)
= flag? &IMethods:: action1: &INethods::actions2;
MethodProcessor mp;
double rc = (mp.*method)(3.14);
I have nearly the same problem, even worse, in my case, this is for procuction code... don't ask me why (just tell me: "Don't do it... just, don't do it", or at least use an existing dynamic compiler such as LLC...).
Of course, this "torture" could be smoothed if compiler designers were asked to follow some generic C++ ABI specifications (or at least specify their own).
Apparently, there is a growing consensus on complying with "Generic C++ ABI" (also often called "ITANIUM C++ ABI", mentioned by Dietmar Kühl).
Because you are using G++ and because this is for educationnal purposes, I suggest you have a look at what the "-fdump-class-hierarchy" g++ switch does (you will find vtable layouts); this is what I personally use to be sure that I'm not casting wrong addresses.
Note: using a x86 (ia32) MinGW g++-4.7.2 compiler,
In double MethodProcessor::action( double ), the hidden "(MethodProcessor*)this" argument will be passed in a register (%ecx).
Whereas in ((double(*)(MethodProcessor*,double)), the first explicit "this" argument will be passed on stack.

single virtual inheritance compiler optimization in c++?

If I have this situation in C++ project:
1 base class 'Base' containing only pure virtual functions
1 class 'Derived', which is the only class which inherits (public) from 'Base'
Will the compiler generate a VTABLE?
It seems there would be no need because the project only contains 1 class to which a Base* pointer could possibly point (Derived), so this could be resolved compile time for all cases.
This is interesting if you want to do dependency injection for unit testing but don't want to incur the VTABLE lookup costs in production code.
I don't have hard data, but I have good reasons to say no, it won't turn virtual calls into static ones.
Usually, the compiler only sees a single compilation unit. It cannot know there's only a single subclass, because five months later you may write another subclass, compile it, get some ancient object files from the backup and link them all together.
While link-time optimizations do see the whole picture, they usually work on a far lower-level representation of the program. Such representation allow e.g. inlining of static calls, but don't represent inheritance information (except perhaps as optional metadata) and already have the virtual calls and vtables spelt out explicitly. I know this is the case for Clang and IIRC gcc's whole-program optimizations also work on some low-level IR (GIMPLE?).
Also note that with dynamic loading, you can still add more subclasses long after compilation and LTO. You may not need it, but if I was a compiler writer, I'd be weary of adding an optimization that allows people royally breaking virtual calls in very specific, hard-to-track-down circumstances.
It's rarely worth the trouble - if you don't need virtual calls (e.g. because you know you won't need any more subclasses), don't make stuff virtual. Review your design. If you need some polymorphism but not the full power of virtual, the curiously recurring template pattern may help.
The compiler doesn't have to use a vtable based implementation of virtual function dispatch at all so the answer to your question will be specific to the implementation that you are using.
The vtable is usually not only used for virtual functions, but it is also used to identify the class type when you do some dynamic_cast or when the program accesses the type_info for the class.
If the compiler detects that no virtual functions ever need a dynamic dispatch and none of the other features are used, it just could remove the vtable pointer as an optimization.
Obviously the compiler writer hasn't found it worth the trouble of doing this. Probably because it wouldn't be used very often.

Inheritance in C++ internals

Can some one explain me how inheritance is implemented in C++ ?
Does the base class gets actually copied to that location or just refers to that location ?
What happens if a function in base class is overridden in derived class ? Does it replace it with the new function or copies it in other location in derived class memory ?
first of all you need to understand that C++ is quite different to e.g. Java, because there is no notion of a "Class" retained at runtime. All OO-features are compiled down to things which could also be achieved by plain C or assembler.
Having said this, what acutally happens is that the compiler generates kind-of a struct, whenever you use your class definition. And when you invoke a "method" on your object, actually the compiler just encodes a call to a function which resides somewhere in the generated executable.
Now, if your class inherits from another class, the compiler somehow includes the fields of the baseclass in the struct he uses for the derived class. E.g. it could place these fields at the front and place the fields corresponding to the derived class after that. Please note: you must not make any assumptions regarding the concrete memory layout the C++ compiler uses. If you do so, you're basically on your own and loose any portability.
How is the inheritance implemented? well, it depends!
if you use a normal function, then the compiler will use the concrete type he's figured out and just encode a jump to the right function.
if you use a virtual function, the compiler will generate a vtable and generate code to look up a function pointer from that vtable, depending on the run time type of the object
This distinction is very important in practice. Note, it is not true that inheritance is allways implemented through a vtable in C++ (this is a common gotcha). Only if you mark a certain member function as virtual (or have done so for the same member function in a baseclass), then you'll get a call which is directed at runtime to the right function. Because of this, a virtual function call is much slower than a non-virtual call (might be several hundered times)
Inheritance in C++ is often accomplished via the vtable. The linked Wikipedia article is a good starting point for your questions. If I went into more detail in this answer, it would essentially be a regurgitation of it.