Trying to understand dynamic binding and virtual functions - c++

Given the code below:
#include <iostream>
class Base
{
public:
virtual void f()
{
std::cout << "virtual Base::f()\n";
}
};
class D1 : public Base
{
public:
virtual void f()
{
std::cout << "virtual D1::f()\n";
}
};
int main()
{
D1 d1;
Base *bp = &d1;
bp->f();
return 0;
}
The output was exactly what I had expected:
virtual D1::f()
Press <RETURN> to close this window...
But once I removed the virtual void f() from class Base, the compiler complained that:
error: 'class Base' has no member named 'f'
Can anyone tell me why the compiler didn't generate code that binds the virtual function at run time?

You are calling virtual member functions via a pointer to Base. That means that you can only call methods that exist in the Base class. You cannot simply add methods to a type dynamically.

Although a little late as an answer, here is a straight quote from C++ Primer on how function calls are resolved in relation to inheritance. Your code fails on name lookup (step 2 below), which is done statically.
Understanding how function calls are resolved is crucial to understanding inheritance in C++. Given the call p->mem() (or obj.mem()), the following four steps happen:
1. First determine the static type of p (or obj). Because we’re calling a member, that type must be a class type.
2. Look for mem in the class that corresponds to the static type of p (or obj). If mem is not found, look in the direct base class and continue up the chain of classes until mem is found or the last class is searched. If mem is not found in the class or its enclosing base classes, then the call will not compile.
3. Once mem is found, do normal type checking (§6.1, p. 203) to see if this call is legal given the definition that was found.
4. Assuming the call is legal, the compiler generates code, which varies depending on whether the call is virtual or not:
   - If mem is virtual and the call is made through a reference or pointer, then the compiler generates code to determine at run time which version to run based on the dynamic type of the object.
   - Otherwise, if the function is nonvirtual, or if the call is on an object (not a reference or pointer), the compiler generates a normal function call.

Just use this main function:
int main()
{
D1 d1;
D1 *bp = &d1;
bp->f();
return 0;
}
This is because if bp is a Base pointer, name lookup starts from class Base, which no longer declares f(). With a D1 pointer, lookup starts in D1 instead.

Related

Confusion is memory allocation for pointers and virtual functions

I read about virtual functions, but the concept is still not clear to me.
In the example below, we create a base-class pointer, first assign it the address of a Base object and call the function in the base class, then assign it the address of a Derived object and call its function. Since we have already stated which objects will be assigned, doesn't the compiler know which object's function to call during compilation? I don't get why the decision is delayed until run time. Am I missing something here?
#include <iostream>
using std::cout;
using std::endl;
// Virtual function selection
class Base
{
public:
virtual void print() const
{
cout << "Inside Base" << endl;
}
};
class Derived : public Base
{
public:
// virtual as well
void print() const
{
cout << "Inside Derived" << endl;
}
};
int main()
{
Base b;
Derived f;
Base* pb = &b; // points at a Base object
pb->print(); // call Base::print()
pb = &f; // points at Derived object
pb->print(); // call Derived::print()
}
In your particular case, the compiler could potentially figure out the type of the objects being pointed at by the base class pointer. But the virtual dispatch mechanism is designed for situations in which you do not have this information at compile time. For example,
int n;
std::cin >> n;
Base b;
Derived d;
Base* pb = n == 42 ? &b : &d;
Here, the choice is made based on user input. The compiler cannot know what pb will point to.
Since we have already stated which objects will be assigned, doesn't the compiler know which object's function to call during compilation? I don't get why the decision is delayed until run time.
In this very specific, contrived case, your compiler can optimise out all the polymorphism, yes.
Am I missing something here?
The imagination to realise that the vast majority of code in real life is not this simple. There are infinitely many C++ programs for which the compiler does not have enough information to perform this optimisation.
As per my understanding, the compiler will just look at the reference type at compile time and bind the function defined and declared in that class. Since the Derived -> print() should be called you have to make the print function virtual in the base class so that the compiler will delay the binding to run time and use the function defined in the derived class.
Due to the fact that it is virtual, it is able to dynamically bind the function to the correct object. This means that the pointer calling the function will call the referenced object's function.

Virtual function hiding in derived class

I have two classes related by inheritance:-
class Base
{
public:
virtual void f(int x)
{
cout << "BASE::int" << endl;
}
virtual void f(double x)
{
cout << "BASE::double" << endl;
}
};
class Derived : public Base
{
public:
virtual void f(std::string s)
{
cout << "DERIVED::string" << endl;
}
};
I have provided same method in derived class with different parameters. That means rather than overriding I am hiding base class versions of this function. So, below calls are expected and clear to me.
std::string str("Hello");
Base b;
b.f(1); //calls base class version.
b.f(str); //error.
Derived d;
d.f(1); //error.
d.f(str); //calls derived class version.
But I am not able to get clarification for this last scenario.
Base *b = new Derived;
b->f(str); //results in error.
Wouldn't the compiler bind this call to the derived version of f using vtables and vptrs? Instead it's doing something else. Can anyone walk me through the complete path the compiler follows to resolve this call according to the language rules?
If your pointer is of type Base* then you can only "see" members that are defined in class Base. The compiler doesn't (or pretends not to) "know" that the variable really points to an instance of Derived, even if you just assigned one to it on the previous line.
When you declare a variable to be of type Base*, you're telling the compiler: treat this as something that could point to a Base or to any class derived from it. So you can't access members that are defined in a particular derived class, because there's no guarantee that the pointer actually points to an instance of that derived class.
The vtable only enters the picture at runtime. The generated assembly would have a lookup of the vptr value for a function and a jump to that address. This also means that the polymorphism is "restricted" to functions that Base knows about. Note that this is what makes more sense as well: the definition of a class should only depend on itself and its parents. If you wanted to make Base* b aware of the virtual functions implemented by Derived, you would end up with the number of vtable entries in Base depending on its children.

How do upcasting and vtables work together to ensure correct dynamic binding?

So, vtable is a table maintained by the compiler which contains function pointers that point to the virtual functions in that class.
and
Assigning a derived class's object to an ancestor class's object is called up-casting.
Up-casting is handling a derived class instance/object using a base class pointer or reference; the objects are not "assigned to", which implies an overwriting of value ala operator= invocation.
(Thanks to: Tony D)
Now, how it is known at run time "which" class's virtual function is supposed to be called?
Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?
You can imagine (although the C++ specification doesn't say this) that the vtable is an identifier (or some other metadata that can be used to "find more information" about the class itself) and a list of functions.
So, if we have a class like this:
class Base
{
public:
virtual void func1();
virtual void func2(int x);
virtual std::string func3();
virtual ~Base();
... some other stuff we don't care about ...
};
The compiler will then produce a VTable something like this:
struct VTable_Base
{
int identifier;
void (*func1)(Base* this);
void (*func2)(Base* this, int x);
std::string (*func3)(Base* this);
~Base(Base *this);
};
The compiler will then create an internal structure something like this (this is not possible to compile as C++; it's just to show what the compiler actually does, and I call it SBase to differentiate it from the actual class Base):
struct SBase
{
VTable_Base* vtable;
inline void func1(Base* this) { vtable->func1(this); }
inline void func2(Base* this, int x) { vtable->func2(this, x); }
inline std::string func3(Base* this) { return vtable->func3(this); }
inline ~Base(Base* this) { vtable->~Base(this); }
};
It also builds the real vtable:
VTable_Base vtable_base =
{
1234567, &Base::func1, &Base::func2, &Base::func3, &Base::~Base
};
And in the constructor for Base, it will set the vtable = vtable_base;.
When we then add a derived class, where we override one function (and by default, the destructor, even if we don't declare one) :
class Derived : public Base
{
virtual void func2(int x) override;
};
The compiler will now make this structure:
struct VTable_Derived
{
int identifier;
void (*func1)(Base* this);
void (*func2)(Base* this, int x);
std::string (*func3)(Base* this);
~Derived(Derived *this);
};
and then does the same "structure" building:
struct SDerived
{
VTable_Derived* vtable;
inline void func1(Base* this) { vtable->func1(this); }
inline void func2(Base* this, int x) { vtable->func2(this, x); }
inline std::string func3(Base* this) { return vtable->func3(this); }
inline ~Derived(Derived* this) { vtable->~Derived(this); }
};
We need this structure for when we are using Derived directly rather than through the Base class.
(We rely on the compiler chaining ~Derived to call ~Base too, just like normal destructors in an inheritance hierarchy.)
And finally, we build an actual vtable:
VTable_Derived vtable_derived =
{
7654339, &Base::func1, &Derived::func2, &Base::func3, &Derived::~Derived
};
And again, the Derived constructor will set Derived::vtable = vtable_derived for all instances.
Edit to answer question in comments: The compiler has to carefully place the various components in both VTable_Derived and SDerived such that it matches VTable_Base and SBase, so that when we have a pointer to Base, the Base::vtable and Base::funcN() are matching Derived::vtable and Derived::FuncN. If that doesn't match up, then the inheritance won't work.
If new virtual functions are added to Derived, they must then be placed after the ones inherited from Base.
End Edit.
So, when we do:
Base* p = new Derived;
p->func2(42);
the code will look up SBase::func2, which will use the correct Derived::func2, because the actual vtable behind p->vtable is VTable_Derived (as set by the Derived constructor that is called in conjunction with the new Derived).
I'll take a different route from the other answers and try to fill just the specific gaps in your knowledge, without going very much into the details. I'll address the mechanics just enough to help your understanding.
So, vtable is a table maintained by the compiler which contains function pointers that point to the virtual functions in that class.
The more precise way to say this is as follows:
Every class with virtual methods, including every class that inherits from a class with virtual methods, has its own virtual table. The virtual table of a class points to the virtual methods specific to that class, i.e. either inherited methods, overridden methods or newly added methods. Every instance of such a class contains a pointer to the virtual table that matches the class.
Up-casting is handling a derived class instance/object using a base class pointer or reference; (...)
Perhaps more enlightening:
Up-casting means that a pointer or reference to an instance of class Derived is treated as if it were a pointer or reference to an instance of class Base. The instance itself, however, is still purely an instance of Derived.
(When a pointer is "treated as a pointer to Base", that means that the compiler generates code for dealing with a pointer to Base. In other words, the compiler and the generated code know no better than that they are dealing with a pointer to Base. Hence, a pointer that is "treated as" will have to point to an object that offers at least the same interface as instances of Base. This happens to be the case for Derived because of inheritance. We'll see how this works out below.)
At this point we can answer the first version of your question.
Now, how it is known at run time "which" class's virtual function is supposed to be called?
Suppose we have a pointer to an instance of Derived. First we upcast it, so it is treated as a pointer to an instance of Base. Then we call a virtual method upon our upcasted pointer. Since the compiler knows that the method is virtual, it knows to look for the virtual table pointer in the instance. While we are treating the pointer as if it points to an instance of Base, the actual object has not changed value and the virtual table pointer within it is still pointing to the virtual table of Derived. So at runtime, the address of the method is taken from the virtual table of Derived.
Now, the particular method may be inherited from Base or it might be overridden in Derived. It does not matter; if inherited, the method pointer in the virtual table of Derived simply contains the same address as the corresponding method pointer in the virtual table of Base. In other words, both tables are pointing to the same method implementation for that particular method. If overridden, the method pointer in the virtual table of Derived differs from the corresponding method pointer in the virtual table of Base, so method lookups on instances of Derived will find the overridden method while lookups on instances of Base will find the original version of the method — regardless of whether a pointer to the instance is treated as a pointer to Base or a pointer to Derived.
Finally, it should now be straightforward to explain why the second version of your question is a bit misguided:
Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?
This question presupposes that vtable lookups are first by method and then by class. It is the other way round: first, the vtable pointer in the instance is used to find the vtable for the right class. Then, the vtable for that class is used to find the right method.
Which entry in vtable refers to the function of "particular" derived
classes which is supposed to be called at run time?
None, it is not an entry in the vtable, but the vtable pointer that is part of each and every object instance that determines which are the correct set of virtual functions for that particular object. This way, depending on the actual vtable pointed to, invoking the "first virtual method" from the vtable may result in the calling of different functions for objects of different types in the same polymorphic hierarchy.
Implementations may vary, but what I personally consider the most logical and best-performing choice is to make the vtable pointer the first element in the class layout. This way you can dereference the very address of the object to determine its type from the value of the pointer stored there, since all objects of a given type have that pointer pointing to the same vtable. A vtable is created once for every type that has virtual methods, which is what makes features such as overriding certain virtual methods possible.
How do upcasting and vtables work together to ensure correct dynamic
binding?
Upcasting itself isn't strictly needed, and neither is downcasting. Remember that you already have the object allocated in memory, and it already has its vtable pointer set to the correct vtable for its type, which is what ensures correct binding. Up- and down-casting don't change the vtable for that object; they only change the type of the pointer you operate through.
Downcasting is needed when you want to access functionality that is not available in the base class and is declared in the derived class. But before you try to do that, you must be sure that the particular object is of, or inherits from, the type that declares that functionality. This is where dynamic_cast comes in: when you use a dynamic cast, the compiler generates a check of the object's vtable entry against type information generated at compile time, to see whether the object's type is or inherits from the requested type. If so, the dynamic cast succeeds; otherwise it fails.
The pointer you access the object through doesn't refer to the right set of virtual functions to call, it merely serves as a gauge to which functions in the vtable you can refer to as the developer. That is why it is safe to upcast using a C style or static cast, which performs no runtime checks, because then you only limit your gauge to the functions available in the base class, which are already available in the derived class, so there is no room for error and harm. And that's why you must always use a dynamic cast or some other custom technique still based on virtual dispatch when you downcast, because you have to be sure that object's associated vtable does indeed contain the extra functionality you may invoke.
Otherwise you will get undefined behavior, and of the "bad kind" at that, meaning something fatal will most likely happen, since interpreting arbitrary data as an address of a function of particular signature to be called is a very big no-no.
Also note that in a static context, i.e. when it is known at compile time what the type is, the compiler will most likely not use the vtable to call virtual functions but use direct static calls or even inline certain functions, which will make them that much faster. In such cases upcasting and using a base class pointer instead of the actual object will only diminish that optimization.
Polymorphism and Dynamic Dispatch (hyper-abridged version)
Note: I was not able to fit enough information about multiple inheritance with virtual bases, as there is not much of anything simple about it, and the details would clutter the exposition (further). This answer demonstrates the mechanisms used to implement dynamic dispatch assuming only single inheritance.
Interpreting abstract types and their behaviors visible across module boundaries requires a common Application Binary Interface (ABI). The C++ standard, of course, does not require the implementation of any particular ABI.
An ABI would describe:
The layout of virtual method dispatch tables (vtables)
The metadata required for runtime type checks and cast operations
Name decoration (a.k.a. mangling), calling conventions, and many other things.
Both modules in the following example, external.so and main.o, are assumed to have been linked to the same runtime. Static and dynamic binding give preference to symbols located within the calling module.
An external library
external.h (distributed to users):
class Base
{
__vfptr_t __vfptr; // For exposition
public:
__attribute__((dllimport)) virtual int Helpful();
__attribute__((dllimport)) virtual ~Base();
};
class Derived : public Base
{
public:
__attribute__((dllimport)) virtual int Helpful() override;
~Derived()
{
// Visible destructor logic here.
// Note: This is in the header!
// __vft#Base gets treated like any other imported symbol:
// The address is resolved at load time.
//
this->__vfptr = &__vft#Base;
static_cast<Base *>(this)->~Base();
}
};
__attribute__((dllimport)) Derived *ReticulateSplines();
external.cpp:
#include "external.h" // the version in which the attributes are dllexport
__attribute__((dllexport)) int Base::Helpful()
{
return 47;
}
__attribute__((dllexport)) Base::~Base()
{
}
__attribute__((dllexport)) int Derived::Helpful()
{
return 4449;
}
__attribute__((dllexport)) Derived *ReticulateSplines()
{
return new Derived(); // __vfptr = &__vft#Derived in external.so
}
external.so (not a real binary layout):
__vft#Base:
[offset to __type_info#Base] <-- in external.so
[offset to Base::~Base] <------- in external.so
[offset to Base::Helpful] <----- in external.so
__vft#Derived:
[offset to __type_info#Derived] <-- in external.so
[offset to Derived::~Derived] <---- in external.so
[offset to Derived::Helpful] <----- in external.so
Etc...
__type_info#Base:
[null base offset field]
[offset to mangled name]
__type_info#Derived:
[offset to __type_info#Base]
[offset to mangled name]
Etc...
An application using the external library
special.hpp:
#include <iostream>
#include "external.h"
class Special : public Base
{
public:
int Helpful() override
{
return 55;
}
virtual void NotHelpful()
{
throw std::runtime_error{"derp"}; // needs <stdexcept>; std::exception has no standard string constructor
}
};
class MoreDerived : public Derived
{
public:
int Helpful() override
{
return 21;
}
~MoreDerived()
{
// Visible destructor logic here
this->__vfptr = &__vft#Derived; // <- the version in main.o
static_cast<Derived *>(this)->~Derived();
}
};
class Related : public Base
{
public:
virtual void AlsoHelpful() = 0;
};
class RelatedImpl : public Related
{
public:
void AlsoHelpful() override
{
using namespace std;
cout << "The time for action... Is now!" << endl;
}
};
main.cpp:
#include "special.hpp"
int main(int argc, char **argv)
{
Base *ptr = new Base(); // ptr->__vfptr = &__vft#Base (in external.so)
auto r = ptr->Helpful(); // calls "Base::Helpful" in external.so
// r = 47
delete ptr; // calls "Base::~Base" in external.so
ptr = new Derived(); // ptr->__vfptr = &__vft#Derived (in main.o)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in main.o
ptr = ReticulateSplines(); // ptr->__vfptr = &__vft#Derived (in external.so)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in external.so
ptr = new Special(); // ptr->__vfptr = &__vft#Special (in main.o)
r = ptr->Helpful(); // calls "Special::Helpful" in main.o
// r = 55
delete ptr; // calls "Base::~Base" in external.so
ptr = new MoreDerived(); // ptr->__vfptr = & __vft#MoreDerived (in main.o)
r = ptr->Helpful(); // calls "MoreDerived::Helpful" in main.o
// r = 21
delete ptr; // calls "MoreDerived::~MoreDerived" in main.o
return 0;
}
main.o:
__vft#Derived:
[offset to __type_info#Derived] <-- in main.o
[offset to Derived::~Derived] <--- in main.o
[offset to Derived::Helpful] <---- stub that jumps to import table
__vft#Special:
[offset to __type_info#Special] <-- in main.o
[offset to Base::~Base] <---------- stub that jumps to import table
[offset to Special::Helpful] <----- in main.o
[offset to Special::NotHelpful] <-- in main.o
__vft#MoreDerived:
[offset to __type_info#MoreDerived] <---- in main.o
[offset to MoreDerived::~MoreDerived] <-- in main.o
[offset to MoreDerived::Helpful] <------- in main.o
__vft#Related:
[offset to __type_info#Related] <------ in main.o
[offset to Base::~Base] <-------------- stub that jumps to import table
[offset to Base::Helpful] <------------ stub that jumps to import table
[offset to Related::AlsoHelpful] <----- stub that throws PV exception
__vft#RelatedImpl:
[offset to __type_info#RelatedImpl] <--- in main.o
[offset to Base::~Base] <--------------- stub that jumps to import table
[offset to Base::Helpful] <------------- stub that jumps to import table
[offset to RelatedImpl::AlsoHelpful] <-- in main.o
Etc...
__type_info#Base:
[null base offset field]
[offset to mangled name]
__type_info#Derived:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#Special:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#MoreDerived:
[offset to __type_info#Derived]
[offset to mangled name]
__type_info#Related:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#RelatedImpl:
[offset to __type_info#Related]
[offset to mangled name]
Etc...
Invocation is (or might not be) Magic!
Depending on the method and what can be proven at the binding side, a virtual method call may be bound statically or dynamically.
A dynamic virtual method call will read the target function's address from the vtable pointed to by a __vfptr member.
The ABI describes how functions are ordered in vtables. For example: They might be ordered by class, then lexicographically by mangled name (which includes information about const-ness, parameters, etc...). For single inheritance, this approach guarantees that a function's virtual dispatch index will always be the same, regardless of how many distinct implementations there are.
In the examples given here, destructors are placed at the beginning of each vtable, if applicable. If the destructor is trivial and non-virtual (not defined or does nothing), the compiler may elide it entirely, and not allocate a vtable entry for it.
Base *ptr = new Special{};
MoreDerived *md_ptr = new MoreDerived{};
// The cast below is checked statically, which would
// be a problem if "ptr" weren't pointing to a Special.
//
Special *sptr = static_cast<Special *>(ptr);
// In this case, it is possible to
// prove that "ptr" could point only to
// a Special, binding statically.
//
ptr->Helpful();
// Due to the cast above, a compiler might not
// care to prove that the pointed-to type
// cannot be anything but a Special.
//
// The call below might proceed as follows:
//
// reg = sptr->__vfptr[__index_of#Base::Helpful] = &Special::Helpful in main.o
//
// push sptr
// call reg
// pop
//
// This will indirectly call Special::Helpful.
//
sptr->Helpful();
// No cast required: LSP is satisfied.
ptr = md_ptr;
// Once again:
//
// reg = ptr->__vfptr[__index_of#Base::Helpful] = &MoreDerived::Helpful in main.o
//
// push ptr
// call reg
// pop
//
// This will indirectly call MoreDerived::Helpful
//
ptr->Helpful();
The logic above is the same for any invocation site that requires dynamic binding. In the example above, it doesn't matter exactly what type ptr or sptr point to; the code will just load a pointer at a known offset, then blindly call it.
Type casting: Ups and Downs
All information about a type hierarchy must be available to the compiler when translating a cast or function call expression. Symbolically, casting is just a matter of traversing a directed graph.
Up-casting in this simple ABI can be performed entirely at compile time. The compiler needs only to examine the type hierarchy to determine if the source and target types are related (there is a path from the source to the target in the type graph). By the substitution principle, a pointer to a MoreDerived also points to a Base and can be interpreted as such. The __vfptr member is at the same offset for all types in this hierarchy, so RTTI logic doesn't need to handle any special cases (in certain implementations of VMI, it would need to grab another offset from a type thunk to grab another vptr and so on...).
Down-casting, however, is different. Since casting from a base type to a derived type involves determining if the pointed-to object has a compatible binary layout, it is necessary to perform an explicit type check (conceptually, this is "proving" that the extra information exists beyond the end of the structure assumed at compile time).
Note that there are multiple vtable instances for the Derived type: One in external.so and one in main.o. This is because a virtual method defined for Derived (its destructor) appears in every translation unit that includes external.h.
Even though the logic is identical in both cases, both images in this example need to have their own copy. This is why type checking cannot be performed using addresses alone.
A down-cast is then performed by walking a type graph (copied in both images) starting from the source type decoded at runtime, comparing mangled names until the compile-time target is matched.
For example:
Base *ptr = new MoreDerived();
// ptr->__vfptr = &__vft#MoreDerived in main.o
//
// This provides the code below with a starting point
// for dynamic cast graph traversals.
// All searches start with the type graph in the current image,
// then all other linked images, and so on...
// This example is not exhaustive!
// Starts by grabbing &__type_info#MoreDerived
// using the offset within __vft#MoreDerived resolved
// at load time.
//
// This is similar to a virtual method call: Just grab
// a pointer from a known offset within the table.
//
// Search path:
// __type_info#MoreDerived (match!)
//
auto *md_ptr = dynamic_cast<MoreDerived *>(ptr);
// Search path:
// __type_info#MoreDerived ->
// __type_info#Derived (match!)
//
auto *d_ptr = dynamic_cast<Derived *>(ptr);
// Search path:
// __type_info#MoreDerived ->
// __type_info#Derived ->
// __type_info#Base (no match)
//
// Did not find a path connecting RelatedImpl to MoreDerived.
//
// rptr will be nullptr
//
auto *rptr = dynamic_cast<RelatedImpl *>(ptr);
At no point in the code above did ptr->__vfptr need to change. The static nature of type deduction in C++ requires the implementation to satisfy the substitution principle at compile time, meaning that the actual type of an object cannot change at runtime.
Summary
I've understood this question as one about the mechanisms behind dynamic dispatch.
To me, "Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?", is asking how a vtable works.
This answer is intended to demonstrate that type casting affects only the view of an object's data, and that the implementation of dynamic dispatch in these examples operates independently of it. However, type casting does affect dynamic dispatch in the case of multiple inheritance, where determining which vtable to use may require multiple steps (an instance of a type with multiple bases may have multiple vptrs).
casting
Casting is a concept associated with variables: any variable can be cast, either up or down.
char charVariable = 'A';
int intVariable = charVariable; // upcasting
int intVariable = 20;
char charVariable = intVariable; // downcasting
For built-in data types, whether a conversion counts as an up cast or a down cast depends on the variables involved, and mainly on how much memory the compiler allocates for each of the two types.
Assigning a variable whose type occupies less memory than the type it is converted to is called an up cast (usually known as a widening conversion).
Assigning a variable whose type occupies more memory than the type it is converted to is called a down cast (usually known as a narrowing conversion).
A down cast causes problems when the value being converted does not fit into the memory allocated for the target type.
Upcasting at class level
Just like built-in data types, we can have objects of a base class and of a derived class. Converting a derived type to a base type is known as upcasting. It can be achieved by making a pointer to the base class point to an object of the derived class.
class Base{
public:
void display(){
cout<<"Inside Base::display()"<<endl;
}
};
class Derived:public Base{
public:
void display(){
cout<<"Inside Derived::display()"<<endl;
}
};
int main(){
Base *baseTypePointer = new Derived(); // Upcasting
baseTypePointer->display(); // because we have upcast, we want Derived::display() as the output
}
output
Inside Base::display()
Expected
Inside Derived::display()
In the above scenario the output wasn't as expected. Because display() is not virtual, the object has no v-table and vptr (virtual pointer) to consult, so the call through the base pointer is bound to Base::display() even though the pointer holds the address of a Derived object.
To avoid this problem C++ gives us the virtual concept. The display function in the base class needs to be declared virtual.
virtual void display()
full code is:
class Base{
public:
virtual void display(){
cout<<"Inside Base::display()"<<endl;
}
};
class Derived:public Base{
public:
void display(){
cout<<"Inside Derived::display()"<<endl;
}
};
int main(){
Base *baseTypePointer = new Derived(); // Upcasting
baseTypePointer->display(); // because we have upcast, we want Derived::display() as output
}
output
Inside Derived::display()
Expected
Inside Derived::display()
To understand this we need to understand the v-table and the vptr.
Whenever the compiler finds a function declared virtual, it typically generates a virtual table for each of the classes involved (Base and all the derived classes).
If a virtual function is present, every object contains a vptr (virtual pointer) that points to the vtable of its class, and that vtable contains pointers to the class's virtual functions. When you call the function, the call goes through the vptr, the matching class's function is invoked, and we get the required output.
I believe, this is best explained by implementing polymorphism in C. Given these two C++ classes:
class Foo {
virtual void foo(int);
};
class Bar : public Foo {
virtual void foo(int);
virtual void bar(double);
};
the C structure definitions (i. e. the header file) would look like this:
//For class Foo
typedef struct Foo_vtable {
void (*foo)(int);
} Foo_vtable;
typedef struct Foo {
Foo_vtable* vtable;
} Foo;
//For class Bar
typedef struct Bar_vtable {
Foo_vtable super;
void (*bar)(double);
} Bar_vtable;
typedef struct Bar {
Foo super;
} Bar;
As you see, there are two structure definitions for each class, one for the vtable and one for the class itself. Note also that both structures for class Bar include a base class object as their first member which allows us upcasting: both (Foo*)myBarPointer and (Foo_vtable*)myBar_vtablePointer are valid. As such, given a Foo*, it is safe to find the location of the foo() member by doing
Foo* basePointer = ...;
(basePointer->vtable->foo)(7);
Now, let's take a look at how we can actually fill the vtables. For that we write some constructors that use statically defined vtable instances. This is what the foo.c file could look like:
#include "..."
#include <stdio.h>
static void foo(int arg) {
printf("Foo::foo() called\n");
}
static Foo_vtable vtable = {
.foo = &foo,
};
void Foo_construct(Foo* me) {
me->vtable = &vtable; //Point the object at the class-wide vtable.
}
This makes sure that it is possible to execute (basePointer->vtable->foo)(7) on every object that has been passed to Foo_construct(). Now, the code for Bar is quite similar:
#include "..."
#include <stdio.h>
static void foo(int arg) {
printf("Bar::foo() called\n");
}
static void bar(double arg) {
printf("Bar::bar() called\n");
}
static Bar_vtable vtable = {
.super = {
.foo = &foo
},
.bar = &bar
};
void Bar_construct(Bar* me) {
Foo_construct(&me->super); //construct the base class.
(me->super.vtable->foo)(7); //This will print Foo::foo()
me->super.vtable = &vtable.super; //Switch to Bar's vtable.
(me->super.vtable->foo)(7); //This will print Bar::foo()
}
I have used static declarations for the member functions to avoid having to invent a new name for each implementation, static void foo(int) restricts the visibility of the function to the source file. However, it can still be called from other files by the use of a function pointer.
Usage of these classes could look like this:
#include "..."
int main() {
//First construct two objects.
Foo myFoo;
Foo_construct(&myFoo);
Bar myBar;
Bar_construct(&myBar);
//Now make some pointers.
Foo* pointer1 = &myFoo;
Foo* pointer2 = (Foo*)&myBar;
Bar* pointer3 = &myBar;
//And the calls:
(pointer1->vtable->foo)(7); //prints Foo::foo()
(pointer2->vtable->foo)(7); //prints Bar::foo()
(pointer3->super.vtable->foo)(7); //prints Bar::foo()
(((Bar_vtable*)pointer3->super.vtable)->bar)(7.0); //prints Bar::bar()
}
Once you know how this works, you know how C++ vtables work. The only difference is that in C++ the compiler does the work that I did myself in the code above.
Let me try to explain it with some examples:-
class Base
{
public:
virtual void function1() {cout<<"Base :: function1()\n";};
virtual void function2() {cout<<"Base :: function2()\n";};
virtual ~Base(){};
};
class D1: public Base
{
public:
~D1(){};
virtual void function1() { cout<<"D1 :: function1()\n";};
};
class D2: public Base
{
public:
~D2(){};
virtual void function2() { cout<< "D2 :: function2\n";};
};
So, the compiler would generate three vtables, one for each class, since these classes have virtual functions (although this is compiler-dependent).
NOTE:- vtables contain only pointers to virtual functions. Non-virtual functions would still be resolved at compile time...
You are right in saying that vtables are nothing but pointers to functions. The vtables for these classes would look something like this:-
vtable for Base:-
&Base::function1 ();
&Base::function2 ();
&Base::~Base ();
vtable for D1:-
&D1::function1 ();
&Base::function2 ();
&D1::~D1();
vtable for D2:-
&Base::function1 ();
&D2::function2 ();
&D2::~D2 ();
vptr is a pointer which is used for look-up in this table. Each object of a polymorphic class has extra space allocated for the vptr (although where the vptr sits in the object is entirely implementation-dependent). Generally the vptr is at the beginning of the object.
Taking all this into account, if I make a call to func, the generated code determines at run time what b is actually pointing to:-
void func ( Base* b )
{
b->function1 ();
b->function2 ();
}
Let's say we have an object of D1 passed to func. The calls would be resolved in the following manner:-
First the vptr is fetched from the object, and then it is used to get the correct address of the function to call. So, in this case the vptr gives access to D1's vtable. When function1 is looked up, it yields the address of D1::function1, because D1 overrides it. For the call to function2, it yields the address of Base::function2, which D1 inherits unchanged.
Hope I have clarified your doubts to your satisfaction...
The implementation is compiler specific. Here I am going to share some thoughts that have NOTHING TO DO WITH ANY ACTUAL KNOWLEDGE of how exactly it is done in compilers, but only with the minimal requirements needed for it to work as required. Keep in mind that each instance of a class with virtual methods knows at run time which class it belongs to.
Let's suppose we have a chain of base and derived classes with a length of 10 (so a derived class has a great-great-...-grandfather).
We may call these classes base0, base1, ..., base9, where base9 derives from base8, etc.
Each of these classes defines a method as: virtual void doit(){ ... }
Let's suppose that in the base class we use that method inside a method called "dowith_doit", which is not overridden in any derived class.
The semantics of C++ imply that the "doit" applied to an instance must be the one defined in the actual (dynamic) class of the instance at hand.
Essentially we have two possible ways of doing it:
a) Assign to each such virtual method a number that is different for each method defined in the chain of derived classes. The number could also be a hash of the name of the method.
Each class defines a table with 2 columns, where the first column holds the number of the method and the second column the address of the function. In that case each class will have a vtable with as many rows as the number of virtual methods defined inside the class.
The method is executed by searching the class's table for the method under consideration. That search may be done linearly (slow) or by bisection (when the table is ordered by method number).
b) Assign to each such method a progressively increasing integer (one for each different method in the chain of classes), and for each class define a table with only one column. For virtual methods defined inside the class, the function address will be in the row given by the number of the method. There will be many rows with null pointers, because a class doesn't always override the methods of previous classes.
To improve efficiency, the implementation may choose to fill the null rows with the address held in the ancestor class of the class under consideration.
Essentially no other simple ways exist to work with virtual methods efficiently.
I suppose that only the second solution (b) is used in actual implementations, because the trade-off between the space overhead for non-existing methods and the execution efficiency is favorable for case (b) (taking into consideration too that methods are limited in number: maybe 10, 20 or 50, but not 5000).
Upon instantiation, every object of a class with at least one virtual function gets a hidden member, usually called vTable, which points to the class's virtual dispatch table (VDT).
class Base {
hidden: // not part of the language, just to illustrate.
static VDT baseVDT; // per class VDT for base
VDT *vTable; // per object instance
private:
...
public:
virtual int base1();
virtual int base2();
...
};
The vTable contains pointers to all virtual functions in Base.
As a hidden part of Base's constructor, vTable gets assigned to baseVDT.
VDT Base::baseVDT[] = {
Base::base1,
Base::base2
};
class Derived : public Base {
hidden:
static VDT derivedVDT; // per class VDT for derived
private:
...
public:
virtual int base2();
...
};
The vTable for Derived contains pointers to all virtual functions first defined in Base, followed by the virtual functions first defined in Derived. When an object of type Derived gets constructed, vTable gets set to derivedVDT.
VDT Derived::derivedVDT[] = {
// functions first defined in Base
Base::base1,
Derived::base2, // override
// functions first defined in Derived are appended
Derived::derived3
}; // function 2 has an override in derived.
Now if we have
Base *bd = new Derived;
Derived *dd = new Derived;
Base *bb = new Base;
bd points to an object of type Derived whose vTable points to derivedVDT.
So the function calls
x = bd->base2();
y = bb->base2();
are actually
// "base2" here is the index into vTable for base2.
x = bd->vTable["base2"](); // vTable points to derivedVDT
y = bb->vTable["base2"](); // vTable points to baseVDT
The index is the same in both due to the construction of the VDT. This also means the compiler knows the index at the moment of compilation.
This could also be implemented as
// call absolute address to virtual dispatch function which calls the right base2.
x = Base::base2Dispatch(bd->vTable["base2"]);
inline Base::base2Dispatch(void *call) {
return call(); // call through function pointer.
}
With O2 or O3 optimization this will compile to the same code.
There are some special cases:
If dd points to a Derived or more derived object and base2 is declared final, then
z = dd->base2();
actually is
z = Derived::base2(); // absolute call to final method.
If dd pointed to a Base object or anything else, you're in undefined behaviour land and the compiler can still do this.
The other case is if the compiler sees there are only a few classes derived from Base; it could then generate an Oracle interface for base2. [Loosely after an MS or Intel compiler guy at some C++ conference in 2012 or 2013(?), showing that (~500%?) more code gives a (2+ times?) speedup on average]
inline Base::base2Dispatch(void *call) {
if (call == Derived::base2) // most likely from compilers static analysis or profiling.
return Derived::base2(); // call absolute address
if (call == Base::base2)
return Base::base2(); // call absolute address
// Backup catch all solution in case of more derived classes
return call(); // call through function pointer.
}
Why on earth would a compiler want to do this? More code is bad, and unneeded branches diminish performance!
Because calling through a function pointer is very slow on many architectures. An optimistic example:
Get the address from memory: 3+ cycles.
Pipeline delay while waiting for the ip value: 10 cycles, on some processors 19+ cycles.
If the most complex modern CPUs can predict the actual jump address [BTB] as well as they do branch prediction, this might be a loss. Otherwise the ~8 extra instructions will easily save the 4*(3+10) instructions lost due to pipeline stalls (if the prediction failure rate is less than 10-20%).
If the branches in the two ifs are both predicted taken (i.e. the conditions evaluate to false), the ~2 cycles lost are nicely covered by the memory latency to get the call address, and we are no worse off.
If one of the ifs is mispredicted, the BTB will most likely also be wrong. Then the cost of the mispredict is around 8 cycles, of which 3 are paid by the memory latency, and a correct not-taken on the 2nd if might save the day, or we pay the full 10+ cycle pipeline stall.
If only the 2 possibilities exist, one of them will be taken and we save the pipeline stall from the function pointer call; we will get at most one mispredict, resulting in no (significantly) worse performance than calling directly.
If the memory delay is longer and the result is correctly predicted, the effect is much larger.

When a base class pointer points to a base class object, will the compiler use dynamic binding?

I know that when a base class pointer that points to a derived class object is used to call a virtual function, the compiler will use dynamic binding to call the derived version.
But when a base class pointer that points to a base class object is used to call a virtual function, does the compiler use dynamic binding or static binding to call the virtual function?
For example:
class Base
{
public:
virtual void show()
{
cout << "base class";
}
};
int main()
{
Base *pb; //Base class pointer
Base b; //Base class object
pb = &b;
pb->show(); //Is static binding or dynamic binding?
}
Because my English is not very good, I wanted to make my question as simple as possible, but I will describe it in more detail below:
Actually, the problem arose while I was summarizing how dynamic binding is triggered.
At first I summarized the trigger conditions as:
1. The function must be a virtual function.
2. A pointer or reference must be used to call the function.
These two trigger conditions lead to the question that I asked:
"When a base class pointer points to a base class object, will the compiler use dynamic binding?"
I searched Google for an answer, and I found this fragment (the demo is here):
struct A {
virtual void f() { cout << "Class A" << endl; }
};
struct B: A {
//Note that it not overrides A::f.
void f(int) { cout << "Class B" << endl; }
};
struct C: B {
void f() { cout << "Class C" << endl; }
};
int main() {
B b; C c;
A* pa1 = &b;
A* pa2 = &c;
// b.f();
pa1->f();
pa2->f();
}
The following is the output of the above example:
"Class A"
"Class C"
Since pa1->f() outputs Class A, I concluded a third trigger condition:
3. The function in the base class must be overridden in the derived class.
Now, according to the three trigger conditions, when a base class pointer that points to a base class object is used to call a virtual function, the compiler will use static binding, because the virtual function is not overridden.
But when a derived class pointer that points to a derived class object is used to call a virtual function, it will use dynamic binding, because the virtual function is overridden.
This made me very confused.
It can choose whichever, or neither, depending on how smart it is and how well it can detect. The rule is polymorphism must work. How this is achieved is an implementation detail.
If the same end-result can be achieved with both dynamic or static binding, as is the case here, both are valid options for the compiler.
In your case, the function doesn't have to be called at all - the generated code could be just as well identical to code generated by
int main()
{
cout << "base class";
}
I guess it depends on compiler optimization. The compiler might be clever enough to figure out that Base::show is always the one called, or it might not. You can look at the disassembly to find out. You can force static binding with pb->Base::show().
Short answer: No. At least not in theory. Because in theory, the compiler does not know whether the pointer points to a Base, a Derived or a YetAnotherDerived object. Therefore it has to apply the same mechanism regardless of the dynamic type of the object.
But: In practice, compilers have optimizers capable of identifying some use cases where the dynamic type is known. In your case it can detect the aliasing, meaning it knows that pb points to b, and that b is a local variable that cannot be changed concurrently, so it knows that in fact you are calling b.show() and will shorten the generated code to reflect that fact and get rid of the virtual dispatch. Similar optimizations are possible e.g. in this code:
auto pb = make_unique<Base>();
pb->show();
But as with any optimization, it is up to the compiler whether it applies it. The standard says virtual dispatch happens even if the pointer points to a Base object, and that's it.

How do I know which function will be called?

Today I found the following disturbingly ambiguous situation in our code base:
class Base {
public:
virtual void Irrelevant_Function(void) = 0;
protected:
C_Container * Get_Container(void);
};
class A : public Base, public Not_Important {
public:
inline C_Container * Get_Container(void);
};
class B : public Base, protected SomethingElse {
public:
C_Container * Get_Container(void);
};
Many things were calling the Get_Container method, but not always calling the correct one - note that none of these functions were virtual.
I need to rename the methods Get_Base_Container, Get_A_Container, etc to remove the ambiguity. What rules does C++ use to determine which version of a function it should call? I'd like to start from the "known state" of what should have been getting called, and then figure out the bugs from there.
For example, if I have a pointer to a Base and call Get_Container, I assume it would just call the Base version of the function. What if I have a pointer to an A? What about a pointer to a B? What about an A or B on the heap?
Thanks.
It depends how you're calling the function. If you're calling through an A *, an A & or an A, then you'll be calling A::Get_Container(). If you're calling through a Base *, a Base & (even if they point to/reference an A), then you'll be calling Base::Get_Container().
As long as there's no virtual inheritance going on, it's quite easy. If you're working directly with an object, it's the object's method that gets called; if you're working with a pointer or reference, it's the type of the pointer or reference that determines the method, and the type of the object pointed to doesn't matter.
A method is first looked up according to the object's static type. If it is non-virtual there, you're done: that's the method that's called. The dynamic type is what virtual methods, dynamic_cast, and typeid use, and is the "actual" type of the object. The static type is what the static type system works with.
A a; // Static type and dynamic type are identical.
Base &a_base = a; // Static type is Base; dynamic type is A.
a.Get_Container(); // Calls A::Get_Container.
a_base.Get_Container(); // Calls Base::Get_Container.
B *pb = new B(); // Static type and dynamic type of *pb (the pointed-to
// object) are identical.
Base *pb_base = pb; // Static type is Base; dynamic type is B.
pb->Get_Container(); // Calls B::Get_Container.
pb_base->Get_Container(); // Calls Base::Get_Container.
I've assumed above that the protected Base::Get_Container method is accessible, otherwise those will be compile errors.
A couple of additional points to note here:
Name lookup occurs in a single scope. E.g. when calling the method on an object with static type 'B', the compiler considers the interface of 'B' to determine whether or not there is a valid match. Only if the name is not found there does it look at the interface of Base to find a match. This is why, from the compiler's view, there is no ambiguity and it can resolve the call. If your real code has overloading etc. this may be an issue.
Secondly, it is often forgotten that the 'protected' keyword applies at class and not object level. So for example:
class Base {
protected:
C_Container * Get_Container(void);
};
class B : public Base{
public:
C_Container * Get_Container(void)
{
B b;
// Call the 'protected' base class method on another object.
return b.Base::Get_Container();
}
};