Virtual function and Classes - c++

I need some answers to basic questions. I'm lost again. :(
q1 - Is this statement valid:
Whenever we define the function to be pure virtual function,
this means that function has no body.
q2 - And what is the concept of Dynamic Binding? I mean if the Compiler optimizes the code using VTABLEs and VPTRs then how is it Run-Time Polymorphism?
q3 - What are VTABLES AND VPTRs and how do their sizes change?
q4 - Please see this code:
class base
{
public:
virtual void display()
{
cout<<"Displaying from base";
}
};
class derived:public base
{
public:
void display(){cout<<"\nDisplaying from derived";}
};
int main()
{
base b,*bptr;
derived d;
bptr=&b;
bptr->display();
bptr=&d;
bptr->display();
}
Output:
Displaying from base
Displaying from derieved
Please can somebody answer why a pointer of base class can point the member function of a derived class and the vice-versa is not possible, why ?

False. It just means any derived classes must implement said function. You can still provide a definition for the function, and it can be called by Base::Function().*
Virtual tables are a way of implementing virtual functions. (The standard doesn't mandate this is the method, though.) When making a polymorphic call, the compiler will look up the function in the function table and call that one, enabling run-time binding. (The table is generated at compile time.)
See above. Their sizes change as there are more virtual functions. However, instances don't store a table but rather a pointer to the table, so class size only has a single size increase.
Sounds like you need a book.
*A classic example of this is here:
struct IBase
{
virtual ~IBase(void) = 0;
};
inline IBase::~IBase(void) {}
This wouldn't be an abstract class without a pure virtual function, but a destructor requires a definition (since it will be called when derived classes destruct.)

1) Not necessarily. There are times when you provide body for pure virtual functions
2) The function to be called is determined at run time.

False. It only means that derived classes must implement the method and that the method definition (if present) at that level will not be consider an override of the virtual method.
The vtable is implemented at compile time, but used at runtime. The compiler will redirect the call through the vtable, and that depends on the runtime type of the object (a pointer to base has static type base*, but might point to an object of type derived at runtime).
vptrs are pointers to an vtable, they do not change size. vtables are tables of pointers to code (might point to methods or to some adapter code) and have one entry for each virtual method declared in the class.
After the edit in the code:
The pointer refers to an object of type base during the first call, but it points to an object of type derived at the second call position. The dynamic dispatch mechanism (vtable) routes the call to the appropriate method.
A common implementation, which may help you understand is that in each class that declares virtual functions the compiler reserves space for a pointer to a virtual table, and it also generates the virtual table itself, where it adds pointers to the definition of each virtual method. The memory layout of the object only has that extra pointer.
When a derived class overrides any of the base class methods, the compiler generates a different vtable with pointers to the final overriders at that level. The memory layout of both the base and the derived class coincide in the base subobject part (usually the beginning), but the value of the vptr of a base object will point to the base vtable, while the value of the vptr in the derived object will point to the derived vtable.
When the compiler sees a call like bptr->display(), it checks the definition of the base class, and sees that it is the first virtual method, then it redirects the call as: bptr->hidden_vptr[0](). If the pointer is referring to a real base instance, that will be a pointer to base::display, while in the case of a derived instance it will point to derived::display.
Note that there is a lot of hand-waving in this answer. All this is implementation defined (the language does not specify the dispatch mechanism), and in most cases the dispatch mechanism is more complex. For example, when multiple inheritance takes place, the vtable will not point directly to the final overrider, but to an adapter block of code that resets the first implicit this parameter offset, as the base subobject of all but the first base are unaligned with the most derived object in memory --this is well beyond the scope of the question, just remember that this answer is a rough idea and that there is added complexity in real systems.

Is this statement valid
Not exactly: it might have a body. A more accurate definition is "Whenever we define a method to be pure virtual, this means that the method must be defined (overriden) in a concrete subclass."
And what is the concept of Dynamic Binding? I mean if the Compiler optimizes the code using VTABLEs and VPTRs then how is it Run-Time Polymorphism?
If you have an instance of a superclass (e.g. Shape) at run-time, you don't/needn't necessarily know which of its subclasses (e.g. Circle or Square) it is.
What are VTABLES AND VPTRs and how do their sizes change?
There's one vtable per class (for any class which has one or more virtual methods). The vtable contains pointers to the addresses of the class's virtual methods.
There's one vptr per object (for any object which has one or more virtual methods). The vptr points to the vtable for that object's class.
The size of the vtable increases with the number of virtual functions in the class. The size of the vptr is probably constant.
Please can somebody answer why a pointer of base class can point the member function of a derived class and the vice-versa is not possible, why ?
If you want to invoke the base class function then (because it's virtual, and the default behaviour for virtual is to call the most-derived version via the vptr/vtable) then you have to say so explicitly, e.g. like this:
bptr->base::display();

Related

Can derived classes have more than one pointer to a virtual table?

I am watching the BackToBasics talk: Virtual Dispatch and Its Alternatives
from CppCon2019. The presenter says and the slide shows (assuming I haven't misunderstood) that a derived class inherits a vtable pointer from the base class and additionally has its own vptr.
Of course, technically this isn't mandated by the standard but I am getting myself a bit confused and my experiments with sizeof() also seem to imply there should only need to be one pointer. Please can someone clarify if there are any situations where multiple vptrs are needed?
Thanks
P.S. Just to be clear, in this context we are considering the more common public inheritance and not virtual or multiple inheritance (the presenter explicitly mentions this in an earlier part of the talk).
The a vtable contains the address of each virtual function for the class at a known offset.
[Remark: In practice unlike a regular class, vtables have members at negative offset, much like a pointer in a the middle of a array. That is just a convention that doesn't change implementation freedom much. Anyway the only issue is that the placement of an information in a vtable is legislated by a convention (the ABI) and compilers by following the same one produce compatible code for polymorphic classes.]
What happens when you have additional functions in a derived class? (not just the functions "inherited" from the base class)
Once you accept the idea that a pointer to a structure both points to the whole object and to its first member, you have the idea that a pointer to derived class points to a base class that is appropriately located at offset zero. So you can have the exact same pointer value, as represented as a void*, that can be used alternatively for a derived object or a base under this convention for single inheritance.
Now you can apply that to any data structure and even to a vtable which is really not a table (array of elements of the same type, or values that can be interpreted in the same way) but a record (of objects of unrelated type or meaning); you can see that a vtable for such derived class can just be derived from the vtable of its unique base in the exact same way.
(Note that if you compile C++ to C, you might run into type aliasing rules when you do such things. Of course assembly has no such problem, nor naively compiled "high level assembler" C.)
So for single inheritance the base is integrated and optimized into the derived class:
for data members of the instance (of a class type)
and for the virtual functions members, that is the data members of the vtable (or members of the meta class if you imagine one).
Note that placing the base at offset zero allows you to place vtable base at zero offset, which in turn allows you to use the same vptr but does not imply it; conversely sharing the vptr with a base implies that the base vtable is at offset zero (vtable layout = meta class level) so the base must be at offset zero (data members layout = class level).
And multiple inheritance is actually single inheritance plus, as one class is always treated as privileged: it is placed at offset zero so the pointers are the same, so the vtable can be placed at offset zero (because the pointers are the same); others bases, not so.
As we see, all but one of the inherited polymorphic classes are placed at a non zero offset in multiple inheritance. Each one carries an additional "inherited" vptr in the derived class; that (hidden) pointer member must be correctly filled by any derived constructor.
These additional vptr are for base classes that occur at non zero offset, so a pointer to an inherited base must be adjusted (add a positive constant to convert to base pointer, remove it to convert back). That a compiler needs to produce code to perform an implicit conversion is a trivial remark (converting an integer to a floating point type is a much more involved task); but here the conversion of this is between a function call on a given base type and landing in the function that is an overrider in a base or derived class: the difference is that adjustment depends on function overriding which is only known for a class (an instance of a meta type). So the vptr needs to point to distinct vtable information: one that knows how to deal with these base to derived pointer conversions.
As instances of the "meta type", vtables have all the information to do all pointers adjustment automatically. (These depend on the specific class types involved, and on no other factor.)
So at implementation level, the two types of inheritances are:
zero offset inheritance; sharing the vptr; called a primary base class in some vtable and ABI descriptions;
arbitrary offset inheritance; having another vptr; called secondary base class.
This is for the basic stuff. Virtual inheritance is a lot more subtle at the implementation level, and even the concept of primary isn't so clear, as virtual bases can be "primary" of a derived class only in some more derived classes!
Suppose we have two classes, each with at least one virtual function, Possession and Vehicle. To invoke those virtual functions for an instance of a derived class of either of those, a pointer to a virtual table is needed. Since these two classes are independent, their virtual tables will be completely different.
Now imagine that OwnedVehicle derives from both Possession and Vehicle. To call a virtual function in Possession for an instance of OwnedVehicle requires a pointer to a virtual function table of the type required by Possession. Similarly, to call a virtual function in Vehicle for an instance of OwnedVehicle requires a pointer to a virtual function table of the type required by Vehicle.
Typical implementations handle this by building a virtual function table for OwnedVehicle that contains one part for OwnedVehicle virtual functions (if any), one for Vehicle virtual functions and one for Possession virtual functions. Then, when calling a virtual function from a pointer to an object of a different type, all that the compiler has to do is apply an applicable delta to the virtual function table pointer to point to the correct part of it.
While the multiple inheritance case is more complex, the same occurs with just single inheritance. The virtual function table for OwnedVehicle contains inside it a virtual function table for Vehicle and would do so even if Possession were not involved.

When virtual functions are invoked statically?

What is the performance difference between calling a virtual function from a derived class pointer directly vs from a base class pointer to the same derived class?
In the derived pointer case, will the call be statically bound, or dynamically bound? I think it'll be dynamically bound because there's no guarantee the derived pointer isn't actually pointing to a further derived class. Would the situation change if I have the derived class directly by value (not through pointer or reference)? So the 3 cases:
base pointer to derived
derived pointer to derived
derived by value
I'm concerned about performance because the code will be run on a microcontroller.
Demonstrating code
struct Base {
// virtual destructor left out for brevity
virtual void method() = 0;
};
struct Derived : public Base {
// implementation here
void method() {
}
}
// ... in source file
// call virtual method from base class pointer, guaranteed vtable lookup
Base* base = new Derived;
base->method();
// call virtual method from derived class pointer, any difference?
Derived* derived = new Derived;
derived->method();
// call virtual method from derived class value
Derived derivedValue;
derived.method();
In theory, the only C++ syntax that makes a difference is a member function call that uses qualified member name. In terms of your class definitions that would be
derived->Derived::method();
This call ignores the dynamic type of the object and goes directly to Derived::method(), i.e. it's bound statically. This is only possible for calling methods declared in the class itself or in one of its ancestor classes.
Everything else is a regular virtual function call, which is resolved in accordance with the dynamic type of the object used in the call, i.e. it is bound dynamically.
In practice, compilers will strive to optimize the code and replace dynamically-bound calls with statically-bound calls in contexts where the dynamic type of the object is known at compile time. For example
Derived derivedValue;
derivedValue.method();
will typically produce a statically-bound call in virtually every modern compiler, even though the language specification does not provide any special treatment for this situation.
Also, virtual method calls made directly from constructors and destructors are typically compiled into statically-bound calls.
Of course, a smart compiler might be able to bind the call statically in a much greater variety of contexts. For example, both
Base* base = new Derived;
base->method();
and
Derived* derived = new Derived;
derived->method();
can be seen by the compiler as trivial situations that easily allow for statically-bound calls.
Virtual functions must be compiled to work as if they were always called virtually. If your compiler compiles a virtual call as a static call, that's an optimization that must satisfy this as-if rule.
From this, it follows that the compiler must be able to prove the exact type of the object in question. And there are some valid ways in which it can do this:
If the compiler sees the creation of the object (the new expression or the automatic variable from which the address is taken) and can prove that that creation is actually the source of the current pointer value, that gives it the precise dynamic type it needs. All your examples fall into this category.
While a constructor runs, the type of the object is exactly the class containing the running constructor. So any virtual function call made in a constructor can be resolved statically.
Likewise, while a destructor runs, the type of the object is exactly the class containing the running destructor. Again, any virtual function call can be resolved statically.
Afaik, these are all the cases that allow the compiler to convert a dynamic dispatch into a static call.
All of these are optimizations, though, the compiler may decide to perform the runtime vtable lookup anyway. But good optimizing compilers should be able to detect all three cases.
There should be no difference between the first two cases, since the very idea of virtual functions is to call always the actual implementation. Leaving compiler optimisations aside (which in theory could optimise all virtual function calls away if you construct the object in the same compilation unit and there is no way the pointer can be altered in between), the second call must be implemented as a indirect (virtual) call as well, since there could be a third class inheriting from Derived and implementing that function as well. I would assume that the third call will not be virtual, since the compiler knows the actual type already at compile time. Actually you could make sure of this by not defining the function as virtual, if you know you will always do the call on the derived class directly.
For really lightweight code running on a small microcontroller I would recommend avoiding defining functions as virtual at all. Usually there is no runtime abstraction required. If you write a library and need some kind of abstraction, you can maybe work with templates instead (which give you some compile-time abstraction).
At least on PC CPUs I often find virtual calls one of the most expensive indirections you can have (probably because branch prediction is more difficult). Sometimes one can also transform the indirection to the data level, e.g. you keep one generic function which operates on different data which is indirected with pointers to the actual implementation. Of course this will work only in some very specific cases.
At run-time.
BUT: Performance as compared to what? It isn't valid to compare a virtual function call to a non-virtual function call. You need to compare it to a non-virtual function call plus an if, a switch, an indirection, or some other means of providing the same function. If the function doesn't embody a choice among implementations, i.e. doesn't need to be virtual, don't make it virtual.

Why does an abstract class have a vtable?

Regarding this post:
For implementations that use vtable, the answer is: Yes, usually. You
might think that vtable isn't required for abstract classes because
the derived class will have its own vtable, but it is needed during
construction: While the base class is being constructed, it sets the
vtable pointer to its own vtable. Later when the derived class
constructor is entered, it will use its own vtable instead.
I'm assuming the answer is correct, but I don't quite get it. Why is the vtable needed exactly for construction?
Because the standard says so.
[class.cdtor]/4
When a virtual function is called directly or indirectly from a
constructor or from a destructor, including during the construction or
destruction of the class's non-static data members, and the object to
which the call applies is the object (call it x) under construction or
destruction, the function called is the final overrider in the
constructor's or destructor's class and not one overriding it in a
more-derived class.
The rationale is that first the base class is constructed, then the derived one. If a virtual function is called inside the base class' constructor, it would be bad to call the derived class, since the derived class isn't initialized yet.
Remember that an abstract class may have non-pure virtual functions. Also, for debugging purposes, it is good to point pure virtual functions to a debugging trap (e.g. MSVC calls _purecall()).
If all virtual functions are pure, in MSVC you can omit the vtable with __declspec(novtable). If you use a lot of interface classes, this can lead to significant savings because you omit vfptr initialization. But if you accidentally call a pure virtual function, you'll get a hard to debug access violation.
vtables are implementation issues in C++, they are not part of the standard.
vtables are used for both dynamic dispatching of methods and for RTTI. While a nullptr vtable pointer would work for dynamic dispatching (as the vtable pointer is only used when you have an instance of that type) in a pure-abstract class, a dynamic_cast to a pure abstract class is legal, and it may require that the vtable itself exist.
Designers of the C++ implementation and ABI might have simply given the purely abstract class (a class with no implemented methods, just =0 ones) a vtable to make their implementation simpler. Every class has a vtable, and the vtable pointer gets set during construction of that class. Code can then rely on the fact that the vtable pointer exists and does not have to check for null every time. Code doesn't have to ask questions like "is this a purely abstract class".
For a non-pure abstract class (where some methods have implementations but some are pure virtual), during construction/destruction you can have defined (if unexpected) behavior that involves invoking exactly this class's version of a given method, and not the base class method or an inherited method. For this to work, you need to have a vtable set up. With a pure abstract class, there is no defined result of such a call, so the vtable is redundant, but for an abstract class that isn't totally abstract this does not hold.
When your class has a pure virtual function, that does not mean you cannot also have an implementation for it (!!). So that implies you can have an abstract class, which is also fully implemented. The constructor of your abstract class has to be able to call all functions - even the pure virtual ones, because of this point - that exist for it so far.
If you'd have substituted the client one, you'd get different behaviour for the base class constructor depending on the deriving class - not a great idea, so that's not allowed. You could put in place no vtable and statically resolve all function calls - that works, but it implies handling the constructor specially compared to all other functions and requires inlining all other functions to do this (since a function called from the constructor may also call a virtual etc.) - not very practical.
So it just implements a vtable for the constructor and destructor to use during construction and destruction. It allows you to use typeid and dynamic_cast in the c'tor and d'tor with the predictable result and get reliable behaviour out of the virtual functions you have. No alternative solution would do that.

How Vtable of Virtual functions work

I have a small doubt in Virtual Table, whenever compiler encounters the virtual functions in a class, it creates Vtable and places virtual functions address over there. It happens similarly for other class which inherits. Does it create a new pointer in each class which points to each Vtable? If not how does it access the Virtual function when the new instance of derived class is created and assigned to Base PTR?
Each time you create a class that contains virtual functions, or you
derive from a class that contains virtual functions, the compiler
creates a unique VTABLE for that class.
If you
don’t override a function that was declared virtual in the base class,
the compiler uses the address of the base-class version in the
derived class.
Then it places the VPTR into
the class. There is only one VPTR for each object when using simple
inheritance . The VPTR must be initialized to point to the
starting address of the appropriate VTABLE. (This happens in the
constructor.)
Once the VPTR is initialized to the proper VTABLE, the object in
effect “knows” what type it is. But this self-knowledge is worthless
unless it is used at the point a virtual function is called.
When you call a virtual function through a base class address (the
situation when the compiler doesn’t have all the information
necessary to perform early binding), something special happens.
Instead of performing a typical function call, which is simply an
assembly-language CALL to a particular address, the compiler
generates different code to perform the function call.
For each class with virtual functions, a vtable is created. Then, when an object of a class with a viable is created using a constructor, the constructor copies the appropriate vtable into the object. So each object has a pointer to its vtable ( or in the case of multiple inheritance, when necessary, a Orr to each of its vtables. ). The compiler knows where in the object the vtable is, so when it needs to call a virtual method, it outputs byte code to deterrence the vtable, lookup the appropriate method, and jump to its address.
In the simple case of single inheritance, a child class starts with a copy of the parent class's vtable and then gets an overridden entry for each virtual method in the child class that overrides a parent class's method. ( and it also gets a new entry for every virtual function in the child clad that does not override a parent class method )
Whenever the program compiles the virtual table for each class is created, which makes clear to the fact that vtables are created per class basis.
During run time, when the object is created the compiler assigns vptr to the object, which points to the virtual table for the particular class' object.
In short the vptr is created per object basis.

C++: Why does a struct\class need a virtual method in order to be polymorphic?

Following this question, I'm wondering why a struct\class in C++ has to have a virtual method in order to be polymorphic.
Forcing a virtual destructor makes sense, but if there's no destructor at all, why is it mandatory to have a virtual method?
Because the type of a polymorphic object in C++ is, basically, determined from the pointer to its vtable, which is the table of virtual functions. The vtable is, however, only created if there's at least one virtual method. Why? Because in C++, you never get what you didn't explicitly ask for. They call it "you don't have to pay for something you don't need". Don't need polymorphism? You just saved a vtable.
Forcing a virtual destructor makes sense
Exactly. To destruct a virtual class manually (via delete) through its base class you need a virtual destructor. (Now, as I’ve been reminded in the comments, this isn’t usually needed: rather than use manual memory management, one would rely on modern smart pointers which also work correctly with non-virtual destructors.)
So any class which acts as a polymorphic base class usually needs either a virtual destructor or virtual functions anyway.
And since having runtime polymorphism adds an overhead (the class needs to store an additional pointer to its virtual method table), the default is not to add it, unless necessary anyway: C++’ design philosophy is “you only pay for what you need”. Making every class have a virtual method table would run afoul of this principle.
Because it is defined as such in the standard.
From 10.3/1 [class.virtual]
Virtual functions support dynamic binding and object-oriented programming. A class that declares or inherits a virtual function is called a polymorphic class.
It makes sense that if you use inheritance, then you have at least one virtual method. If you don't have any virtual method, then you could use composition instead.
polymorphism is to allow your subclasses to override the default behaviour of base class functions, so unless you have virtual methods in your base class, you can't override methods in base.
I'm wondering why does a struct\class in C++ has to have a virtual method in order to be polymorphic?
Because that is what polymorphic class means.
In C++, runtime polymorphism is achieved through virtual functions. A base class declares some virtual functions which the many derived classes implement, and the clients use pointers (or references) of static type of base class, and can make them point to objects of derived classes (often different derived classes), and then later on, call the implementation of derived classes through the base pointers. That is how runtime polymorphism is achieved. And since the central role is played by the functions being virtual, which enables runtime polymorphism, that is why classes having virtual functions is called polymorphic class.
Without any virtual method, there is no need to maintain a virtual pointer (abbreviated as vptr) for every object of the class. The virtual pointer is a mechanism for resolving virtual method calls at runtime; depending on the object's class, it might point to different virtual method tables (abbreviated as vtable) that contain the actual adresses of virtual methods.
So by checking to what vtable does the vptr point to, compiler can determine object's class, for example in dynamic_cast. Object without the vptr cannot have its type determined this way and is not polymorphic.
A C++ design philosophy is that "you don't pay for what you don't use". You might already know that a virtual function incur some overhead as a class has to maintain a pointer to its implementation. In fact, an object contains a reference to a table of function pointers called the vtable.
Consider the following example:
class Base
{
public:
virtual f() { /* do something */ }
};
class Derived : public Base
{
public:
virtual f() { /* do something */ }
};
Base* a = new Derived;
a->f(); // calls Derived::f()
Note that the variable a points to a Derived object. As f() is declared virtual the vtable of a will contain a pointer to Derived::f() and that implementation is executed. If f() is not virtual, the vtable will be empty. So Base::f() is executed as the type of a is Base.
A destructor behaves just like other member functions. If the destructor is not virtual, only the destructor in the Base class will be called. This might lead to memory/resource leaks if the Derived class implements RAII. If a class is intended to be sub-classed, its destructor should be virtual.
In some languages like Java all methods are virtual. So even objects that are not intended to be polymorphic will consume memory for maintaining the function pointers. In other words, you are forced to pay for what you don't use.
Classes only need virtual methods in order to be dynamically polymorphic - for the reasons described by others. You can still have static polymorphism through templates, though.