I am self-taught, and therefore am not familiar with a lot of terminology. I cannot seem to find the answer to this by googling: What is a "virtual" vs a "direct" call to a virtual function?
This pertains to terminology, not technicality. I am asking for when a call is defined as being made "directly" vs "virtually".
It does not pertain to vtables, or anything else that has to do with the implementation of these concepts.
The answer to your question is different at different conceptual levels.
At the conceptual language level, the informal term "virtual call" usually refers to a call resolved in accordance with the dynamic type of the object used in the call. According to the C++ language standard, this applies to all calls to virtual functions except those that use a qualified name for the function. When the qualified name of the method is used, the call is referred to as a "direct call":
SomeObject obj;
SomeObject *pobj = &obj;
SomeObject &robj = obj;
obj.some_virtual_function(); // Virtual call
pobj->some_virtual_function(); // Virtual call
robj.some_virtual_function(); // Virtual call
obj.SomeObject::some_virtual_function(); // Direct call
pobj->SomeObject::some_virtual_function(); // Direct call
robj.SomeObject::some_virtual_function(); // Direct call
Note that you can often hear people say that calls to virtual functions made through immediate objects are "not virtual". However, the language specification does not support this point of view. According to the language, all non-qualified calls to virtual functions are the same: they are resolved in accordance with the dynamic type of the object. In that [conceptual] sense they are all virtual.
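To make the conceptual distinction concrete, here is a minimal sketch. The classes Animal and Dog and the two helper functions are hypothetical names chosen for illustration, not part of the original answer. The unqualified call follows the dynamic type, while the qualified call always reaches the named class's version.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string name() const { return "Animal"; }
};

struct Dog : Animal {
    std::string name() const override { return "Dog"; }
};

// Unqualified call: resolved by the dynamic type of the object ("virtual call").
std::string virtual_call(const Animal& a) { return a.name(); }

// Qualified call: always Animal::name(), whatever the dynamic type is ("direct call").
std::string direct_call(const Animal& a) { return a.Animal::name(); }
```

Passing a Dog to both helpers should give different results: the first reaches Dog::name(), the second Animal::name().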
At the implementation level, the term "virtual call" usually refers to calls dispatched through some implementation-defined mechanism that implements the standard-required functionality of virtual functions. Typically it is implemented through a Virtual Method Table (VMT) associated with the object used in the call. However, smart compilers will only use the VMT to perform calls to virtual functions when they really have to, i.e. when the dynamic type of the object is not known at compile time. In all other cases the compiler will strive to call the method directly, even if the call is formally "virtual" at the conceptual level.
For example, most of the time, calls to virtual functions made with an immediate object (as opposed to a pointer or a reference to an object) will be implemented as direct calls (without involving VMT dispatch). The same applies to immediate calls to virtual functions made from the object's constructor and destructor:
SomeObject obj;
SomeObject *pobj = &obj;
SomeObject &robj = obj;
obj.some_virtual_function(); // Direct call
pobj->some_virtual_function(); // Virtual call in general case
robj.some_virtual_function(); // Virtual call in general case
obj.SomeObject::some_virtual_function(); // Direct call
pobj->SomeObject::some_virtual_function(); // Direct call
robj.SomeObject::some_virtual_function(); // Direct call
Of course, in this latter sense, nothing prevents the compiler from implementing any calls to virtual functions as direct calls (without involving VMT dispatch), if the compiler has sufficient information to determine the dynamic type of the object at compile time. In the above simplistic example any modern compiler should be able to implement all calls as direct calls.
Suppose you have this class:
class X {
public:
virtual void myfunc();
};
If you call the virtual function for a plain object of type X, the compiler will generate a direct call, i.e. refer directly to X::myfunc():
X a; // object of known type
a.myfunc(); // will call X::myfunc() directly
If you call the virtual function via a pointer dereference or a reference, it is not clear which type the object pointed to will really have. It could be X, but it could also be a type derived from X. Then the compiler will make a virtual call, i.e. typically look up the function address in a table of function pointers:
X *pa; // pointer to a polymorphic object
... // initialise the pointer to point to an X or a derived class from X
pa->myfunc(); // will call the myfunc() that is related to the real type of object pointed to
Here you have an online simulation of the code. You'll see that in the first case the generated assembly calls the address of the function, whereas in the second case the compiler loads something into a register and makes an indirect call using this register (i.e. the called address is not "hard-wired" and will be determined dynamically at run time).
It's true that calling a virtual function in a constructor or destructor is not good practice and should be avoided. Virtual functions are affected by subclasses, but during construction the subclass parts are not yet constructed, and during destruction they are already destroyed.
However, what happens if a virtual final function is invoked in a constructor or destructor? I assume there should be no problem, since it's not logically wrong.
Calling a virtual function in a constructor or destructor is forbidden because the overridden version declared in the subclass could access a subclass variable that is not initialized yet.
A virtual final function is different: it's final, so there is no way for it to access a subclass's variables.
But this is my assumption, and there could be more reasons why calling a virtual function in a constructor or destructor is unreasonable.
So, in conclusion,
Is calling a virtual final function in the constructing/destructing phase allowed by the C++ standard?
If so, is it widely implemented by most C++ compilers?
If not, is there any reason for that?
"Is calling a virtual final function in the constructing/destructing phase allowed by the C++ standard?"
Calling a virtual function during construction/destruction is well defined and completely legal except in the case of pure virtual functions.
"Calling a virtual function in a constructor or destructor is forbidden"
I don't know (nor care) who says it's "bad" or "forbidden" from a stylistic point of view, a code maintenance point of view... The ability to maintain code depends first on knowing the relevant language and tools well; not knowing what virtual calls do during these phases (*) will lead to misunderstanding on the part of the maintainers, which is fixed by selecting more experienced maintainers and not by dumbing down the programming style.
(*) which aren't technically part of the "lifetime" of the object, which isn't even a very useful concept, as objects are usable and used in their constructors (before their lifetime has started) in any non-trivial program (I think the standard should simply drop this unneeded concept).
"accessing a subclass variable that is not initialized yet can occur in the overridden version of the virtual function, which is declared in the subclass."
It can't. During construction of a base class subobject B (say by constructor B::B()), the type of the object being constructed is by definition B.
"overridden version of the virtual function, which is declared in the subclass."
No, there is no existing subclass object at that point, so there is no overriding.
"A virtual final function is different: it's final, so there is no way for it to access a subclass's variables."
It makes no difference.
The dynamic type of a polymorphic object is established by a constructor, after the constructors for base classes and before constructing members.
"If so, is it widely implemented by most C++ compilers?"
In practice all compilers implement setting the dynamic type of an object by changing the one or many vtable pointers to point to appropriate vtables for the type; that is done as part of construction.
It means that during construction, the vptr value changes as derived objects are constructed.
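The change of dynamic type during construction can be observed with a small sketch like the one below. The class names, the who() function and the global call_log are hypothetical, chosen only for illustration. Each constructor calls the same virtual function, and the log records which version was reached.

```cpp
#include <string>
#include <vector>

// Hypothetical log that records which override each call reached.
std::vector<std::string> call_log;

struct Base {
    Base() { who(); }            // the dynamic type is Base while this runs
    virtual ~Base() = default;
    virtual void who() { call_log.push_back("Base::who"); }
};

struct Derived : Base {
    Derived() { who(); }         // the dynamic type is Derived by now
    void who() override { call_log.push_back("Derived::who"); }
};

// Constructs one Derived object and returns the calls made during construction.
std::vector<std::string> construct_derived() {
    call_log.clear();
    Derived d;
    return call_log;
}
```

The returned log should contain "Base::who" followed by "Derived::who": the call made from Base::Base() never reaches the override in Derived.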
First, the rule is: "Do not directly or indirectly invoke a virtual function from a constructor or destructor that attempts to call into the object under construction or destruction." That's not opinion. That's the SEI CERT coding standard. Original document at:
https://resources.sei.cmu.edu/downloads/secure-coding/assets/sei-cert-cpp-coding-standard-2016-v01.pdf
and a link to the relevant rule OOP50-CPP at:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/OOP50-CPP.+Do+not+invoke+virtual+functions+from+constructors+or+destructors.
In answer to the original question, there are several exceptions to this rule. One of those, OOP50-CPP-EX2, is if the function or class is marked as final. Then it cannot be overridden by a derived class. You can also explicitly qualify the function call.
And yes, final is widely implemented.
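As a rough illustration of the final exception, consider the sketch below. Device, Sensor, label() and cached() are hypothetical names, not taken from the CERT document. Because Sensor is marked final, no class can derive from it, so the call made in its constructor cannot miss a later override.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Device {
    virtual ~Device() = default;
    virtual std::string label() const { return "device"; }
};

// final: nothing can derive from Sensor, so label() has no further override.
struct Sensor final : Device {
    Sensor() { cached_ = label(); }   // reaches Sensor::label()
    std::string label() const override { return "sensor"; }
    const std::string& cached() const { return cached_; }
private:
    std::string cached_;
};
```

Writing the call as Sensor::label() would have the same effect through explicit qualification, which is the other exception mentioned above.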
What is the performance difference between calling a virtual function from a derived class pointer directly vs from a base class pointer to the same derived class?
In the derived pointer case, will the call be statically bound, or dynamically bound? I think it'll be dynamically bound because there's no guarantee the derived pointer isn't actually pointing to a further derived class. Would the situation change if I have the derived class directly by value (not through pointer or reference)? So the 3 cases:
base pointer to derived
derived pointer to derived
derived by value
I'm concerned about performance because the code will be run on a microcontroller.
Demonstrating code
struct Base {
// virtual destructor left out for brevity
virtual void method() = 0;
};
struct Derived : public Base {
// implementation here
void method() {
}
};
// ... in source file
// call virtual method from base class pointer, guaranteed vtable lookup
Base* base = new Derived;
base->method();
// call virtual method from derived class pointer, any difference?
Derived* derived = new Derived;
derived->method();
// call virtual method from derived class value
Derived derivedValue;
derivedValue.method();
In theory, the only C++ syntax that makes a difference is a member function call that uses a qualified member name. In terms of your class definitions that would be
derived->Derived::method();
This call ignores the dynamic type of the object and goes directly to Derived::method(), i.e. it's bound statically. This is only possible for calling methods declared in the class itself or in one of its ancestor classes.
Everything else is a regular virtual function call, which is resolved in accordance with the dynamic type of the object used in the call, i.e. it is bound dynamically.
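The difference shows up as soon as a Derived pointer refers to something more derived. The sketch below assumes a hypothetical third class MoreDerived, which is not in the question, and lets method() return a string so that the result can be inspected.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string method() = 0;
};

struct Derived : Base {
    std::string method() override { return "Derived"; }
};

// Hypothetical further-derived class, added for illustration.
struct MoreDerived : Derived {
    std::string method() override { return "MoreDerived"; }
};

// Regular virtual call: bound dynamically.
std::string dynamic_bound(Derived* d) { return d->method(); }

// Qualified call: bound statically to Derived::method().
std::string static_bound(Derived* d) { return d->Derived::method(); }
```

With a MoreDerived object behind the pointer, the first helper should return "MoreDerived" and the second "Derived".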
In practice, compilers will strive to optimize the code and replace dynamically-bound calls with statically-bound calls in contexts where the dynamic type of the object is known at compile time. For example
Derived derivedValue;
derivedValue.method();
will typically produce a statically-bound call in virtually every modern compiler, even though the language specification does not provide any special treatment for this situation.
Also, virtual method calls made directly from constructors and destructors are typically compiled into statically-bound calls.
Of course, a smart compiler might be able to bind the call statically in a much greater variety of contexts. For example, both
Base* base = new Derived;
base->method();
and
Derived* derived = new Derived;
derived->method();
can be seen by the compiler as trivial situations that easily allow for statically-bound calls.
Virtual functions must be compiled to work as if they were always called virtually. If your compiler compiles a virtual call as a static call, that's an optimization that must satisfy this as-if rule.
From this, it follows that the compiler must be able to prove the exact type of the object in question. And there are some valid ways in which it can do this:
If the compiler sees the creation of the object (the new expression or the automatic variable from which the address is taken) and can prove that that creation is actually the source of the current pointer value, that gives it the precise dynamic type it needs. All your examples fall into this category.
While a constructor runs, the type of the object is exactly the class containing the running constructor. So any virtual function call made in a constructor can be resolved statically.
Likewise, while a destructor runs, the type of the object is exactly the class containing the running destructor. Again, any virtual function call can be resolved statically.
AFAIK, these are all the cases that allow the compiler to convert a dynamic dispatch into a static call.
All of these are optimizations, though: the compiler may decide to perform the runtime vtable lookup anyway. But good optimizing compilers should be able to detect all three cases.
There should be no difference between the first two cases, since the very idea of virtual functions is to call always the actual implementation. Leaving compiler optimisations aside (which in theory could optimise all virtual function calls away if you construct the object in the same compilation unit and there is no way the pointer can be altered in between), the second call must be implemented as an indirect (virtual) call as well, since there could be a third class inheriting from Derived and implementing that function as well. I would assume that the third call will not be virtual, since the compiler knows the actual type already at compile time. Actually you could make sure of this by not defining the function as virtual, if you know you will always do the call on the derived class directly.
For really lightweight code running on a small microcontroller I would recommend avoiding defining functions as virtual at all. Usually there is no runtime abstraction required. If you write a library and need some kind of abstraction, you can maybe work with templates instead (which give you some compile-time abstraction).
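A minimal sketch of the template approach might look like this. LedDriver, PwmDriver, write() and send() are hypothetical names, and the arithmetic is only a placeholder for real hardware access. No function is virtual, and the concrete type is fixed at compile time.

```cpp
// Hypothetical driver types; neither has virtual functions or a vtable.
struct LedDriver {
    int write(int value) { return value * 2; }   // placeholder behaviour
};

struct PwmDriver {
    int write(int value) { return value + 1; }   // placeholder behaviour
};

// The concrete driver type is a template parameter, so driver.write()
// is bound at compile time and can be inlined.
template <typename Driver>
int send(Driver& driver, int value) {
    return driver.write(value);
}
```

The trade-off is that each driver type instantiates its own copy of send(), and you cannot pick the driver at run time without adding some other form of indirection.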
At least on PC CPUs I often find virtual calls one of the most expensive indirections you can have (probably because branch prediction is more difficult). Sometimes one can also transform the indirection to the data level, e.g. you keep one generic function which operates on different data which is indirected with pointers to the actual implementation. Of course this will work only in some very specific cases.
At run-time.
BUT: Performance as compared to what? It isn't valid to compare a virtual function call to a non-virtual function call. You need to compare it to a non-virtual function call plus an if, a switch, an indirection, or some other means of providing the same function. If the function doesn't embody a choice among implementations, i.e. doesn't need to be virtual, don't make it virtual.
I teach a C++ programming class and I've seen enough classes of errors that I have a good feeling for how to diagnose common C++ bugs. However, there's one major type of error for which my intuition isn't particularly good: what programming errors cause calls to pure virtual functions? The most common error I've seen that causes this is calling a virtual function from a base class constructor or destructor. Are there any others I should be aware of when helping debug student code?
"The most common error I've seen that causes this is calling a virtual function from a base class constructor or destructor."
When an object is constructed, the pointer to the virtual dispatch table is initially aimed at the highest superclass, and it's only updated as the intermediate classes complete construction. So, you can accidentally call the pure virtual implementation up until the point that a subclass - with its own overriding function implementation - completes construction. That might be the most-derived subclass, or anywhere in between.
It might happen if you follow a pointer to a partially constructed object (e.g. in a race condition due to async or threaded operations).
If a compiler has reason to think it knows the real type to which a pointer-to-base-class points, it may reasonably bypass the virtual dispatch. You might confuse it by doing something with undefined behaviour like a reinterpret cast.
During destruction, the virtual dispatch table should be reverted as derived classes are destroyed, so the pure virtual implementation may again be invoked.
After destruction, continued use of the object via "dangling" pointers or references may invoke the pure virtual function, but there's no defined behaviour in such situations.
Here are a few cases in which a pure virtual call can happen.
Using a dangling pointer - the pointer doesn't point to a valid object, so the virtual table it refers to is just random memory, which may contain NULL.
Bad cast - using a static_cast to the wrong type (or a C-style cast) can also cause the object you point to to not have the correct methods in its virtual table (in this case at least it really is a virtual table, unlike the previous option).
DLL has been unloaded - if the object you're holding on to was created in a shared object file (DLL, so, sl) that has since been unloaded, the memory may have been zeroed out.
This can happen, for example, when a reference or pointer refers to an object that no longer exists and you use it to call a virtual function of the class. For example:
std::vector<DerivedClass> objContainer;
if (!objContainer.empty())
{
    const BaseClass& objRef = objContainer.front();
    // Do some processing using objRef and then you erase the first
    // element of objContainer
    objContainer.erase(objContainer.begin());
    // BUG: objRef now refers to an erased element (undefined behavior)
    const std::string& name = objRef.name();
    // -> (name() is a pure virtual function in base class,
    // which has been implemented in DerivedClass).
}
At this point the object that objRef referred to no longer exists, so using objRef is undefined behavior. When the virtual table is indexed, no valid entry may be found, and one possible outcome is a run-time error saying "pure virtual function called".
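One way to avoid this particular bug is to copy what you need out of the element before erasing it. The sketch below assumes hypothetical definitions for BaseClass and DerivedClass and a hypothetical helper pop_front_name(), since the original snippet does not show them.

```cpp
#include <string>
#include <vector>

// Hypothetical definitions of the classes used in the snippet above.
struct BaseClass {
    virtual ~BaseClass() = default;
    virtual std::string name() const = 0;
};

struct DerivedClass : BaseClass {
    std::string name() const override { return "derived"; }
};

// Returns the first element's name and removes that element.
// Returns an empty string if the container is empty.
std::string pop_front_name(std::vector<DerivedClass>& objContainer) {
    if (objContainer.empty()) return "";
    // Copy the result while the element is still alive ...
    std::string name = objContainer.front().name();
    // ... and only then erase it. No reference to the element outlives it.
    objContainer.erase(objContainer.begin());
    return name;
}
```

The key point is the order of operations: the virtual call happens before the erase, and the result is held by value rather than by reference.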
How does a C++ object know where its member function definitions are present? I am quite confused as the Object itself does not contain the function pointers.
sizeof on the Object proves this.
So how is the object to function mapping done by the Runtime environment? Where is a class's member function-pointer table maintained?
If you're calling non-virtual functions, there's no need for a function-pointer table; the compiler can resolve the function addresses at compile-time. So:
A a;
a.func();
translates to something along the lines of:
A a;
A_func(&a);
Calling a virtual function through a base-class pointer typically uses a vtable. So:
A *p_a = new B();
p_a->func();
translates to something along the lines of:
A *p_a = new B();
p_a->p_vtbl->func(p_a);
where p_vtbl is a compiler-implemented pointer to the vtable specific to the actual class of *p_a.
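To make the translation above more tangible, here is a hand-written imitation of the mechanism. It is only a sketch of what a compiler might generate. All names (VTable, Object, a_func, b_func, call_func and so on) are hypothetical, and real layouts are implementation details.

```cpp
#include <string>

struct Object;

// Hypothetical table type: one slot per virtual function.
struct VTable {
    std::string (*func)(Object* self);
};

// Every object stores a hidden pointer to its class's table.
struct Object {
    const VTable* p_vtbl;
};

// Stand-ins for A::func and B::func, taking an explicit "this" pointer.
std::string a_func(Object*) { return "A::func"; }
std::string b_func(Object*) { return "B::func"; }

// One table per class, shared by all objects of that class.
const VTable a_vtable{&a_func};
const VTable b_vtable{&b_func};

Object make_a() { return Object{&a_vtable}; }
Object make_b() { return Object{&b_vtable}; }

// Equivalent of p->func(): load the table, load the slot, call through it.
std::string call_func(Object* p) { return p->p_vtbl->func(p); }
```

The caller only sees an Object pointer, yet call_func() reaches a different function depending on which table the object was created with.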
There are generally two ways that an object and its member functions are associated:
For a non-virtual function, the compiler determines the appropriate function at compile time. Non-static member functions are usually passed a hidden parameter that contains the this pointer, which takes care of the association of the object and the class member function.
For virtual functions, most compilers tend to use a lookup table that is usually referenced via the object's this pointer or a similar mechanism. This table, normally called the vtable, contains the function pointer for the virtual functions only.
As C++ is not a dynamic language, the compiler can do most of the object/function/symbol resolution at compile time with the exception of some virtual functions. In some cases, it's even possible for the compiler to determine exactly which instance of a virtual function gets called and skip the resolution via the vtable.
Member functions are not part of the object - they are defined statically, in one place, just like any other function. There is no magic look-up needed.
Virtual functions are different, but I don't think your question is about that...
For non-virtual functions no per-object table is needed: each function exists exactly once for the class and is shared by all instances. Since the target is the same for all of them and known at compile time, you would not want anything duplicated in each instance.
For virtual functions, resolution is done at runtime, and the object will typically contain a hidden pointer to a function table for them. Add a virtual function and look at the size of your object again.
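The sizeof observation can be checked with a sketch like the following. The three struct names are hypothetical. The size comparison for the polymorphic class reflects how common ABIs behave and is not something the standard mandates.

```cpp
struct DataOnly {
    int value;
};

struct WithMethods {
    int value;
    int get() const { return value; }    // non-virtual: not stored in the object
    void set(int v) { value = v; }
};

struct WithVirtual {
    int value;
    virtual int get() const { return value; }
    virtual ~WithVirtual() = default;
};

// Non-virtual member functions add nothing to the object's size.
constexpr bool same_size = sizeof(DataOnly) == sizeof(WithMethods);

// On common implementations a polymorphic object carries a hidden
// vtable pointer, so it is larger (an implementation detail).
constexpr bool virtual_is_larger = sizeof(WithVirtual) > sizeof(DataOnly);
```

On a typical 64-bit platform the polymorphic struct grows by roughly one pointer plus padding, but the exact numbers depend on the compiler and target.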
I tried to look up whether virtual functions are resolved during compilation or while the program is running.
While looking, I found terms such as dynamic linking and late binding,
but I didn't understand whether they mean that the function is determined during compilation, before the executable is produced, or while the executable runs.
Can someone please explain?
For virtual functions, resolution is done at runtime. When you call a method on an object through a pointer or reference, which method to call is in general known only while the program is running, because only then is the exact type of the instance known. For non-virtual functions this resolution can be done at compile time, because the function is selected by the static type alone and an override in a child class cannot change the choice. That's also why virtual method calls are a bit slower than non-virtual method calls (usually negligibly so). This article explains the concept in more detail.
Usually virtual functions are resolved at runtime. The reason is simple: at the call site you usually don't know the actual type of the object the function is called on.
void Call1(Base *ptr)
{
    ptr->virtual_member();
    // will it be Base::virtual_member or Derived::virtual_member?
    // runtime resolution needed
}

Base *x; Derived *y; // assume both point to valid objects
Call1(y);
Such a situation, where it's not clear at a given place in the code which function will be called and the choice is only made at runtime, is called late binding.
However, in certain cases, you may know the function you're going to call. For example, if you don't call by pointer:
void Call2(Base ptr) // passed by value: the argument is copied into a Base
{
    ptr.virtual_member();
    // It will always be Base::virtual_member even if Derived is passed!
    // No dynamic binding necessary
}

Base x; Derived y;
Call2(y);
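The two cases can be put side by side in a compilable sketch. The class bodies below are assumptions, since the original snippets do not show them, and virtual_member() returns a string here only so that the outcome can be inspected.

```cpp
#include <string>

// Assumed class definitions; the original answer does not show them.
struct Base {
    virtual ~Base() = default;
    virtual std::string virtual_member() const { return "Base"; }
};

struct Derived : Base {
    std::string virtual_member() const override { return "Derived"; }
};

// Pointer parameter: the pointee keeps its dynamic type (late binding).
std::string Call1(const Base* ptr) { return ptr->virtual_member(); }

// Value parameter: the argument is copied into a new Base object (slicing),
// so the dynamic type inside the function is always Base.
std::string Call2(Base obj) { return obj.virtual_member(); }
```

Passing the same Derived object to both should yield "Derived" from Call1 and "Base" from Call2.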
The name lookup, overload resolution and access check for a virtual function call happen at compile time, in the 'static' type of the object expression used to invoke the virtual function (e.g. a polymorphic base class when the object expression is a pointer or a reference to that base class).
The actual function called at run time, however, depends on the dynamic type of the object that the base class pointer or reference designates.
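A small sketch can show the split between the compile-time access check and the run-time choice of function. The classes and the helper call_through_base() are hypothetical. The override is private in Derived, yet a call through a Base reference compiles, because access is checked against Base, and it still runs the Derived version.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string f() const { return "Base::f"; }   // public in Base
};

class Derived : public Base {
    // Private in Derived, but it still overrides Base::f.
    std::string f() const override { return "Derived::f"; }
};

// Lookup and the access check use the static type Base, where f is public,
// so this compiles. The function that runs is chosen by the dynamic type.
std::string call_through_base(const Base& b) { return b.f(); }
```

By contrast, calling f() directly on a Derived object from outside the class would be rejected at compile time, because then the static type is Derived and f is private there.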