I teach a C++ programming class and I've seen enough classes of errors that I have a good feeling for how to diagnose common C++ bugs. However, there's one major type of error for which my intuition isn't particularly good: what programming errors cause calls to pure virtual functions? The most common error I've seen that causes this is calling a virtual function from a base class constructor or destructor. Are there any others I should be aware of when helping debug student code?
"The most common error I've seen that causes this is calling a virtual function from a base class constructor or destructor."
When an object is constructed, the pointer to the virtual dispatch table is initially aimed at the highest superclass, and it's only updated as the intermediate classes complete construction. So, you can accidentally call the pure virtual implementation up until the point that a subclass - with its own overriding function implementation - completes construction. That might be the most-derived subclass, or anywhere in between.
It might happen if you follow a pointer to a partially constructed object (e.g. in a race condition due to async or threaded operations).
If a compiler has reason to think it knows the real type to which a pointer-to-base-class points, it may reasonably bypass the virtual dispatch. You might confuse it by doing something with undefined behaviour like a reinterpret cast.
During destruction, the virtual dispatch table should be reverted as derived classes are destroyed, so the pure virtual implementation may again be invoked.
After destruction, continued use of the object via "dangling" pointers or references may invoke the pure virtual function, but there's no defined behaviour in such situations.
Here are a few cases in which a pure virtual call can happen.
Using a dangling pointer - the pointer isn't of a valid object so the virtual table it points to is just random memory which may contain NULL
Bad cast using a static_cast to the wrong type (or C-style cast) can also cause the object you point to to not have the correct methods in its virtual table (in this case at least it really is a virtual table unlike the previous option).
DLL has been unloaded - If the object you're holding on to was created in a shared object file (DLL, so, sl) which has been unloaded again the memory can be zeroed out now
This can happen for example when the reference or pointer to an object is pointing to a NULL location, and you use the object reference or pointer to call a virtual function in the class. For example:
std::vector <DerivedClass> objContainer;
if (!objContainer.empty())
const BaseClass& objRef = objContainer.front();
// Do some processing using objRef and then you erase the first
// element of objContainer
objContainer.erase(objContainer.begin());
const std::string& name = objRef.name();
// -> (name() is a pure virtual function in base class,
// which has been implemented in DerivedClass).
At this point object stored in objContainer[0] does not exist. When the virtual table is indexed, no valid memory location is found. Hence, a run time error is issued saying "pure virtual function called".
Related
It's true that calling virtual function in constructor and destructor is not a good practice, and should be avoided. It's because virtual functions are affected by subclasses, but in constructing or destructing phase subclasses are not yet constructed(in constructing) or already destructed(in destructing).
However what happens if a virtual final function is invoked in constructor or destructor? I assume that there should be no problem, since it's not logically wrong.
Calling virtual function in constructor and destructor is forbidden because accessing to subclass' variable, not initialized yet, can occur in overridden version of virtual function, which is declared in the subclass.
While virtual final function is not, it's final and there's no way to access to subclass' variables.
But this is my assumption, and there could be any more reasons that calling virtual function in constructor or destructor is not reasonable.
So, in conclusion,
Is calling virtual final function in constructing/destructing phase is allowed in C++ standard?
If so, is it widely implemented to most C++ compilers?
If it's not, is there any reason for that?
Is calling virtual final function in constructing/destructing phase is
allowed in C++ standard?
Calling a virtual function during construction/destruction is well defined and completely legal except in the case of pure virtual functions.
Calling virtual function in constructor and destructor is forbidden
I don't know (nor cares) who says it's "bad" or "forbidden" from a stylistic point of view, code maintenance point of view... The ability to maintain code depends first on knowing the relevant language and tools well; not knowing what virtual calls do during these phases (*) will lead to misunderstand on the part of the maintainers which is fixed by selecting more experienced maintainers and not dumbing down the programming style.
(*) which aren't technically part of the "lifetime" of the object, which isn't even a very useful concept as objects are usable and used in their constructor (before their lifetime has started) in any non trivial program (I think the standard should simply suppress this unneeded concept).
accessing to subclass' variable, not initialized yet, can occur in
overridden version of virtual function, which is declared in the
subclass.
It can't. During construction of a base class subobject B (say by constructor B::B()), the type of the object is being constructed is by definition B.
overridden version of virtual function, which is declared in the
subclass.
No, there is no existing subclass object at that point, so there is no overriding.
While virtual final function is not, it's final and there's no way to
access to subclass' variables.
It makes no difference.
The dynamic type of a polymorphic object is established by a constructor, after the constructors for base classes and before constructing members.
If so, is it widely implemented to most C++ compilers?
In practice all compilers implement setting the dynamic type of an object by changing the one or many vtable pointers to point to appropriate vtables for the type; that is done as part of construction.
It means that during construction, the vptr value changes as derived objects are constructed.
First, the rule is: "Do not directly or indirectly invoke a virtual function from a constructor or destructor that attempts to call into the object under construction or destruction." That's not opinion. That's the SEI CERT coding standard. Original document at:
https://resources.sei.cmu.edu/downloads/secure-coding/assets/sei-cert-cpp-coding-standard-2016-v01.pdf
and a link to the relevant rule OOP50-CPP at:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/OOP50-CPP.+Do+not+invoke+virtual+functions+from+constructors+or+destructors.
In answer to the original question, there are several exceptions to this rule. One of those, OOP50-CPP-EX2, is if the function or class is marked as final. Then it cannot be overridden by a derived class. You can also explicitly qualify the function call.
And yes, final is widely implemented.
What is the performance difference between calling a virtual function from a derived class pointer directly vs from a base class pointer to the same derived class?
In the derived pointer case, will the call be statically bound, or dynamically bound? I think it'll be dynamically bound because there's no guarantee the derived pointer isn't actually pointing to a further derived class. Would the situation change if I have the derived class directly by value (not through pointer or reference)? So the 3 cases:
base pointer to derived
derived pointer to derived
derived by value
I'm concerned about performance because the code will be run on a microcontroller.
Demonstrating code
struct Base {
// virtual destructor left out for brevity
virtual void method() = 0;
};
struct Derived : public Base {
// implementation here
void method() {
}
}
// ... in source file
// call virtual method from base class pointer, guaranteed vtable lookup
Base* base = new Derived;
base->method();
// call virtual method from derived class pointer, any difference?
Derived* derived = new Derived;
derived->method();
// call virtual method from derived class value
Derived derivedValue;
derived.method();
In theory, the only C++ syntax that makes a difference is a member function call that uses qualified member name. In terms of your class definitions that would be
derived->Derived::method();
This call ignores the dynamic type of the object and goes directly to Derived::method(), i.e. it's bound statically. This is only possible for calling methods declared in the class itself or in one of its ancestor classes.
Everything else is a regular virtual function call, which is resolved in accordance with the dynamic type of the object used in the call, i.e. it is bound dynamically.
In practice, compilers will strive to optimize the code and replace dynamically-bound calls with statically-bound calls in contexts where the dynamic type of the object is known at compile time. For example
Derived derivedValue;
derivedValue.method();
will typically produce a statically-bound call in virtually every modern compiler, even though the language specification does not provide any special treatment for this situation.
Also, virtual method calls made directly from constructors and destructors are typically compiled into statically-bound calls.
Of course, a smart compiler might be able to bind the call statically in a much greater variety of contexts. For example, both
Base* base = new Derived;
base->method();
and
Derived* derived = new Derived;
derived->method();
can be seen by the compiler as trivial situations that easily allow for statically-bound calls.
Virtual functions must be compiled to work as if they were always called virtually. If your compiler compiles a virtual call as a static call, that's an optimization that must satisfy this as-if rule.
From this, it follows that the compiler must be able to prove the exact type of the object in question. And there are some valid ways in which it can do this:
If the compiler sees the creation of the object (the new expression or the automatic variable from which the address is taken) and can prove that that creation is actually the source of the current pointer value, that gives it the precise dynamic type it needs. All your examples fall into this category.
While a constructor runs, the type of the object is exactly the class containing the running constructor. So any virtual function call made in a constructor can be resolved statically.
Likewise, while a destructor runs, the type of the object is exactly the class containing the running destructor. Again, any virtual function call can be resolved statically.
Afaik, these are all the cases that allow the compiler to convert a dynamic dispatch into a static call.
All of these are optimizations, though, the compiler may decide to perform the runtime vtable lookup anyway. But good optimizing compilers should be able to detect all three cases.
There should be no difference between the first two cases, since the very idea of virtual functions is to call always the actual implementation. Leaving compiler optimisations aside (which in theory could optimise all virtual function calls away if you construct the object in the same compilation unit and there is no way the pointer can be altered in between), the second call must be implemented as a indirect (virtual) call as well, since there could be a third class inheriting from Derived and implementing that function as well. I would assume that the third call will not be virtual, since the compiler knows the actual type already at compile time. Actually you could make sure of this by not defining the function as virtual, if you know you will always do the call on the derived class directly.
For really lightweight code running on a small microcontroller I would recommend avoiding defining functions as virtual at all. Usually there is no runtime abstraction required. If you write a library and need some kind of abstraction, you can maybe work with templates instead (which give you some compile-time abstraction).
At least on PC CPUs I often find virtual calls one of the most expensive indirections you can have (probably because branch prediction is more difficult). Sometimes one can also transform the indirection to the data level, e.g. you keep one generic function which operates on different data which is indirected with pointers to the actual implementation. Of course this will work only in some very specific cases.
At run-time.
BUT: Performance as compared to what? It isn't valid to compare a virtual function call to a non-virtual function call. You need to compare it to a non-virtual function call plus an if, a switch, an indirection, or some other means of providing the same function. If the function doesn't embody a choice among implementations, i.e. doesn't need to be virtual, don't make it virtual.
I would like to know when is a vtable created?
Whether its in the startup code before main() or is it at some other point of time??
A vtable isn't a C++ concept so if they are used and when they are created if they are used will depend on the implementation.
Typically, vtables are structures created at compile time (because they can be determined at compile time). When objects of a particular type are created at runtime they will have a vptr which will be initialized to point at a static vtable at construction time.
The vtable is created at compile time. When a new object is created during run time, the hidden vtable pointer is set to point to the vtable.
Keep in mind, though, that you can't make reliable use if the virtual functions until the object is fully constructed. (No calling virtual functions in the constructor.)
EDIT
I thought I'd address the questions in the comments.
As has been pointed out, the exact details of how the vtable is created and used is left up to the implementation. The c++ specification only provides specific behaviors that must be guaranteed, so there's plenty of wiggle room for the implementation. It doesn't have to use vtables at all (though most do). Generally, you don't need to know those details. You just need to know that when you call a virtual function, it do what you expect regardless of how it does it.
That said, I'll clarify a couple points about a typical implementation. A class with virtual functions has a hidden pointer (we'll call vptr) that points to the vtable for that class. Assume we have an employee class:
class Employee {
public:
virtual work();
};
This class will have a vptr in it's structure, so it actually may look like this:
class Employee {
public:
vtble *vptr; // hidden pointer
virtual work();
};
When we derive from this class, it will also have a vptr, and it must be in the same spot (in this case, at the beginning). That way when a function is called, regardless of the type of derived class, it always uses the vptr at the beginning to find the right vtable.
Does a pure-virtual object have a pointer to the vtbl?
(that probably points to NULL?)
thanks, i'm a little bit confused with all the virtual mechanism.
Don't worry about it. Virtual tables are an implementation detail, and aren't even guaranteed to exist. The more you worry about how it might be done, the less you learn about the actual language.
That said, yes. A concrete class will then set that pointer to point to the correct virtual table.
There isn't technically such a thing as a 'pure-virtual object'. I assume you mean an object with pure-virtual methods? But you can't actually create such an object because it would be abstract and the compiler would complain.
Having said that, while the object is being constructed it is briefly an instance of the abstract class before becoming an instance of the derived class. It will in such a case have a virtual table set the functions it defines. It will probably have NULL for the pure virtual methods. If you try calling that the program will crash.
You can try this out by calling virtual methods in the constructor. You'll find they invoke the base class version if you call the methods in the base class. If you call a pure virtual method it'll crash. (In some cases the compiler will figure out what you are doing and complain instead).
The take home is:
Don't call virtual functions in your constructor, its just likely to be confusing. In fact, in most cases it is best if your constructor just sets its internal status up and does not do anything too complicated.
If we write virtual function it adds a vtable in object of that class. Is it true for virtual destructor too ? Is vtable used to implement virtualness of destructor
Yes. Some information is needed to allow the right destructor to be called when the object is deleted via a base class pointer. Whether that information is a small integer index or a pointer doesn't matter (although dynamic linkage probably implies that it's a pointer). Naturally, that information needs to be adjacent to (inside) the pointed-to object.
Adding a virtual method of any kind, including a destructor, to a class that had none before, will increase sizeof(class).
I don't believe that the C++ standard requires any particular mechanism for producing the correct behavior, but yes, that's a typical implementation. A class with at least 1 virtual function has a table of (virtual) function pointers, the destructor being one of them, if it's marked virtual.
Yes it is. Sorry I don't have a definitive reference to back up my assertion. But how else would you get different behavior when using just a pointer to the object?
Yes. Virtual destructor is like any other virtual method. Vtable entry will get added.
It is treated like any other normal function and will be added to the vtable.
Take a look at http://geneura.ugr.es/~jmerelo/c++-faq/virtual-functions.html#faq-20.5