I tried to look up whether a virtual function call is resolved during compilation or while the program is running.
While searching I came across the terms dynamic linking and late binding,
but I didn't understand whether that means the function to call is determined at compile time, before the executable is built, or while the executable runs.
Can someone please explain?
For virtual functions, resolution is done at runtime. When you call a method through a pointer or reference to an object, which override to call is known only while the program is running, because only then is the exact type of the instance known. For non-virtual functions this resolution can be done at compile time, because the function to call is determined entirely by the static type of the expression and a child class cannot override it. That is also why virtual method calls are slightly slower than non-virtual ones (in most programs the difference is negligible). This article explains the concept in more detail.
Usually virtual functions are resolved during runtime. The reason is simple: at the call site you usually don't know the actual type of the object the function will be called on.
Base *x; Derived *y;
Call1(y);
void Call1(Base *ptr)
{
ptr->virtual_member();
// will it be Base::virtual_member or Derived::virtual_member ?
//runtime resolution needed
}
This situation, where it is not clear at a given place in the code which function will be called and the choice is only made at runtime, is called late binding.
However, in certain cases you do know which function you are going to call. For example, if the object is passed by value rather than through a pointer or reference:
Base x; Derived y;
Call2(y);
void Call2(Base ptr)
{
ptr.virtual_member();
// It will always be Base::virtual_member even if Derived is passed!
//No dynamic binding necessary
}
The name lookup, overload resolution and access check for a virtual function call happen at compile time, using the 'static' type of the object expression used to invoke the virtual function (e.g. a pointer or reference to a polymorphic base class).
The actual function called at run time however depends on the dynamic type of the object expression pointed to by the base class pointer or reference.
What is the performance difference between calling a virtual function from a derived class pointer directly vs from a base class pointer to the same derived class?
In the derived pointer case, will the call be statically bound, or dynamically bound? I think it'll be dynamically bound because there's no guarantee the derived pointer isn't actually pointing to a further derived class. Would the situation change if I have the derived class directly by value (not through pointer or reference)? So the 3 cases:
base pointer to derived
derived pointer to derived
derived by value
I'm concerned about performance because the code will be run on a microcontroller.
Demonstrating code
struct Base {
// virtual destructor left out for brevity
virtual void method() = 0;
};
struct Derived : public Base {
// implementation here
void method() {
}
};
// ... in source file
// call virtual method from base class pointer, guaranteed vtable lookup
Base* base = new Derived;
base->method();
// call virtual method from derived class pointer, any difference?
Derived* derived = new Derived;
derived->method();
// call virtual method from derived class value
Derived derivedValue;
derivedValue.method();
In theory, the only C++ syntax that makes a difference is a member function call that uses a qualified member name. In terms of your class definitions that would be
derived->Derived::method();
This call ignores the dynamic type of the object and goes directly to Derived::method(), i.e. it's bound statically. This is only possible for calling methods declared in the class itself or in one of its ancestor classes.
Everything else is a regular virtual function call, which is resolved in accordance with the dynamic type of the object used in the call, i.e. it is bound dynamically.
In practice, compilers will strive to optimize the code and replace dynamically-bound calls with statically-bound calls in contexts where the dynamic type of the object is known at compile time. For example
Derived derivedValue;
derivedValue.method();
will typically produce a statically-bound call in virtually every modern compiler, even though the language specification does not provide any special treatment for this situation.
Also, virtual method calls made directly from constructors and destructors are typically compiled into statically-bound calls.
Of course, a smart compiler might be able to bind the call statically in a much greater variety of contexts. For example, both
Base* base = new Derived;
base->method();
and
Derived* derived = new Derived;
derived->method();
can be seen by the compiler as trivial situations that easily allow for statically-bound calls.
Virtual functions must be compiled to work as if they were always called virtually. If your compiler compiles a virtual call as a static call, that's an optimization that must satisfy this as-if rule.
From this, it follows that the compiler must be able to prove the exact type of the object in question. And there are some valid ways in which it can do this:
If the compiler sees the creation of the object (the new expression or the automatic variable from which the address is taken) and can prove that that creation is actually the source of the current pointer value, that gives it the precise dynamic type it needs. All your examples fall into this category.
While a constructor runs, the type of the object is exactly the class containing the running constructor. So any virtual function call made in a constructor can be resolved statically.
Likewise, while a destructor runs, the type of the object is exactly the class containing the running destructor. Again, any virtual function call can be resolved statically.
Afaik, these are all the cases that allow the compiler to convert a dynamic dispatch into a static call.
All of these are optimizations, though; the compiler may decide to perform the runtime vtable lookup anyway. But good optimizing compilers should be able to detect all three cases.
There should be no difference between the first two cases, since the very idea of virtual functions is to always call the actual implementation. Leaving compiler optimisations aside (which in theory could optimise all virtual function calls away if you construct the object in the same compilation unit and there is no way the pointer can be altered in between), the second call must be implemented as an indirect (virtual) call as well, since there could be a third class inheriting from Derived and implementing that function as well. I would assume that the third call will not be virtual, since the compiler knows the actual type already at compile time. Actually you could make sure of this by not defining the function as virtual, if you know you will always do the call on the derived class directly.
For really lightweight code running on a small microcontroller I would recommend avoiding defining functions as virtual at all. Usually there is no runtime abstraction required. If you write a library and need some kind of abstraction, you can maybe work with templates instead (which give you some compile-time abstraction).
At least on PC CPUs I often find virtual calls one of the most expensive indirections you can have (probably because branch prediction is more difficult). Sometimes one can also transform the indirection to the data level, e.g. you keep one generic function which operates on different data which is indirected with pointers to the actual implementation. Of course this will work only in some very specific cases.
At run-time.
BUT: Performance as compared to what? It isn't valid to compare a virtual function call to a non-virtual function call. You need to compare it to a non-virtual function call plus an if, a switch, an indirection, or some other means of providing the same function. If the function doesn't embody a choice among implementations, i.e. doesn't need to be virtual, don't make it virtual.
I am self-taught, and therefore am not familiar with a lot of terminology. I cannot seem to find the answer to this by googling: What is a "virtual" vs a "direct" call to a virtual function?
This pertains to terminology, not technicality. I am asking for when a call is defined as being made "directly" vs "virtually".
It does not pertain to vtables, or anything else that has to do with the implementation of these concepts.
The answer to your question is different at different conceptual levels.
At the conceptual language level the informal term "virtual call" usually refers to calls resolved in accordance with the dynamic type of the object used in the call. According to the C++ language standard, this applies to all calls to virtual functions, except for calls that use the qualified name of the function. When the qualified name of the method is used in the call, the call is referred to as a "direct call":
SomeObject obj;
SomeObject *pobj = &obj;
SomeObject &robj = obj;
obj.some_virtual_function(); // Virtual call
pobj->some_virtual_function(); // Virtual call
robj.some_virtual_function(); // Virtual call
obj.SomeObject::some_virtual_function(); // Direct call
pobj->SomeObject::some_virtual_function(); // Direct call
robj.SomeObject::some_virtual_function(); // Direct call
Note that you can often hear people say that calls to virtual functions made through immediate objects are "not virtual". However, the language specification does not support this point of view. According to the language, all non-qualified calls to virtual functions are the same: they are resolved in accordance with the dynamic type of the object. In that [conceptual] sense they are all virtual.
At implementation level the term "virtual call" usually refers to calls dispatched through some implementation-defined mechanism, that implements the standard-required functionality of virtual functions. Typically it is implemented through Virtual Method Table (VMT) associated with the object used in the call. However, smart compilers will only use VMT to perform calls to virtual functions when they really have to, i.e. when the dynamic type of the object is not known at compile time. In all other cases the compiler will strive to call the method directly, even if the call is formally "virtual" at the conceptual level.
For example, most of the time, calls to virtual functions made with an immediate object (as opposed to a pointer or a reference to an object) will be implemented as direct calls (without involving VMT dispatch). The same applies to immediate calls to virtual functions made from the object's constructor and destructor:
SomeObject obj;
SomeObject *pobj = &obj;
SomeObject &robj = obj;
obj.some_virtual_function(); // Direct call
pobj->some_virtual_function(); // Virtual call in general case
robj.some_virtual_function(); // Virtual call in general case
obj.SomeObject::some_virtual_function(); // Direct call
pobj->SomeObject::some_virtual_function(); // Direct call
robj.SomeObject::some_virtual_function(); // Direct call
Of course, in this latter sense, nothing prevents the compiler from implementing any calls to virtual functions as direct calls (without involving VMT dispatch), if the compiler has sufficient information to determine the dynamic type of the object at compile time. In the above simplistic example any modern compiler should be able to implement all calls as direct calls.
Suppose you have this class:
class X {
public:
virtual void myfunc();
};
If you call the virtual function for a plain object of type X, the compiler will generate a direct call, i.e. refer directly to X::myfunc():
X a; // object of known type
a.myfunc(); // will call X::myfunc() directly
If you call the virtual function via a pointer dereference or a reference, it is not clear which type the object pointed to will really have. It could be X, but it could also be a type derived from X. The compiler will then make a virtual call, i.e. use a table of pointers to find the function address:
X *pa; // pointer to a polymorphic object
... // initialise the pointer to point to an X or a derived class from X
pa->myfunc(); // will call the myfunc() that is related to the real type of object pointed to
Here you have an online simulation of the code. You'll see that in the first case the generated assembly calls the address of the function, whereas in the second case the compiler loads something into a register and makes an indirect call using this register (i.e. the called address is not "hard-wired" and will be determined dynamically at run time).
The following both call the function T::f on object t.
1. t.f();
2. t.T::f();
3. (t.*&T::f)();
I've seen the second one used where the other was not. What is the difference between these two and in what situation should one be preferred?
Thanks.
EDIT: Sorry, I forgot about the second one t.T::f(). I added that one in.
The calls t.f() and (t.*&T::f)() are semantically identical and are the "normal" way to call a member function. The call t.T::f() calls T::f() even if f() is an overridden virtual function.
The expression (t.*&T::f)() calls a member function by obtaining a pointer to a member function. The only potential effect that this expression could have is that it might inhibit inlining the function for some compilers. Using a variable obtained from &T::f to call a member function would be a customization point, but calling through the pointer directly is merely obfuscation and potentially a pessimization (if the compiler isn't capable of doing sufficient constant propagation to detect that the address can't change).
What I could imagine is that someone tried to inhibit a virtual function call on t. Of course, this doesn't work this way because the pointer to member will still call the correct virtual function. To inhibit virtual dispatch you'd use t.T::f().
You should prefer t.f() over (t.*&T::f)(). If you want to inhibit virtual dispatch you'd use t.T::f() otherwise you'd use t.f(). The primary use for inhibiting virtual dispatch is to call the base class version of a function from within an overriding function. Otherwise it is rarely useful.
The first one is the regular one, that's the one you should prefer. The second one takes a member function pointer to f, dereferences it for t, and then calls it.
If there is an actual benefit in that extra trip I am not aware of it. This is the member version of (*&f)() when calling a free function.
The one that you added later on, t.T::f(), is statically dispatched so that the f of T is called even if it were virtual and t were a derived class of T with its own implementation. It effectively inhibits the virtual call mechanism.
The second one is pointless, it's just obfuscated code. No it doesn't disable virtual dispatch nor inlining. They both do the exact same thing, in each and every case.
In one of the C++ tutorials on the internet, I found the description below of why a constructor cannot be virtual:
We cannot declare a virtual constructor. We should specify the exact
type of the object at compile time, so that the compiler can allocate
memory for that specific type.
Is this description correct ?
I am getting confused particularly with the phrase: so that the compiler can allocate
memory for that specific type.
As Bjarne himself explains in his C++ Style and Technique FAQ:
A virtual call is a mechanism to get work done given partial information. In particular, "virtual" allows us to call a function knowing only an interfaces and not the exact type of the object. To create an object you need complete information. In particular, you need to know the exact type of what you want to create. Consequently, a "call to a constructor" cannot be virtual.
The constructor cannot be virtual because the standard says so.
The standard says so because it wouldn't make sense. What would a virtual constructor do?
Virtual methods are used in polymorphism... how should polymorphism work if you don't even have the objects yet?
We should specify the exact type of the object at compile time, so
that the compiler can allocate memory for that specific type.
We should specify the exact type at compile time because we want an object of that type... I found their description very confusing too.
Also, in the paragraph it doesn't say this is the reason why constructors can't be virtual. It explains why virtual methods shouldn't be called from the constructor, but that's about it.
How would a constructor be able to be virtual? virtual means that the result to a call to that function is determined by the dynamic type of the object. Before construction, there is no object to do this.
The way the tutorial describes what a constructor is, is also bogus. You need to use the exact class name, otherwise the thing you declare won't be considered a constructor, and functions without a return type are not allowed.
Just to add to what has already been said, there is a virtual constructor design pattern, also known as factory method or factory function:
... it deals with the problem of creating objects (products) without specifying the exact class of object that will be created
It is correct, even though it misses the point in my humble opinion.
Constructors set up the virtual dispatching, i.e. point the right pointers at functions of the current class. If constructors could be virtual, who would set up the virtual constructor beforehand? There would be a horrible chicken-and-egg problem.
There is, however, an idiom named "virtual constructor", in which a static member of the class returns a base class pointer with a suitable class:
class A {
public:
static A* create();
virtual ~A();
};
class B : public A { ... };
A* A::create() { return new B(); }
How does a C++ object know where its member function definitions are? I am quite confused, as the object itself does not contain the function pointers.
sizeof on the Object proves this.
So how is the object to function mapping done by the Runtime environment? where is a class's member function-pointer table maintained?
If you're calling non-virtual functions, there's no need for a function-pointer table; the compiler can resolve the function addresses at compile-time. So:
A a;
a.func();
translates to something along the lines of:
A a;
A_func(&a);
Calling a virtual function through a base-class pointer typically uses a vtable. So:
A *p_a = new B();
p_a->func();
translates to something along the lines of:
A *p_a = new B();
p_a->p_vtbl->func(p_a);
where p_vtbl is a compiler-implemented pointer to the vtable specific to the actual class of *p_a.
There are generally two ways that an object and its member functions are associated:
For a non-virtual function, the compiler determines the appropriate function at compile time. Non-static member functions are usually passed a hidden parameter that contains the this pointer, which takes care of the association of the object and the class member function.
For virtual functions, most compilers tend to use a lookup table that is usually referenced via the object's this pointer or a similar mechanism. This table, normally called the vtable, contains the function pointer for the virtual functions only.
As C++ is not a dynamic language, the compiler can do most of the object/function/symbol resolution at compile time with the exception of some virtual functions. In some cases, it's even possible for the compiler to determine exactly which instance of a virtual function gets called and skip the resolution via the vtable.
Member functions are not part of the object - they are defined statically, in one place, just like any other function. There is no magic look-up needed.
Virtual functions are different, but I don't think your question is about that...
For non-virtual functions no table is needed in the object: each function exists once per class, its address is fixed at compile/link time, and all instances share it, so there is nothing to duplicate in each instance.
For virtual functions, resolution is done at runtime, and the object will typically contain a hidden pointer to a per-class function table for them. Add a virtual function and look at the sizeof of your object again.