I've searched for answers in forums, books, and so on. I can recognize polymorphic behavior of methods, and there are lots of simple examples showing whether the invoked method is decided at compile time or at run time. But I was confused by this code, where a class C inherits from B, which inherits from A:
class A {
protected:
int x;
public:
virtual void change() = 0;
virtual void change(int a) { x = a; }
};
class B : public A {
public:
void change() { x = 1; }
};
class C : public B {
public:
void change() { x = 2; }
void change(int a) { x = a*2; }
};
int main () {
B *objb = new B();
C *objc = new C();
A *obja;
objb->change();
obja = objc;
objc->change();
obja->change(5);
// ...
}
Many examples say (and it is clear) that polymorphic behavior occurs, and that the method to call is decided at run time, when the following line is executed:
obja->change(5);
But my questions are:
What happens when I call the following (overridden from a pure virtual)?
objb->change();
What happens when I call the following (overridden from a virtual, but non-pure)?
objc->change(5);
Since the declared class of each pointer variable is the same as the class of the object it points to, is the method call decided at compile time or at run time?
If the compiler can deduce the actual type at compile time, it can avoid the virtual function dispatch. But it can only do this when it can prove that the behavior is equivalent to a run-time dispatch. Whether this happens depends on how smart your particular compiler is.
The real question is, why do you care? You obviously understand the rules for calling virtual functions, and the semantics of the behavior are always those of a run-time dispatch, so it should make no difference to how you write your code.
There are three issues to consider. The first is overload resolution:
in this case, the compiler uses the static type of the expression to
construct the set of functions it chooses from. Thus, if you had
written:
objb->change( 2 );
the code wouldn't have compiled, because there is no change which
takes an int in the scope of B. Had there been no change at all
in the scope of B, the compiler would have looked further, and found
the change (all of them) in A, but once it finds the name, it stops.
This is name lookup and function overload resolution, and it is entirely
static.
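A minimal sketch of that name-hiding rule, using a cut-down copy of the hierarchy from the question. The class B2 and the function hidden_overload_demo are hypothetical names added for illustration, and x is made public only so the result can be checked:

```cpp
#include <cassert>

struct A {
    virtual ~A() = default;
    virtual void change() = 0;
    virtual void change(int a) { x = a; }
    int x = 0;
};

struct B : A {
    void change() override { x = 1; }  // this declaration hides A::change(int) in B
};

// Hypothetical variant: the using-declaration makes all A::change overloads visible again.
struct B2 : A {
    using A::change;
    void change() override { x = 1; }
};

inline void hidden_overload_demo() {
    B b;
    // b.change(7);     // does not compile: lookup stops at B, which has no change(int)
    b.A::change(7);     // qualified name: lookup starts in A instead
    assert(b.x == 7);

    B2 b2;
    b2.change(7);       // compiles: change(int) is visible in B2's scope
    assert(b2.x == 7);
}
```

Either the qualified call or the using-declaration gets around the hiding; which one fits depends on whether you can edit the derived class.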
The second issue is which function should be called, once the compiler
has chosen to call a specific function in the interface. If the chosen
function is virtual, the actual function called will be the function
with the exact same signature in the most derived class of the dynamic
type—that is, the type of the actual object in question.
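As a rough illustration of this second step, the sketch below calls change(int) through a reference to A. The helper call_through_base is a made-up name, and x is public only to make the effect observable:

```cpp
#include <cassert>

struct A {
    virtual ~A() = default;
    virtual void change() = 0;
    virtual void change(int a) { x = a; }
    int x = 0;
};

struct B : A {
    void change() override { x = 1; }
};

struct C : B {
    void change() override { x = 2; }
    void change(int a) override { x = a * 2; }
};

// Overload resolution picks A::change(int) from the static type A.
// The function that actually runs is the final overrider for the dynamic type.
inline int call_through_base(A& obj, int value) {
    obj.change(value);
    return obj.x;
}

inline void final_overrider_demo() {
    B b;
    C c;
    assert(call_through_base(b, 5) == 5);   // B does not override change(int): A's version runs
    assert(call_through_base(c, 5) == 10);  // C overrides change(int): C's version runs
}
```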
Finally, there is the question of whether dynamic dispatch is used in
the generated code. And that's entirely up to the compiler. The
compiler can do anything it wants, as long as the correct function, as
determined by the two preceding issues, is called. Generally: if the
function isn't virtual, dynamic dispatch will never be used; and if the
access is directly to the object (named object or temporary), dynamic
dispatch will generally not be used, since the compiler can trivially
know the most derived type. When the call is through a reference or a
pointer, the compiler will generally use dynamic dispatch, but it is
sometimes possible for the compiler to track the pointer enough to know
the type it will point to at runtime, and forgo dynamic dispatch. And
good compilers will often go further, using profiler information, to
determine that 99% of the time, the same function will be called, and
the call is in a tight loop, and will generate two versions of the loop,
one with dynamic dispatch, and one with the most frequently called
function inlined, and select which version of the loop via an if, at
runtime.
objb->change()
calls B::change() because objb contains address of an object of type B
objc->change(5);
calls C::change(int) because objc contains address of an object of type C
The method call will still be dynamic (run time), because B::change() and C::change(int) are still virtual: the virtual attribute is inherited.
To answer the question of whether the functions are dispatched dynamically or at compile time:
No one can say definitely whether it will be a compile-time dispatch or a dynamic dispatch. Dynamic (run-time) dispatch takes place in the first place because the compiler cannot definitely decide at compile time which version of the function to call. So if a compiler can deduce in a definite manner which function to call, the dispatch might very well be decided at compile time.
Having said that, whether the dispatch happens at run time or at compile time does not change which version of an overridden function finally gets called, because the C++ standard explicitly states the rules for which function is selected.
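To see that the virtual attribute really is inherited, here is a small sketch with made-up class names (Base, Middle, Leaf). Middle overrides name() without writing virtual or override, and the call through a Middle reference still reaches Leaf:

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Middle : Base {
    // No 'virtual' keyword here, but this still overrides Base::name and is virtual.
    std::string name() const { return "Middle"; }
};

struct Leaf : Middle {
    std::string name() const override { return "Leaf"; }
};

inline std::string name_via_middle(const Middle& m) {
    return m.name();  // semantics are those of a run-time dispatch
}

inline void inherited_virtual_demo() {
    Middle middle;
    Leaf leaf;
    assert(name_via_middle(middle) == "Middle");
    assert(name_via_middle(leaf) == "Leaf");
}
```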
Related
How to identify whether vptr will be used to invoke a virtual function?
Consider the below hierarchy:
class A
{
int n;
public:
virtual void funcA()
{std::cout <<"A::funcA()" << std::endl;}
};
class B: public A
{
public:
virtual void funcB()
{std::cout <<"B::funcB()" << std::endl;}
};
A* obj = new B();
obj->funcB(); //1. this does not even compile
typedef void (*fB)();
fB* func;
int* vptr = (int*)obj; //2. Accessing the vptr
func = (fB*)(*vptr);
func[1](); //3. Calling funcB using vptr.
Statement 1, i.e. obj->funcB();, does not even compile, although the vtable has an entry for funcB, whereas funcB() can be invoked successfully by accessing the vptr indirectly.
How does the compiler decide when to use the vtable to invoke a function?
In the statement A* obj = new B(); I am using a base class pointer, so I believe the vtable should be used to invoke the function.
Below is the memory layout when vptr is accessed indirectly.
So there are two answers to your question:
The short one is:
obj->funcB() is only a legal call if the class that obj statically points to (in this case A) has a function funcB with the appropriate signature (either directly or through a base class). Only if that is the case does the compiler decide whether to translate it into a direct or a dynamic function call (e.g. using a vtable), based on whether funcB is declared virtual in the declaration of A (or of its base type).
The longer one is this:
When the compiler sees obj->funcB(), it has no way of knowing (optimizations aside) what the runtime type of obj is, and in particular it doesn't know whether a derived class that implements funcB() exists at all. obj might, for example, be created in another translation unit, or it might be a function parameter.
And no, that information is usually not stored in the virtual function table:
The vtable is just an array of addresses, and without the prior knowledge that a specific address corresponds to a function called funcB, the compiler can't use it to implement the call obj->funcB(). To be more precise: the standard does not allow it to. That prior knowledge can only be provided by a virtual function declaration in the static type of obj (or its base classes).
The reason you have that information available in the debugger (whose behavior lies outside of the standard anyway) is that it has access to the debugging symbols, which are usually not part of the distributed release binary. Storing that information in the vtable by default would be a waste of memory and performance, as the program isn't allowed to make use of it in standard C++ in the way you describe anyway. For extensions like C++/CLI that might be a different story.
Adding to Barry's comment, adding the line virtual void funcB() = 0; to class A seems to fix the problem.
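A sketch of the two portable ways to reach funcB without poking at the vptr. The first follows the suggestion above and declares funcB in the base. The second is my own assumption, for the case where the base cannot be changed, and uses dynamic_cast. All type and function names with a 1 or 2 suffix are invented for the example, and the functions return strings instead of printing so the result can be checked:

```cpp
#include <cassert>
#include <string>

// Option 1: declare funcB in the static type, so the call is legal through the base.
struct A1 {
    virtual ~A1() = default;
    virtual std::string funcB() = 0;
};
struct B1 : A1 {
    std::string funcB() override { return "B1::funcB()"; }
};

// Option 2 (assumes the base cannot be edited): recover the derived type at run time.
struct A2 {
    virtual ~A2() = default;
};
struct B2 : A2 {
    virtual std::string funcB() { return "B2::funcB()"; }
};

inline std::string call_option1(A1& obj) {
    return obj.funcB();  // name found in A1, dispatched to the final overrider
}

inline std::string call_option2(A2& obj) {
    // obj.funcB();      // does not compile: A2 has no member named funcB
    if (auto* b = dynamic_cast<B2*>(&obj)) {
        return b->funcB();
    }
    return "not a B2";
}

inline void funcB_demo() {
    B1 b1;
    B2 b2;
    A2 plain;
    assert(call_option1(b1) == "B1::funcB()");
    assert(call_option2(b2) == "B2::funcB()");
    assert(call_option2(plain) == "not a B2");
}
```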
(C++,MinGW 4.4.0,Windows OS)
All that is commented in the code, except labels <1> and <2>, is my guess. Please correct me in case you think I'm wrong somewhere:
class A {
public:
virtual void disp(); //not necessary to define as placeholder in vtable entry will be
//overwritten when derived class's vtable entry is prepared after
//invoking Base ctor (unless we do new A instead of new B in main() below)
};
class B :public A {
public:
B() : x(100) {}
void disp() {std::printf("%d",x);}
int x;
};
int main() {
A* aptr=new B; //memory model and vtable of B (say vtbl_B) is assigned to aptr
aptr->disp(); //<1> no error
std::printf("%d",aptr->x); //<2> error -> A knows nothing about x
}
<2> is an error, and that is obvious. Why is <1> not an error? What I think is happening for this invocation is:
aptr->disp(); --> (*aptr->*(vtbl_B + offset to disp))(aptr)
with aptr in the parameter being the implicit this pointer to the member function. Inside disp() we would have:
std::printf("%d",x); --> std::printf("%d",aptr->x); SAME AS std::printf("%d",this->x);
So why does <1> give no error while <2> does?
(I know vtables are implementation specific and stuff but I still think it's worth asking the question)
this is not the same as aptr inside B::disp. The B::disp implementation takes this as a B*, just like any other method of B. When you invoke a virtual method via an A* pointer, the pointer is converted to B* first (which may even change its value, so it is not necessarily equal to aptr during the call).
I.e. what really happens is something like
typedef void (A::*disp_fn_t)();
disp_fn_t methodPtr = aptr->vtable[index_of_disp]; // methodPtr == &B::disp
B* b = static_cast<B*>(aptr);
(b->*methodPtr)(); // same as b->disp()
For more complicated example, check this post http://blogs.msdn.com/b/oldnewthing/archive/2004/02/06/68695.aspx. Here, if there are multiple A bases which may invoke the same B::disp, MSVC generates different entry points with each one shifting A* pointer by different offset. This is implementation-specific, of course; other compilers may choose to store the offset somewhere in vtable for example.
The rule is:
In C++, dynamic dispatch only works for member functions, not for member variables.
For a member variable, the compiler only looks up the symbol name in the static class of the expression or its base classes.
In case 1, the method to call is decided by fetching the vptr, fetching the address of the appropriate method, and then calling that member function.
Thus dynamic dispatch is essentially a fetch-fetch-call instead of the plain call used with static binding.
In case 2, the compiler only looks for x in the scope of A, the static type of *aptr. It cannot find it there and reports the error.
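To make the difference concrete, here is a small sketch in which both classes have a data member called x. It departs from the original code: A gets its own x and a virtual accessor value(), both added only for illustration:

```cpp
#include <cassert>

struct A {
    virtual ~A() = default;
    virtual int value() const { return x; }   // reads A::x
    int x = 1;
};

struct B : A {
    int value() const override { return x; }  // reads B::x
    int x = 100;                              // hides A::x, it does not override it
};

inline void member_lookup_demo() {
    B b;
    A* aptr = &b;
    assert(aptr->x == 1);                     // data member: looked up in A, the static type
    assert(aptr->value() == 100);             // virtual function: chosen by the dynamic type
    assert(static_cast<B*>(aptr)->x == 100);  // B::x is reachable only through a B
}
```

The static_cast is safe here only because the object is known to be a B.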
You are confused, and it seems to me that you come from more dynamic languages.
In C++, compilation and runtime are clearly isolated. A program must first be compiled and then can be run (and any of those steps may fail).
So, going backward:
<2> fails at compilation, because compilation is about static information. aptr is of type A*, thus all methods and attributes of A are accessible through this pointer. Since you declared disp() in A but no x, the call to disp() compiles but the access to x does not.
Therefore, <2>'s failure is about semantics, and those are defined in the C++ Standard.
Getting to <1>, it works because there is a declaration of disp() in A. This guarantees the existence of the function (I would remark that you actually lie here, because you did not define it in A).
What happens at runtime is semantically defined by the C++ Standard, but the Standard provides no implementation guidance. Most (if not all) C++ compilers will use a virtual table per class + virtual pointer per instance strategy, and your description looks correct in this case.
However this is pure runtime implementation, and the fact that it runs does not retroactively impact the fact that the program compiled.
virtual void disp(); //not necessary to define as placeholder in vtable entry will be
//overwritten when derived class's vtable entry is prepared after
//invoking Base ctor (unless we do new A instead of new B in main() below)
Your comment is not strictly correct. A virtual function is odr-used unless it is pure (the converse does not necessarily hold) which means that you must provide a definition for it. If you don't want to provide a definition for it you must make it a pure virtual function.
If you make one of these modifications then aptr->disp(); works and calls the derived class disp() because disp() in the derived class overrides the base class function. The base class function still has to exist as you are calling it through a pointer to base. x is not a member of the base class so aptr->x is not a valid expression.
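A sketch of the pure-virtual variant of that fix. Compared with the original code, disp() returns a string instead of printing, so the result can be checked; that change is mine:

```cpp
#include <cassert>
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string disp() const = 0;  // pure: no definition has to be provided
};

struct B : A {
    std::string disp() const override { return std::to_string(x); }
    int x = 100;
};

inline void pure_virtual_demo() {
    B b;
    const A* aptr = &b;
    assert(aptr->disp() == "100");  // declared in A, so it compiles; runs B::disp
    // aptr->x;                     // still an error: x is not a member of A
}
```

With the non-pure declaration left undefined, as in the question, you would typically get a link-time error rather than a compile-time one, though no diagnostic is required by the standard.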
struct B
{
virtual void bar () {}
virtual void foo () { bar(); }
};
struct D : B
{
virtual void bar () {}
virtual void foo () {}
};
Now we call foo() using an object of B as,
B obj;
obj.foo(); // calls B::bar()
Question:
Will bar() be resolved through virtual dispatch, or will it be resolved using the static type of the object (i.e. B)?
EDIT: I think I misunderstood your question. I'm pretty sure it depends on how smart the compiler's optimizer is. A naive implementation would of course still go through a virtual lookup. The only way to know for sure for a particular implementation is to compile the code and look at the disassembly to see if it's smart enough to make the direct call.
Original answer:
It will be virtually dispatched. This is more obvious when you consider that within a class method, a method call works out to something like this->bar();, making it obvious that a pointer is used to call the method, allowing to use the dynamic object type.
However in your example since you created a B it will of course call B's version of the method.
Do note (as seen in a comment) that inside constructors, even through the implicit this->, a virtual call never reaches a more-derived override: it resolves to the class whose constructor is currently running.
EDIT2 for your update:
That's not right at all. Calls within B::foo() cannot generally be bound statically (unless, due to inlining, the compiler knows the dynamic type of the object). Just because it knows that it's being called through a B* says nothing about the real type of the object in question: the object could be a D and need virtual dispatch.
It must be a virtual call. The code that you're compiling cannot know if there's not a more-derived class that it actually is that has overridden the other function.
Note that this assumes that you're compiling these separately. If the compiler inlines the call to foo() (due to its static type being known) it'll also inline the call to bar().
Answer: from the language point of view call to bar() inside B::foo() is resolved through virtual dispatch.
The bottom line is that from the point of view of C++ language, virtual dispatch always happens when you call a virtual method using its non-qualified name. When you use a qualified name of the method, the virtual dispatch does not happen. That means that the only way to suppress virtual dispatch in C++ is to use this syntax
some_object_ptr->SomeClass::some_method();
some_object.SomeClass::some_method();
In this case the dynamic type of the object on the left-hand side is ignored and the specific method is called directly.
In all other cases virtual dispatch does happen, as far as the language is concerned. I.e. the call is resolved in accordance with the dynamic type of the object. In other words, from the formal point of view, every time you call a virtual method through an immediate object, as in
B obj;
obj.foo();
the method is called through the "virtual dispatch" mechanism, regardless of context ("within a virtual method" or not - doesn't matter).
That's how it is in C++ language. Everything else is just optimizations made by compilers. As you probably know, most (if not all) compilers will generate a non-virtual call to a virtual method, when the call is performed through an immediate object. This is, of course, an obvious optimization, since the compiler knows that the static type of the object is the same as its dynamic type. Again, it doesn't depend on the context ("within a virtual method" or not - doesn't matter).
Inside a virtual method, the call can be made without specifying an object on the left hand side (as in your example), which really implies all calls have this-> implicitly present on the left. The very same rules apply in this case as well. If you just call bar(), it stands for this->bar() and the call is dispatched virtually. If you call B::bar(), it stands for this->B::bar() and the call is dispatched non-virtually. Everything else will only depend on the optimization capabilities of the compiler.
What you are trying to say by "because, once you are inside B::foo(), it's sure that this is of type B* and not D*" is totally unclear to me. This statement misses the point. Virtual dispatch depends on the dynamic type of the object. Note: it depends on the type of *this, not on the type of this. It doesn't matter at all what the type of this is. What matters is the dynamic type of *this. When you are inside B::foo, it is still perfectly possible that the dynamic type of *this is D or something else. For which reason, the call to bar() has to be resolved dynamically.
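A short sketch of the qualified versus unqualified rule. The two foo variants are hypothetical names added so both forms can sit side by side, and bar() returns a string so the outcome can be checked:

```cpp
#include <cassert>
#include <string>

struct B {
    virtual ~B() = default;
    virtual std::string bar() const { return "B::bar"; }
    std::string foo_unqualified() const { return bar(); }   // means this->bar()
    std::string foo_qualified() const { return B::bar(); }  // means this->B::bar()
};

struct D : B {
    std::string bar() const override { return "D::bar"; }
};

inline void qualified_call_demo() {
    B b;
    D d;
    assert(b.foo_unqualified() == "B::bar");
    assert(d.foo_unqualified() == "D::bar");  // dynamic type of *this is D
    assert(d.foo_qualified() == "B::bar");    // qualified name: no virtual dispatch
    assert(d.B::bar() == "B::bar");           // same rule from outside the class
}
```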
Entirely up to the implementation. Nominally it's a virtual call, but you're not entitled to assume that the emitted code will actually perform an indirection through a vtable or similar.
If foo() is called on some arbitrary B*, then of course the code emitted for foo() needs to make a virtual call to bar(), since the referent might belong to a derived class.
This isn't an arbitrary B*, this is an object of dynamic type B. The result of a virtual or non-virtual call is exactly the same, so the compiler can do what it likes ("as-if" rule), and a conforming program can't tell the difference.
Specifically in this case, if the call to foo is inlined, then I'd have thought that the optimizer has every chance of de-virtualizing the call to bar inside it, since it knows exactly what's in the vtable (or equivalent) of obj. If the call isn't inlined, then it's going to use the "vanilla" code of foo(), which of course will need to do some kind of indirection since it's the same code used when the call is made on an arbitrary B*.
In this case:
B obj;
obj.foo(); // calls B::bar()
the compiler can optimize away the virtual dispatch since it knows that the type of the actual object is B.
However, inside of B::foo() the call to bar() needs to use virtual dispatch generally (though the compiler might be able to inline the call and for that particular call instance could possibly optimize the virtual dispatch away again). Specifically, this statement you proposed:
Irrespective of foo() is called using object or pointer/reference, all the calls inside any virtual B::foo() should be statically resolved. Because, once you are inside B::foo(), it's sure that this is of type B* and not D*
is not true.
Consider:
struct D2 : B
{
// D2 does not override foo()
virtual void bar () {
std::cout << "hello from D2::bar()" << std::endl;
}
};
Now if you had the following somewhere:
D2 test;
B& bref = test;
bref.foo();
That call to foo() would end up in B::foo(), but when B::foo() calls bar(), it needs to dispatch to D2::bar().
Actually, now that I've typed this out, the B& is completely unnecessary for this example.
My library has two classes, a base class and a derived class. In the current version of the library the base class has a virtual function foo(), and the derived class does not override it. In the next version I'd like the derived class to override it. Does this break ABI? I know that introducing a new virtual function usually does, but this seems like a special case. My intuition is that it should be changing an offset in the vtbl, without actually changing the table's size.
Obviously since the C++ standard doesn't mandate a particular ABI this question is somewhat platform specific, but in practice what breaks and maintains ABI is similar across most compilers. I'm interested in GCC's behavior, but the more compilers people can answer for the more useful this question will be ;)
It might.
You're wrong regarding the offset. The offset in the vtable is determined already. What will happen is that the Derived class constructor will replace the function pointer at that offset with the Derived override (by switching the in-class v-pointer to a new v-table). So it is, normally, ABI compatible.
There might be an issue though, because of optimization, and especially the devirtualization of function calls.
Normally, when you call a virtual function, the compiler introduces a lookup in the vtable via the vpointer. However, if it can deduce (statically) what the exact type of the object is, it can also deduce the exact function to call and shave off the virtual lookup.
Example:
struct Base {
virtual void foo();
virtual void bar();
};
struct Derived: Base {
virtual void foo();
};
int main(int argc, char* argv[]) {
Derived d;
d.foo(); // It is necessarily Derived::foo
d.bar(); // It is necessarily Base::bar
}
And in this case... simply linking with your new library will not pick up Derived::bar.
This doesn't seem like something that could be particularly relied on in general - as you said C++ ABI is pretty tricky (even down to compiler options).
That said I think you could use g++ -fdump-class-hierarchy before and after you made the change to see if either the parent or child vtables change in structure. If they don't it's probably "fairly" safe to assume you didn't break ABI.
Yes, in some situations, adding a reimplementation of a virtual function will change the layout of the virtual function table. That is the case if you're reimplementing a virtual function from a base that isn't the first base class (multiple-inheritance):
// V1
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C : A, B { virtual void h(); }; //does not reimplement f or g;
// V2
struct C : A, B {
virtual void h();
virtual void g(); //added reimplementation of g()
};
This changes the layout of C's vtable by adding an entry for g() (thanks to "Gof" for bringing this to my attention in the first place, as a comment in http://marcmutz.wordpress.com/2010/07/25/bcsc-gotcha-reimplementing-a-virtual-function/).
Also, as mentioned elsewhere, you get a problem if the class you're overriding the function in is used by users of your library in a way where the static type is equal to the dynamic type. This can be the case after you new'ed it:
MyClass * c = new MyClass;
c->myVirtualFunction(); // not actually virtual at runtime
or created it on the stack:
MyClass c;
c.myVirtualFunction(); // not actually virtual at runtime
The reason for this is an optimisation called "de-virtualisation". If the compiler can prove, at compile time, what the dynamic type of the object is, it will not emit the indirection through the virtual function table, but instead call the correct function directly.
Now, if users compiled against an old version of your library, the compiler will have inserted a call to the most-derived reimplementation of the virtual method. If, in a newer version of your library, you override this virtual function in a more-derived class, code compiled against the old library will still call the old function, whereas new code, or code where the compiler could not prove the dynamic type of the object at compile time, will go through the virtual function table. So, a given instance of the class may be confronted, at runtime, with calls to the base class' function that it cannot intercept, potentially creating violations of class invariants.
My intuition is that it should be changing an offset in the vtbl, without actually changing the table's size.
Well, your intuition is clearly wrong:
either there is a new entry in the vtable for the overrider, all following entries are moved, and the table grows,
or there is no new entry, and the vtable representation does not change.
Which one is true can depend on many factors.
Anyway: do not count on it.
Caution: see "In C++, does overriding an existing virtual function break ABI?" for a case where this logic doesn't hold true.
In my mind Mark's suggestion to use g++ -fdump-class-hierarchy would be the winner here, right after having proper regression tests
Overriding things should not change vtable layout[1]. The vtable entries themselves would be in the data segment of the library, IMHO, so a change to them should not pose a problem.
Of course, the applications need to be relinked, otherwise there is a potential for breakage if the consumer had been using direct reference to &Derived::overriddenMethod;
I'm not sure whether a compiler would have been allowed to resolve that to &Base::overriddenMethod at all, but better safe than sorry.
[1] spelling it out: this presumes that the method was virtual to begin with!
My question is not about calling a virtual member function from a base class constructor, but whether the pointer to a virtual member function is valid in the base class constructor.
Given the following
class A
{
void (A::*m_pMember)();
public:
A() :
m_pMember(&A::vmember)
{
}
virtual void vmember()
{
printf("In A::vmember()\n");
}
void test()
{
(this->*m_pMember)();
}
};
class B : public A
{
public:
virtual void vmember()
{
printf("In B::vmember()\n");
}
};
int main()
{
B b;
b.test();
return 0;
}
Will this produce "In B::vmember()" for all compliant c++ compilers?
The pointer is valid, however you have to keep in mind that when a virtual function is invoked through a pointer it is always resolved in accordance with the dynamic type of the object used on the left-hand side. This means that when you invoke a virtual function from the constructor, it doesn't matter whether you invoke it directly or whether you invoke it through a pointer. In both cases the call will resolve to the type whose constructor is currently working. That's how virtual functions work, when you invoke them during object construction (or destruction).
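A sketch of that behavior. It departs from the question's code in two ways, both mine: vmember returns a string instead of printing, and the hypothetical member seen_in_ctor records what the pointer resolves to while A's constructor is still running:

```cpp
#include <cassert>
#include <string>

struct A {
    using Fn = std::string (A::*)() const;

    // Members are initialized in declaration order: ptr first, then seen_in_ctor.
    A() : ptr(&A::vmember), seen_in_ctor((this->*ptr)()) {}
    virtual ~A() = default;

    virtual std::string vmember() const { return "A::vmember"; }
    std::string test() const { return (this->*ptr)(); }

    Fn ptr;
    std::string seen_in_ctor;  // result of calling through ptr inside A's constructor
};

struct B : A {
    std::string vmember() const override { return "B::vmember"; }
};

inline void ctor_member_pointer_demo() {
    B b;
    assert(b.seen_in_ctor == "A::vmember");  // during A(), the object's type is still A
    assert(b.test() == "B::vmember");        // after construction, the same pointer reaches B
}
```

So the pointer stored in the base constructor is fine to keep; only calls made while the base constructor is running see the base version.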
Note also that pointers to member functions are generally not attached to specific functions at the point of initialization. If the target function is non-virtual, then one can say that the pointer points to a specific function. However, if the target function is virtual, there's no way to say where the pointer is pointing. For example, the language specification explicitly states that when you compare (for equality) two pointers that happen to point to virtual functions, the result is unspecified.
"Valid" is a specific term when applied to pointers. Data pointers are valid when they point to an object or are NULL; function pointers are valid when they point to a function or are NULL; and pointers to members are valid when they point to a member or are NULL.
However, from your question about actual output, I can infer that you wanted to ask something else. Let's look at your vmember function - or should I say functions? Obviously there are two function bodies. You could have made only the derived one virtual, so that too confirms that there are really two vmember functions, which both happen to be virtual.
Now, the question becomes whether taking the address of a member function already chooses the actual function. Your implementations show that it doesn't, and that the choice only happens when the pointer is actually dereferenced.
The reason it must work this way is trivial. Taking the address of a member function does not involve an actual object, something that would be needed to resolve the virtual call. Let me show you:
void (A::*test)() = &A::vmember; // no A or B object exists at this point
int main()
{
A a;
B b;
(a.*test)(); // prints "In A::vmember()"
(b.*test)(); // prints "In B::vmember()"
}
When we initialize test, there is no object of type A or B at all, yet it is possible to take the address &A::vmember. That same member pointer can then be used with two different objects. What could this produce but "In A::vmember()\n" and "In B::vmember()\n"?
Read this article for an in-depth discussion of member function pointers and how to use them. This should answer all your questions.
I have found a little explanation on the Old New Thing (a blog by Raymond Chen, sometimes referred to as Microsoft's Chuck Norris).
Of course it says nothing about the compliance, but it explains why:
B b;
b.A::vmember(); // [1]
(b.*&A::vmember)(); // [2]
[1] and [2] actually invoke different functions... which is quite surprising, really. It also means that you can't actually prevent the runtime dispatch using a pointer to member function :/
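Here is a checkable sketch of the difference between [1] and [2]. As an adaptation of mine, vmember returns a string instead of printing:

```cpp
#include <cassert>
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string vmember() const { return "A::vmember"; }
};

struct B : A {
    std::string vmember() const override { return "B::vmember"; }
};

inline void member_pointer_demo() {
    // Taking the address involves no object, so no overrider can be chosen yet.
    std::string (A::*test)() const = &A::vmember;

    A a;
    B b;
    assert((a.*test)() == "A::vmember");
    assert((b.*test)() == "B::vmember");         // pointer to member: dispatched virtually
    assert(b.A::vmember() == "A::vmember");      // [1] qualified call: no dispatch
    assert((b.*&A::vmember)() == "B::vmember");  // [2] same pointer, still dispatched
}
```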
I think not. A pointer to a virtual member function is resolved via the VMT, the same way a call to the function would be. It means that it is not valid, since the VMT is populated after the constructor has finished.
IMO it is implementation defined to take address of a virtual function. This is because virtual functions are implemented using vtables which are compiler implementation specific. Since the vtable is not guaranteed to be complete until the execution of the class ctor is done, a pointer to an entry in such a table (virtual function) may be implementation defined behavior.
There is a somewhat related question that I asked on SO here few months back; which basically says taking address of the virtual function is not specified in the C++ standard.
So, in any case even if it works for you, the solution will not be portable.