vtable and polymorphism - offset of a function - c++

If I understand things correctly, a class definition imposes a certain order of the virtual functions in the vtable, and so a given function is known to be at a certain offset from the beginning of the table. However, I don't understand how that works with polymorphism.
class B1 {
public:
    virtual void funcB1() {}
};
class B2 {
public:
    virtual void funcB2() {}
};
class D : public B1, public B2 {
public:
    virtual void funcB1() {}
    virtual void funcB2() {}
};
int main() {
    B1 *b1 = new D();
    B2 *b2 = new D();
    B1 *realB1 = new B1();
    B2 *realB2 = new B2();
    b1->funcB1();
    b2->funcB2();
    realB1->funcB1();
    realB2->funcB2();
}
How does the generated code know how to access funcB2 at different offsets?

When you compose a class from two base classes, each part is represented in the resultant class by a fully functioning block, complete with its own pointer to vtable. That is how the generated code knows what function to call: casting the pointer of D to B1 and B2 produces different pointers, so the generated code can use the same offset into the virtual table.
D *d = new D();
B1 *b1 = dynamic_cast<B1*>(d);
B2 *b2 = dynamic_cast<B2*>(d);
printf("%p %p %p", (void*)d, (void*)b1, (void*)b2);
This produces the following output on ideone:
0x91c7008 0x91c7008 0x91c700c
Note how D* and B1* print the same value, while B2* prints a different value. When you call b2->funcB2(), the pointer b2 already points to a different part of the D object, which points to a different vtable (the one that has the layout of B2), so the generated code does not need to do anything differently for b2 vs realB2 in your example.
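A minimal sketch of the same point, assuming a typical implementation that stores one vtable pointer per polymorphic base subobject. The helper names call_through_b2 and bases_at_distinct_addresses are made up for illustration, and the functions return int only so the result of the dispatch is observable:

```cpp
#include <cassert>
#include <cstdio>

struct B1 { virtual ~B1() = default; virtual int funcB1() { return 1; } };
struct B2 { virtual ~B2() = default; virtual int funcB2() { return 2; } };
struct D : B1, B2 {
    int funcB1() override { return 10; }
    int funcB2() override { return 20; }
};

// Hypothetical helper: one call site for any B2*, whether it points to a
// plain B2 or to the B2 subobject inside a D.
int call_through_b2(B2* p) { return p->funcB2(); }

// Hypothetical helper: prints the three addresses and reports whether the
// two base subobjects of a D live at different addresses.
bool bases_at_distinct_addresses(D& d) {
    B1* b1 = &d;  // implicit upcast
    B2* b2 = &d;  // implicit upcast; the compiler applies the B2 offset here
    std::printf("D=%p B1=%p B2=%p\n",
                static_cast<void*>(&d),
                static_cast<void*>(b1),
                static_cast<void*>(b2));
    return static_cast<void*>(b1) != static_cast<void*>(b2);
}
```

The call site in call_through_b2 is compiled once and never needs to know whether a D is behind the pointer; the adjustment already happened when the D* was converted to B2*.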

Typically the D object will have two vtable pointers, one for each base class. It really can't be avoided, since it must contain an identical binary layout for each of the base classes. The compiler will insert pointer fixups whenever you cast from one type to another - if you print the pointer addresses after casting to each of the base classes, you'll see that they are different.

Related

How does C++ multiple inheritance virtual function access derived class field?

Referencing the multiple inheritance memory layout, suppose Derived class has a field called int derived_only. If I have a Base1 * b1 and Base2 * b2, both pointing to the same Derived class object, then according to wiki, b1 and b2 have slightly different values due to pointer fixup. My question is, if I call a virtual function, say the virtual clone(), by using either b1 or b2, how does clone() calculate derived_only's address, from either b1 or b2?
Basically, when calling b1->clone() vs b2->clone(), the this pointer passed in is different, then how does clone() know how much offset to add to this to get to derived_only?

In a diamond inheritance structure, is there a way to cast between the branches?

I have a diamond inheritance structure in my code in which I have a pointer to the bottom object. I tried to cast this to a pointer to the left of the two diamond sides, cast it again to the top of the diamond, and again to the right side. But apparently, C++ kind of remembers the order of casting and things don't work as expected. Example code:
#include <iostream>
class A
{
};
class B1 : public A
{
public:
virtual int Return1() = 0;
};
class B2 : public A
{
public:
virtual int Return2() = 0;
};
class C : public B1, public B2
{
public:
virtual int Return1() { return 1; }
virtual int Return2() { return 2; }
};
int main()
{
C c;
B1* b1 = &c;
A* a = b1;
B2* b2 = (B2*)a;
std::cout << "Return2() = " << b2->Return2();
}
This results in Return2() = 1, so apparently this approach is wrong. I know that something like this works in C#, so my question would be: Is there a way in C++ to do what I'm attempting here or - if not - why is this not an option?
As inheritance is not virtual (for A), you have "Y" inheritance (2 A),
A    A
|    |
B1   B2
 \  /
  C
not a diamond (1 A).
Avoid C-style casts: they can silently resolve to an unchecked static_cast or to a reinterpret_cast, and most such uses lead to Undefined Behavior (UB).
You can use dynamic_cast in your case to get the expected behavior (A needs to be polymorphic for that; a defaulted virtual destructor does the job):
#include <cassert>
#include <iostream>
class A
{
public:
virtual ~A() = default; // Added to allow dynamic_cast
};
class B1 : public A
{
public:
virtual int Return1() = 0;
};
class B2 : public A
{
public:
virtual int Return2() = 0;
};
class C : public B1, public B2
{
public:
// override used for extra check from compiler.
int Return1() override { return 1; }
int Return2() override { return 2; }
};
int main()
{
C c;
B1* b1 = &c;
A* a = b1;
B2* b2 = dynamic_cast<B2*>(a); // C-cast replaced by dynamic_cast
assert(b2 != nullptr);
std::cout << "Return2() = " << b2->Return2();
}
You can see the desired result by changing the last casting to:
B2* b2 = (B2*)&c;
This is an issue related to upcasting and downcasting.
The issue here is that you are using a C cast (T) expr, which is 99% of the time a bad idea in C++.
C casts only exist in C++ for backward compatibility with C, and can behave in unexpected ways.
From here:
When the C-style cast expression is encountered, the compiler attempts to interpret it as the following cast expressions, in this order:
a) const_cast<new_type>(expression);
b) static_cast<new_type>(expression), with extensions: pointer or reference to a derived class is additionally allowed to be cast to pointer or reference to unambiguous base class (and vice versa) even if the base class is inaccessible (that is, this cast ignores the private inheritance specifier). Same applies to casting pointer to member to pointer to member of unambiguous non-virtual base;
c) static_cast (with extensions) followed by const_cast;
d) reinterpret_cast<new_type>(expression);
e) reinterpret_cast followed by const_cast.
The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled (see example)
The correct type of cast when downcasting in C++ is dynamic_cast<T>(expr), which checks if the object of the expression can be cast to the derived type T before performing it. If you did that, you would have got a compile time or runtime error, instead of getting a wrong behaviour.
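C-style casts never perform dynamic casts. Because A is a non-virtual base of B2, (B2*) in B2* b2 = (B2*)a is resolved as static_cast<B2*>(a), an unchecked downcast that simply assumes the A subobject belongs to a B2. Here it actually belongs to the B1 part of c, so the behaviour is undefined, and in practice the address is left as it is, just as a reinterpret_cast<B2*> would leave it. Either way C++ does none of the pointer "magic" that is usually needed to convert a C* into a valid B2*.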
Given that polymorphism in C++ is implemented through virtual dispatching using method tables, and that the pointer in b2 doesn't point to the correct base class (given that it was actually a pointer to B1), you are accessing the vtable for B1 instead of B2 through b2.
Both Return1 and Return2 are the first functions in the vtables of their respective abstract classes, so in your case Return1 is mistakenly called - you could largely approximate virtual invocations with something like b2->vtable[0]() in most implementations. Given that neither of the two methods touch this, nothing breaks and the function returns without crashing the program (which is not guaranteed, given this whole thing is undefined behaviour).
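To make the difference concrete, here is a small sketch of two ways to reach the B2 part from an A* that came from the B1 side. The function names via_dynamic_cast and via_static_casts are hypothetical, and A is given a virtual destructor (as in the other answer) so that dynamic_cast is allowed:

```cpp
#include <cassert>

struct A  { virtual ~A() = default; };
struct B1 : A { virtual int Return1() = 0; };
struct B2 : A { virtual int Return2() = 0; };
struct C  : B1, B2 {
    int Return1() override { return 1; }
    int Return2() override { return 2; }
};

// Checked route: dynamic_cast looks at the dynamic type at run time and
// yields nullptr if the object has no B2 part.
int via_dynamic_cast(A* a) {
    B2* b2 = dynamic_cast<B2*>(a);
    return b2 ? b2->Return2() : -1;
}

// Unchecked route: retrace the path the pointer took (A -> B1 -> C) and
// then upcast. Only valid if `a` really is the A inside the B1 part of a C.
int via_static_casts(A* a) {
    B1* b1 = static_cast<B1*>(a);
    C*  c  = static_cast<C*>(b1);
    B2* b2 = c;  // implicit upcast with the correct adjustment
    return b2->Return2();
}
```

Both routes return 2 for the object from the question; only the dynamic_cast version fails safely when the assumption about the object is wrong.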

Subclass address equal to virtual base class address?

We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class. Multiple inheritance makes that untrue.
Does virtual inheritance also make that untrue? In other words, is the following code correct:
struct A {};
struct B : virtual A
{
int i;
};
int main()
{
A* a = new B; // implicit upcast
B* b = reinterpret_cast<B*>(a); // fishy?
b->i = 0;
return 0;
}
We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class.
I think the claim is not true. In the below code, we have a simple (not virtual) single (non multiple) inheritance, but the addresses are different.
#include <iostream>
class A
{
public:
int getX()
{
return 0;
}
};
class B : public A
{
public:
virtual int getY()
{
return 0;
}
};
int main()
{
B b;
B* pB = &b;
A* pA = static_cast<A*>(pB);
std::cout << "The address of pA is: " << pA << std::endl;
std::cout << "The address of pB is: " << pB << std::endl;
return 0;
}
and the output for VS2015 is:
The address of pA is: 006FF8F0
The address of pB is: 006FF8EC
Does virtual inheritance also make that untrue?
If you change the inheritance in the above code to virtual, the result will be the same. So, even in the case of virtual inheritance, the addresses of the base and derived objects can be different.
The result of reinterpret_cast<B*>(a); is only guaranteed to point to the enclosing B object of a if the a subobject and the enclosing B object are pointer-interconvertible, see [expr.static.cast]/3 of the C++17 standard.
The derived class object is pointer-interconvertible with the base class object only if the derived object is standard-layout, does not have direct non-static data members and the base class object is its first base class subobject. [basic.compound]/4.3
Having a virtual base class disqualifies a class from being standard-layout. [class]/7.2.
Therefore, because B has a virtual base class and a non-static data member, b will not point to the enclosing B object, but instead b's pointer value will remain unchanged from a's.
Accessing the i member as if it was pointing to the B object then has undefined behavior.
Any other guarantees would come from your specific ABI or other specification.
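As a rough illustration of the above: the first part checks that the question's B is not standard-layout, and the second shows a checked way back from the virtual base. The namespaces, the helper to_b, and the virtual destructor added to A in the second variant are my own assumptions for the sketch, not part of the question:

```cpp
#include <cassert>
#include <type_traits>

namespace original {
// The types from the question, unchanged.
struct A {};
struct B : virtual A { int i; };

// A virtual base rules out standard layout, so B and its A subobject
// are not pointer-interconvertible.
static_assert(!std::is_standard_layout<B>::value,
              "a class with a virtual base is never standard-layout");
}  // namespace original

namespace fixed {
// Assumption: A is made polymorphic so that dynamic_cast can be used.
struct A { virtual ~A() = default; };
struct B : virtual A { int i = 0; };

// Hypothetical helper: checked downcast from the virtual base.
// static_cast<B*>(a) would not compile, because A is a virtual base of B.
B* to_b(A* a) { return dynamic_cast<B*>(a); }
}  // namespace fixed
```

With this, to_b returns the enclosing B when there is one and a null pointer otherwise, so the write to i is only done through a pointer that is known to be valid.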
Multiple inheritance makes that untrue.
That is not entirely correct. Consider this example:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};
When creating an instance of D, B and C are instantiated each with their respective instance of A. However, there would be no problem if the instance of D had the same address as its instance of B and that B's instance of A. Although not required, this is exactly what happens when compiling with clang 11 and gcc 10:
D: 0x7fffe08b4758 // address of instance of D
B: 0x7fffe08b4758 and A: 0x7fffe08b4758 // same address for B and A
C: 0x7fffe08b4760 and A: 0x7fffe08b4760 // other address for C and A
Does virtual inheritance also make that untrue
Let's consider a modified version of the above example:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Virtual inheritance is typically used to avoid ambiguity when the same base class is reachable along more than one path. With it, the B and C subobjects must share one common A instance. When instantiating D, we get the following addresses:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // again, address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // A has the same address as before (common instance)
Is the following code correct
There is no reason to use reinterpret_cast here; moreover, it results in undefined behavior. Use static_cast instead:
A* pA = static_cast<A*>(pB);
The two casts behave differently in this example. reinterpret_cast merely relabels pB as a pointer to A without changing its value, while the A subobject may live at a different address, as in the above example (C vs A). The pointer will be upcast correctly if you use static_cast.
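A small sketch that compares the two casts by address only. The int members are my addition so that the subobjects are not empty, and the two function names are hypothetical; the reinterpret_cast result is never dereferenced:

```cpp
#include <cassert>

struct A { int a = 0; };
struct B : A { int b = 0; };
struct C : A { int c = 0; };
struct D : B, C {};

// static_cast knows the layout and returns the address of the C subobject.
const void* address_via_static_cast(D& d) {
    return static_cast<C*>(&d);
}

// reinterpret_cast keeps the numeric value of &d; the result is merely
// labelled as C*. It is used here for its address only.
const void* address_via_reinterpret_cast(D& d) {
    return reinterpret_cast<C*>(&d);
}
```

On the usual ABIs the first function returns an address a few bytes past the start of the object, while the second returns the start of the object itself.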
The reason a and b are different in your case is that A has no virtual functions and therefore no vtable pointer, whereas B does have one.
When you upcast to A, the compiler is smart enough to skip the vtable pointer that belongs to B, hence the difference in addresses. You should not reinterpret_cast back to B; it won't work.
To verify my claim, try adding a virtual method, say virtual void foo() {}, to class A. Now A will also have a vtable pointer, and on typical implementations the A subobject then sits at the start of B, so a reinterpret_cast back to B happens to give you the original b. The standard still does not guarantee this, so prefer static_cast for the downcast.

Convert array of pointers of derived class to array of base class pointers

Consider an inheritance hierarchy like this:
   A
  / \
B1   B2
  \ /
   C
   |
   D
Realized in C++ like so:
class A {
public:
A() {};
virtual ~A() = 0;
double a;
};
A::~A() {};
class B1 : virtual public A {
public:
B1() {}
virtual ~B1() {}
double b1;
};
class B2 : virtual public A {
public:
B2() {}
virtual ~B2() {}
double b2;
};
class C : public B1, public B2 {
public:
C() {}
virtual ~C() {}
double c;
};
class D : public C {
public:
D() {}
virtual ~D() {}
double d;
};
Now, obviously I can do something like this:
D *d = new D();
A *a = (A*) d;
D *d_down = dynamic_cast<D*>(a);
assert(d_down != NULL); //holds
However, I can't seem to figure out how to get same behavior using arrays. Please consider the following code sample to see what I mean by that:
D *d[10];
for (unsigned int i = 0; i < 10; i++) {
d[i] = new D();
}
A **a = (A**) d;
D *d_down = dynamic_cast<D*>(a[0]);
assert(d_down != NULL); //fails!
So my questions would be:
Why does the above assertion fail?
How can I achieve the desired behavior?
I noticed, by chance, that the dynamic_cast above works if I remove the double fields from classes A through D. Why is that?
The problem is that (A*)d is not numerically equal to d!
See, you have an object like
+---------------------+
| A: vtable ptr A | <----- (A*)d points here!
| double a |
+---------------------+
+---------------------+
| D: | <----- d points here (and so do (C*)d and (B1*)d)!
|+-------------------+|
|| C: ||
||+-----------------+||
||| B1: vptr B1,C,D |||
||| double b1 |||
||+-----------------+||
||+-----------------+|| <----- (B2*)d points here!
||| B2: vptr B2 |||
||| double b2 |||
||+-----------------+||
|| double c ||
|+-------------------+|
| double d |
+---------------------+
When you cast a D* to A*, via static_cast or dynamic_cast, the compiler will inject the necessary arithmetic for you.
But when you cast it via reinterpret_cast, or cast a D** to A**, which is the same thing, the pointer will keep its numeric value, because the cast does not give the compiler the right to dereference the first layer to adjust the second layer.
But then the pointer will still point at D's vtable, not A's vtable, and therefore won't be recognized as A.
Update: I checked the layout in compiler (g++) and the picture should now reflect the actual layout generated in the case in question. It shows that virtual bases live at negative offsets. This is because a virtual base is at different offset depending on the actual type, so it can't be part of the object itself.
The address of object does coincide with address of the first non-virtual base. However, the specification does not guarantee it for objects with virtual methods or bases, so don't rely on it either.
This shows the importance of using appropriate casts. Conversions that can be done implicitly, or via static_cast or dynamic_cast, are reliable, and the compiler will inject the appropriate adjustments.
Using reinterpret_cast, however, clearly tells the compiler not to adjust anything, and you are on your own.
A *a = static_cast<A *>(d);
is ok, but
A **aa = static_cast<A **>(&d);
is a compilation error.
The problem with C-style cast is that it does a static_cast when possible and reinterpret_cast otherwise, so you can cross the border to the undefined behavior land without noticing. That's why you shouldn't use C-style cast in C++. Ever.
Note that due to aliasing rules, writing reinterpret_cast essentially always implies Undefined Behavior. And at least GCC does optimize based on aliasing rules. The only exception is cv-(signed/unsigned) char *, which is exempt from strict aliasing. But it only ever makes sense to cast to and from pointers to standard layout types, because you can't rely on layout of objects with bases (any, not just virtual) and/or virtual members.
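For the "how can I achieve the desired behavior" part, one possible approach is to convert each pointer individually instead of casting the array. This is only a sketch: the helper to_base_pointers is a name I made up, and the hierarchy is a trimmed version of the one in the question (A's destructor is not pure here, to keep it short):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct A  { virtual ~A() = default; double a = 0; };
struct B1 : virtual A { double b1 = 0; };
struct B2 : virtual A { double b2 = 0; };
struct C  : B1, B2 { double c = 0; };
struct D  : C { double d = 0; };

// Hypothetical helper: converts every element on its own, so the compiler
// can apply the D* -> A* adjustment to each pointer.
std::vector<A*> to_base_pointers(D* const* items, std::size_t count) {
    std::vector<A*> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(items[i]);  // implicit, correctly adjusted upcast
    }
    return out;
}
```

The result is a separate array of A*; there is no way to view the original D* array as an A* array in place, because the stored pointer values themselves differ.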

When is static cast safe when you are using multiple inheritance?

I found myself in a situation where I know what type something is. The Type is one of three (or more) levels of inheritance. I call factory which returns B* however T is either the highest level of a type (if my code knows what it is) or the 2nd level.
Anyways, I did a static_cast in the template which is the wrong thing to do. My question is WHEN can I static cast safely? Is there ever such a time? I did it in this case because I'd rather get compile errors when I accidentally have T as something wacky which (has happened and) dynamic cast ignores (and returns null). However when I know the correct type the pointer is not adjusted causing me to have a bad pointer. I'm not sure why static cast is allowed in this case at all.
When can I use static_cast for down casting safely? Is there ever a situation? Now it seems like it always is wrong to use a static_cast (when the purpose is to down cast)
Ok I figured out how to reproduce it.
#include <iostream>
struct B { virtual void f1(){} };
struct D1 : B {int a;};
struct D2 : B {int a, b; };
struct DD : D1, D2 {};
int main(){
void* cptr = new DD(); //i pass it through a C interface :(
B* a = (B*)cptr;
D2* b = static_cast<D2*>(a); //incorrect ptr
D2* c = dynamic_cast<D2*>(a); //correct ptr
std::cout << a << " " <<b << " " <<c;
}
A cross-cast:
struct Base1 { virtual void f1(); };
struct Base2 { virtual void f2(); };
struct Derived : Base1, Base2 {};
Base1* b1 = new Derived();
Base2* b2 = dynamic_cast<Base2*>(b1);
requires use of dynamic_cast, it cannot be done with static_cast (static_cast should have caused a compile-time error). dynamic_cast will also fail if either base class is not polymorphic (the presence of virtual functions is NOT optional).
See this explanation on MSDN
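A short sketch of the checked cross-cast, including the case where it fails. OnlyBase1 and the function cross are hypothetical names added for the illustration, and virtual destructors stand in for f1/f2 to keep the types polymorphic:

```cpp
#include <cassert>

struct Base1 { virtual ~Base1() = default; };
struct Base2 { virtual ~Base2() = default; };
struct Derived : Base1, Base2 {};
struct OnlyBase1 : Base1 {};  // has no Base2 part at all

// Cross-cast: succeeds only if the complete object also has a Base2 part,
// otherwise the result is a null pointer.
Base2* cross(Base1* p) { return dynamic_cast<Base2*>(p); }
```

For a Derived the result is the address of its Base2 subobject; for an OnlyBase1 it is null, which is the run-time check that static_cast cannot give you.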
If Derived has Base as a public (or otherwise accessible) base class, and d is of type Derived*, then static_cast<Base*>(d) is an upcast.
This is always technically safe.
And generally unnecessary, except for cases where you have hiding (shadowing) of method.
Cheers & hth.,
The problem lies with this line:
B* a = (B*)cptr;
If you convert something to a void pointer, you must convert it back to the same type that it was converted from first before doing any other casts. If you have a situation where multiple different types of objects have to go through the same void pointer, then you need to first cast it down to a common type before converting to a void pointer.
int main(){
B *bptr = new DD; // convert to common base first (won't compile in this case)
void* cptr = bptr; // now pass it around as a void pointer
B* a = (B*)cptr; // now back to the type it was converted from
D2* b = static_cast<D2*>(a); // this should be ok now
D2* c = dynamic_cast<D2*>(a); // as well as this
std::cout << a << " " <<b << " " <<c;
}
EDIT:
If you only know that cptr points to some object which is of a type derived from B at the time of the cast, then that isn't enough information to go on. The compiler lets you know that when you try to convert the DD pointer to a B pointer.
What you would have to do is something like this:
int main(){
void* cptr = new DD; // convert to void *
DD* a = (DD*)cptr; // now back to the type it was converted from
D2* b = static_cast<D2*>(a); // this should be ok now, but the cast is unnecessary
D2* c = dynamic_cast<D2*>(a); // as well as this
std::cout << a << " " <<b << " " <<c;
}
but I'm not sure if that is acceptable in your actual usage.
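The round-trip rule can be wrapped in a pair of small functions so the two conversions cannot drift apart. This is a sketch only; export_handle, import_handle and as_d2 are hypothetical names, and B gets a virtual destructor in addition to f1:

```cpp
#include <cassert>

struct B  { virtual ~B() = default; virtual void f1() {} };
struct D1 : B { int a = 0; };
struct D2 : B { int a = 0, b = 0; };
struct DD : D1, D2 {};

// Hypothetical C-style boundary: the pointer leaves as void*, converted
// from exactly DD* ...
void* export_handle(DD* dd) { return dd; }

// ... and comes back as exactly DD* before any other conversion is done.
DD* import_handle(void* handle) { return static_cast<DD*>(handle); }

// From a real DD*, the upcast to D2 is implicit and correctly adjusted.
D2* as_d2(void* handle) { return import_handle(handle); }
```

Keeping both directions next to each other makes it harder to convert to void* from one type and back to another by accident.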
You can safely downcast with static_cast if you are sure that the object is actually an instance of that class.
class Base {};
class Derived1 : public Base {};
class Derived2 : public Base {};
int main()
{
Base* b = new Derived1;
Derived1* d1 = static_cast<Derived1*>(b); // OK
Derived2* d2 = static_cast<Derived2*>(b); // Undefined behavior - b does not point to a Derived2
}
Just for completeness (knowing that I'm late a little, just for late readers like me...):
static_cast can be applied, if used correctly!
At first, the simple case:
struct D1 { }; // note: no common base class B!
struct D2 { };
struct DD : D1, D2 { };
You can get from D1* to D2* via intermediate downcast to DD*:
D1* d1 = new DD();
D2* d2 = static_cast<DD*>(d1);
The upcast to D2* is implicit then. This is possible even for non-virtual inheritance. But be aware that you need to be 100% sure that d1 really was created as DD when doing the downcast, otherwise you end up in undefined behaviour!
Now the more complex case: Diamond pattern! This is what is presented in the question:
void* cptr = new DD();
B* a = (B*)cptr;
Now this cast is already dangerous! What it actually does here has the effect of a reinterpret_cast:
B* a = reinterpret_cast<B*>(cptr);
What you instead want is a simple upcast. Normally, one would not need a cast at all:
B* a = new DD(); // ambiguous!
The catch: DD has two inherited instances of B. It would have worked if both D1 and D2 inherited virtually from B (struct D1/2 : virtual B { }; – not to be confused with B/D1/D2 being virtual classes!).
B* b1 = static_cast<D1*>(new DD());
B* b2 = static_cast<D2*>(new DD());
The cast to the respective bases D1 or D2 now makes clear which of the two inherited instances of B shall be pointed to.
You now can get back the respective other instance by down-casting to DD again; due to the diamond pattern, you need an intermediate cast again:
D2* bb1 = static_cast<DD*>(static_cast<D1*>(b1));
D1* bb2 = static_cast<DD*>(static_cast<D2*>(b2));
The very important point about all this matter is: You absolutely need to use, when down-casting, the same diamond edge you used for up-casting!!!
Conclusion: Yes, it is possible using static casts, and it is the only option if the classes involved are not polymorphic (note: not to be confused with virtual inheritance!). But it is just too easy to get wrong, and sometimes even impossible (e.g. when pointers of base type to arbitrary derived types are stored in a std::vector), so usually the dynamic_cast solution as presented by Ben is the much safer one (provided the types are polymorphic; if so, in the aforementioned vector example, it is the only solution!).
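A compact sketch of the "same edge" rule, using the hierarchy from the question. The three function names are hypothetical and only there to label the steps:

```cpp
#include <cassert>

struct B  { virtual ~B() = default; };
struct D1 : B { int a = 0; };
struct D2 : B { int a = 0, b = 0; };
struct DD : D1, D2 {};

// Up along the D1 edge: selects the B subobject that lives inside D1.
B* up_via_d1(DD* dd) { return static_cast<D1*>(dd); }

// Down along the same edge: B -> D1 -> DD. Undefined behaviour if `b`
// is not the B inside the D1 part of a DD.
DD* down_via_d1(B* b) { return static_cast<DD*>(static_cast<D1*>(b)); }

// Once the complete object is recovered, crossing to D2 is an implicit upcast.
D2* cross_to_d2(B* b) { return down_via_d1(b); }
```

Going up via D1 and down via D2 (or the other way round) would compile just as happily, which is exactly why this is so easy to get wrong.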
A cross cast doesn't need a dynamic_cast at all, provided you know the most derived type:
struct Base1 { virtual void f1(); };
struct Base2 { virtual void f2(); };
struct Derived : Base1, Base2 {};
Base1* b1 = new Derived();
// To cast it to a base2 *, cast it first to a derived *
Derived *d = static_cast<Derived *>(b1);
Base2 *b2 = static_cast<Base2 *>(d);