C++ object size with virtual methods - c++

I have some questions about the object size with virtual.
1) virtual function
class A {
public:
int a;
virtual void v();
}
The size of class A is 8bytes....one integer(4 bytes) plus one virtual pointer(4 bytes)
It's clear!
class B: public A{
public:
int b;
virtual void w();
}
What's the size of class B? I tested using sizeof B, it prints
12
Does it mean that only one vptr is there even both of class B and class A have virtual function? Why there is only one vptr?
class A {
public:
int a;
virtual void v();
};
class B {
public:
int b;
virtual void w();
};
class C : public A, public B {
public:
int c;
virtual void x();
};
The sizeof C is 20........
It seems that in this case, two vptrs are in the layout.....How does this happen? I think the two vptrs one is for class A and another is for class B....so there is no vptr for the virtual function of class C?
My question is, what's the rule about the number of vptrs in inheritance?
2) virtual inheritance
class A {
public:
int a;
virtual void v();
};
class B: virtual public A{ //virtual inheritance
public:
int b;
virtual void w();
};
class C : public A { //non-virtual inheritance
public:
int c;
virtual void x();
};
class D: public B, public C {
public:
int d;
virtual void y();
};
The sizeof A is 8 bytes -------------- 4(int a) + 4 (vptr) = 8
The sizeof B is 16 bytes -------------- Without virtual it should be 4 + 4 + 4 = 12. why there is another 4 bytes here? What's the layout of class B ?
The sizeof C is 12 bytes. -------------- 4 + 4 + 4 = 12. It's clear!
The sizeof D is 32 bytes -------------- it should be 16(class B) + 12(class C) + 4(int d) = 32. Is that right?
class A {
public:
int a;
virtual void v();
};
class B: virtual public A{ //virtual inheritance here
public:
int b;
virtual void w();
};
class C : virtual public A { //virtual inheritance here
public:
int c;
virtual void x();
};
class D: public B, public C {
public:
int d;
virtual void y();
};
sizeof A is 8
sizeof B is 16
sizeof C is 16
sizeof D is 28 Does it mean 28 = 16(class B) + 16(class C) - 8(class A) + 4 ( what's this? )
My question is , why there is an extra space when virtual inheritance is applied?
What's the underneath rule for the object size in this case?
What's the difference when virtual is applied on all the base classes and on part of the base classes?

This is all implementation defined. I'm using VC10 Beta2. The key to help understanding this stuff (the implementation of virtual functions), you need to know about a secret switch in the Visual Studio compiler, /d1reportSingleClassLayoutXXX. I'll get to that in a second.
The basic rule is the vtable needs to be located at offset 0 for any pointer to an object. This implies multiple vtables for multiple inheritance.
Couple questions here, I'll start at the top:
Does it mean that only one vptr is there even both of class B and class A have virtual function? Why there is only one vptr?
This is how virtual functions work, you want the base class and derived class to share the same vtable pointer (pointing to the implementation in the derived class.
It seems that in this case, two vptrs are in the layout.....How does this happen? I think the two vptrs one is for class A and another is for class B....so there is no vptr for the virtual function of class C?
This is the layout of class C, as reported by /d1reportSingleClassLayoutC:
class C size(20):
+---
| +--- (base class A)
0 | | {vfptr}
4 | | a
| +---
| +--- (base class B)
8 | | {vfptr}
12 | | b
| +---
16 | c
+---
You are correct, there are two vtables, one for each base class. This is how it works in multiple inheritance; if the C* is casted to a B*, the pointer value gets adjusted by 8 bytes. A vtable still needs to be at offset 0 for virtual function calls to work.
The vtable in the above layout for class A is treated as class C's vtable (when called through a C*).
The sizeof B is 16 bytes -------------- Without virtual it should be 4 + 4 + 4 = 12. why there is another 4 bytes here? What's the layout of class B ?
This is the layout of class B in this example:
class B size(20):
+---
0 | {vfptr}
4 | {vbptr}
8 | b
+---
+--- (virtual base A)
12 | {vfptr}
16 | a
+---
As you can see, there is an extra pointer to handle virtual inheritance. Virtual inheritance is complicated.
The sizeof D is 32 bytes -------------- it should be 16(class B) + 12(class C) + 4(int d) = 32. Is that right?
No, 36 bytes. Same deal with the virtual inheritance. Layout of D in this example:
class D size(36):
+---
| +--- (base class B)
0 | | {vfptr}
4 | | {vbptr}
8 | | b
| +---
| +--- (base class C)
| | +--- (base class A)
12 | | | {vfptr}
16 | | | a
| | +---
20 | | c
| +---
24 | d
+---
+--- (virtual base A)
28 | {vfptr}
32 | a
+---
My question is , why there is an extra space when virtual inheritance is applied?
Virtual base class pointer, it's complicated. Base classes are "combined" in virtual inheritance. Instead of having a base class embedded into a class, the class will have a pointer to the base class object in the layout. If you have two base classes using virtual inheritance (the "diamond" class hierarchy), they will both point to the same virtual base class in the object, instead of having a separate copy of that base class.
What's the underneath rule for the object size in this case?
Important point; there are no rules: the compiler can do whatever it needs to do.
And a final detail; to make all these class layout diagrams I am compiling with:
cl test.cpp /d1reportSingleClassLayoutXXX
Where XXX is a substring match of the structs/classes you want to see the layout of. Using this you can explore the affects of various inheritance schemes yourself, as well as why/where padding is added, etc.

Quote> My question is, what's the rule about the number of vptrs in inheritance?
There are no rulez, every compiler vendor is allowed to implement the semantics of inheritance the way he sees fit.
class B: public A {}, size = 12. That's pretty normal, one vtable for B that has both virtual methods, vtable pointer + 2*int = 12
class C : public A, public B {}, size = 20. C can arbitrarily extend the vtable of either A or B. 2*vtable pointer + 3*int = 20
Virtual inheritance: that's where you really hit the edges of undocumented behavior. For example, in MSVC the #pragma vtordisp and /vd compile options become relevant. There's some background info in this article. I studied this a few times and decided the compile option acronym was representative for what could happen to my code if I ever used it.

A good way to think about it is to understand what has to be done to handle up-casts. I'll try to answer your questions by showing the memory layout of objects of the classes you describe.
Code sample #2
The memory layout is as follows:
vptr | A::a | B::b
Upcasting a pointer to B to type A will result in the same address, with the same vptr being used. This is why there's no need for additional vptr's here.
Code sample #3
vptr | A::a | vptr | B::b | C::c
As you can see, there are two vptr's here, just like you guessed. Why? Because it's true that if we upcast from C to A we don't need to modify the address, and thus can use the same vptr. But if we upcast from C to B we do need that modification, and correspondingly we need a vptr at the start of the resulting object.
So, any inherited class beyond the first will require an additional vptr (unless that inherited class has no virtual methods, in which case it has no vptr).
Code sample #4 and beyond
When you derive virtually, you need a new pointer, called a base pointer, to point to the location in the memory layout of the derived classes. There can be more than one base pointer, of course.
So how does the memory layout look? That depends on the compiler. In your compiler it's probably something like
vptr | base pointer | B::b | vptr | A::a | C::c | vptr | A::a
\-----------------------------------------^
But other compilers may incorporate base pointers in the virtual table (by using offsets - that deserves another question).
You need a base pointer because when you derive in a virtual fashion, the derived class will appear only once in the memory layout (it may appear additional times if it's also derived normally, as in your example), so all its children must point to the exact same location.
EDIT: clarification - it all really depends on the compiler, the memory layout I showed can be different in different compilers.

All of this is completely implementation defined you realize. You can't count on any of it. There is no 'rule'.
In the inheritance example, here is how the virtual table for classes A and B might look:
class A
+-----------------+
| pointer to A::v |
+-----------------+
class B
+-----------------+
| pointer to A::v |
+-----------------+
| pointer to B::w |
+-----------------+
As you can see, if you have a pointer to class B's virtual table, it is also perfectly valid as class A's virtual table.
In your class C example, if you think about it, there is no way to make a virtual table that is both valid as a table for class C, class A, and class B. So the compiler makes two. One virtual table is valid for class A and C (mostly likely) and the other is valid for class A and B.

This obviously depends on the compiler implementation.
Anyway I think that I can sum up the following rules from the implementation given by a classic paper linked below and which gives the number of bytes you get in your examples (except for class D which would be 36 bytes and not 32!!!):
The size of an object of class T is:
The size of its fields PLUS the sum of the size of every object from which T inherits PLUS 4 bytes for every object from which T virtually inherits PLUS 4 bytes ONLY IF T needs ANOTHER v-table
Pay attention: if a class K is virtually inherited multiple times (at any level) you have to add the size of K only once
So we have to answer another question: When does a class need ANOTHER v-table?
A class that does not inherit from other classes needs a v-table only if it has one or more virtual methods
OTHERWISE, a class needs another v-table ONLY IF NONE of the classes from which it non virtually inherits does have a v-table
The End of the rules (which I think can be applied to match what Terry Mahaffey has explained in his answer) :)
Anyway my suggestion is to read the following paper by Bjarne Stroustrup (the creator of C++) which explains exactly these things: how many virtual tables are needed with virtual or non virtual inheritance... and why!
It's really a good reading:
http://www.hpc.unimelb.edu.au/nec/g1af05e/chap5.html

I am not sure but I think that it is because of pointer to Virtual method table

Related

Pointer casting offset of a class with single inheritance

While learning the "Effective C++", I was firstly surprised when I learned the fact that if a class had multiple inheritance, its pointer may take offset when the pointer casting is done. Although it was not easy concept to grasp, but I think I managed to get it.
However, the author claims that this offset might happen even in the pointer casting of singly inherited class. I wonder what would be the such case, and wish to know the rationale behind it.
This can happen when polymorphism is introduced into a class hierarchy by a derived class. Consider the following class:
struct Foo
{
int a;
int b;
};
This class is not polymorphic, and thus the implementation does not need to include a pointer to a virtual dispatch table (a commonly-used method of implementing virtual dispatch). It will be laid out in memory like this:
Foo
+---+
a | |
+---+
b | |
+---+
Now consider a class that inherits from Foo:
struct Bar : Foo
{
virtual ~Bar() = default;
};
This class is polymorphic, and so objects of this class need to include a pointer to a vtable so further derived classes can override Bar's virtual member functions. That means that Bar objects will be laid out in memory like this:
Bar
+---------+
vtable pointer | |
+---------+
Foo subobject | +---+ |
| a | | |
| +---+ |
| b | | |
| +---+ |
+---------+
Since the object's Foo subobject is not at the beginning of the object, any Foo* initialized from a pointer to a Bar object will need to be adjusted by the size of a pointer so that it actually points at the Bar object's Foo subobject.
Live Demo
class B {
int a = 0;
};
class D : public B {
virtual ~D() = default;
};
D has a virtual member. B does not. A common implementation of dynamic dispatch in C++ involves keeping a hidden pointer to a table of function addresses at the begining of the object.
This means the first byte of the B sub-object won't be at the start of the complete D object. A pointer cast would need to adjust the address by the vptr size.

Why does virtual inheritance need a vtable even if no virtual functions are involved?

I read this question: C++ Virtual class inheritance object size issue, and was wondering why virtual inheritance results in an additional vtable pointer in the class.
I found an article here: https://en.wikipedia.org/wiki/Virtual_inheritance
which tells us:
However this offset can in the general case only be known at runtime,...
I don't get what is runtime-related here. The complete class inheritance hierarchy is already known at compile time. I understand virtual functions and the use of a base pointer, but there is no such thing with virtual inheritance.
Can someone explain why some compilers (Clang/GCC) implement virtual inheritance with a vtable and how this is used during runtime?
BTW, I also saw this question: vtable in case of virtual inheritance, but it only points to answers related to virtual functions, which is not my question.
The complete class inheritance hierarchy is already known in compile time.
True enough; so if the compiler knows the type of a most derived object, then it knows the offset of every subobject within that object. For such a purpose, a vtable is not needed.
For example, if B and C both virtually derive from A, and D derives from both B and C, then in the following code:
D d;
A* a = &d;
the conversion from D* to A* is, at most, adding a static offset to the address.
However, now consider this situation:
A* f(B* b) { return b; }
A* g(C* c) { return c; }
Here, f must be able to accept a pointer to any B object, including a B object that may be a subobject of a D object or of some other most derived class object. When compiling f, the compiler doesn't know the full set of derived classes of B.
If the B object is a most derived object, then the A subobject will be located at a certain offset. But what if the B object is part of a D object? The D object only contains one A object and it can't be located at its usual offsets from both the B and C subobjects. So the compiler has to pick a location for the A subobject of D, and then it has to provide a mechanism so that some code with a B* or C* can find out where the A subobject is. This depends solely on the inheritance hierarchy of the most derived type---so a vptr/vtable is an appropriate mechanism.
However this offset can in the general case only be known at runtime,...
I can't get the point, what is runtime related here. The complete class inheritance hierarchy is already known in compile time.
The linked article at Wikipedia provides a good explanation with examples, I think.
The example code from that article:
struct Animal {
virtual ~Animal() = default;
virtual void Eat() {}
};
// Two classes virtually inheriting Animal:
struct Mammal : virtual Animal {
virtual void Breathe() {}
};
struct WingedAnimal : virtual Animal {
virtual void Flap() {}
};
// A bat is still a winged mammal
struct Bat : Mammal, WingedAnimal {
};
When you careate an object of type Bat, there are various ways a compiler may choose the object layout.
Option 1
+--------------+
| Animal |
+--------------+
| vpointer |
| Mammal |
+--------------+
| vpointer |
| WingedAnimal |
+--------------+
| vpointer |
| Bat |
+--------------+
Option 2
+--------------+
| vpointer |
| Mammal |
+--------------+
| vpointer |
| WingedAnimal |
+--------------+
| vpointer |
| Bat |
+--------------+
| Animal |
+--------------+
The values contained in vpointer in Mammal and WingedAnimal define the offsets to the Animal sub-object. Those values cannot be known until run time because the constructor of Mammal cannot know whether the subject is Bat or some other object. If the sub-object is Monkey, it won't derive from WingedAnimal. It will be just
struct Monkey : Mammal {
};
in which case, the object layout could be:
+--------------+
| vpointer |
| Mammal |
+--------------+
| vpointer |
| Monkey |
+--------------+
| Animal |
+--------------+
As can be seen, the offset from the Mammal sub-object to the Animal sub-object is defined by the classes derived from Mammal. Hence, it can be defined only at runtime.
The complete class inheritance hierarchy is already known at compiler time. But all the vptr related operations, such as to get the offsets to virtual base class and issue the virtual function call, are delayed until runtime, because only at runtime can we know the actual type of the object.
For example,
class A() { virtual bool a() { return false; } };
class B() : public virtual A { int a() { return 0; } };
B* ptr = new B();
// assuming function a()'s index is 2 at virtual function table
// the call
ptr->a();
// will be transformed by the compiler to (*ptr->vptr[2])(ptr)
// so a right call to a() will be issued according to the type of the object ptr points to

object size of class which is inherited from virtual base classes? [duplicate]

I have some questions about the object size with virtual.
1) virtual function
class A {
public:
int a;
virtual void v();
}
The size of class A is 8bytes....one integer(4 bytes) plus one virtual pointer(4 bytes)
It's clear!
class B: public A{
public:
int b;
virtual void w();
}
What's the size of class B? I tested using sizeof B, it prints
12
Does it mean that only one vptr is there even both of class B and class A have virtual function? Why there is only one vptr?
class A {
public:
int a;
virtual void v();
};
class B {
public:
int b;
virtual void w();
};
class C : public A, public B {
public:
int c;
virtual void x();
};
The sizeof C is 20........
It seems that in this case, two vptrs are in the layout.....How does this happen? I think the two vptrs one is for class A and another is for class B....so there is no vptr for the virtual function of class C?
My question is, what's the rule about the number of vptrs in inheritance?
2) virtual inheritance
class A {
public:
int a;
virtual void v();
};
class B: virtual public A{ //virtual inheritance
public:
int b;
virtual void w();
};
class C : public A { //non-virtual inheritance
public:
int c;
virtual void x();
};
class D: public B, public C {
public:
int d;
virtual void y();
};
The sizeof A is 8 bytes -------------- 4(int a) + 4 (vptr) = 8
The sizeof B is 16 bytes -------------- Without virtual it should be 4 + 4 + 4 = 12. why there is another 4 bytes here? What's the layout of class B ?
The sizeof C is 12 bytes. -------------- 4 + 4 + 4 = 12. It's clear!
The sizeof D is 32 bytes -------------- it should be 16(class B) + 12(class C) + 4(int d) = 32. Is that right?
class A {
public:
int a;
virtual void v();
};
class B: virtual public A{ //virtual inheritance here
public:
int b;
virtual void w();
};
class C : virtual public A { //virtual inheritance here
public:
int c;
virtual void x();
};
class D: public B, public C {
public:
int d;
virtual void y();
};
sizeof A is 8
sizeof B is 16
sizeof C is 16
sizeof D is 28 Does it mean 28 = 16(class B) + 16(class C) - 8(class A) + 4 ( what's this? )
My question is , why there is an extra space when virtual inheritance is applied?
What's the underneath rule for the object size in this case?
What's the difference when virtual is applied on all the base classes and on part of the base classes?
This is all implementation defined. I'm using VC10 Beta2. The key to help understanding this stuff (the implementation of virtual functions), you need to know about a secret switch in the Visual Studio compiler, /d1reportSingleClassLayoutXXX. I'll get to that in a second.
The basic rule is the vtable needs to be located at offset 0 for any pointer to an object. This implies multiple vtables for multiple inheritance.
Couple questions here, I'll start at the top:
Does it mean that only one vptr is there even both of class B and class A have virtual function? Why there is only one vptr?
This is how virtual functions work, you want the base class and derived class to share the same vtable pointer (pointing to the implementation in the derived class.
It seems that in this case, two vptrs are in the layout.....How does this happen? I think the two vptrs one is for class A and another is for class B....so there is no vptr for the virtual function of class C?
This is the layout of class C, as reported by /d1reportSingleClassLayoutC:
class C size(20):
+---
| +--- (base class A)
0 | | {vfptr}
4 | | a
| +---
| +--- (base class B)
8 | | {vfptr}
12 | | b
| +---
16 | c
+---
You are correct, there are two vtables, one for each base class. This is how it works in multiple inheritance; if the C* is casted to a B*, the pointer value gets adjusted by 8 bytes. A vtable still needs to be at offset 0 for virtual function calls to work.
The vtable in the above layout for class A is treated as class C's vtable (when called through a C*).
The sizeof B is 16 bytes -------------- Without virtual it should be 4 + 4 + 4 = 12. why there is another 4 bytes here? What's the layout of class B ?
This is the layout of class B in this example:
class B size(20):
+---
0 | {vfptr}
4 | {vbptr}
8 | b
+---
+--- (virtual base A)
12 | {vfptr}
16 | a
+---
As you can see, there is an extra pointer to handle virtual inheritance. Virtual inheritance is complicated.
The sizeof D is 32 bytes -------------- it should be 16(class B) + 12(class C) + 4(int d) = 32. Is that right?
No, 36 bytes. Same deal with the virtual inheritance. Layout of D in this example:
class D size(36):
+---
| +--- (base class B)
0 | | {vfptr}
4 | | {vbptr}
8 | | b
| +---
| +--- (base class C)
| | +--- (base class A)
12 | | | {vfptr}
16 | | | a
| | +---
20 | | c
| +---
24 | d
+---
+--- (virtual base A)
28 | {vfptr}
32 | a
+---
My question is , why there is an extra space when virtual inheritance is applied?
Virtual base class pointer, it's complicated. Base classes are "combined" in virtual inheritance. Instead of having a base class embedded into a class, the class will have a pointer to the base class object in the layout. If you have two base classes using virtual inheritance (the "diamond" class hierarchy), they will both point to the same virtual base class in the object, instead of having a separate copy of that base class.
What's the underneath rule for the object size in this case?
Important point; there are no rules: the compiler can do whatever it needs to do.
And a final detail; to make all these class layout diagrams I am compiling with:
cl test.cpp /d1reportSingleClassLayoutXXX
Where XXX is a substring match of the structs/classes you want to see the layout of. Using this you can explore the affects of various inheritance schemes yourself, as well as why/where padding is added, etc.
Quote> My question is, what's the rule about the number of vptrs in inheritance?
There are no rulez, every compiler vendor is allowed to implement the semantics of inheritance the way he sees fit.
class B: public A {}, size = 12. That's pretty normal, one vtable for B that has both virtual methods, vtable pointer + 2*int = 12
class C : public A, public B {}, size = 20. C can arbitrarily extend the vtable of either A or B. 2*vtable pointer + 3*int = 20
Virtual inheritance: that's where you really hit the edges of undocumented behavior. For example, in MSVC the #pragma vtordisp and /vd compile options become relevant. There's some background info in this article. I studied this a few times and decided the compile option acronym was representative for what could happen to my code if I ever used it.
A good way to think about it is to understand what has to be done to handle up-casts. I'll try to answer your questions by showing the memory layout of objects of the classes you describe.
Code sample #2
The memory layout is as follows:
vptr | A::a | B::b
Upcasting a pointer to B to type A will result in the same address, with the same vptr being used. This is why there's no need for additional vptr's here.
Code sample #3
vptr | A::a | vptr | B::b | C::c
As you can see, there are two vptr's here, just like you guessed. Why? Because it's true that if we upcast from C to A we don't need to modify the address, and thus can use the same vptr. But if we upcast from C to B we do need that modification, and correspondingly we need a vptr at the start of the resulting object.
So, any inherited class beyond the first will require an additional vptr (unless that inherited class has no virtual methods, in which case it has no vptr).
Code sample #4 and beyond
When you derive virtually, you need a new pointer, called a base pointer, to point to the location in the memory layout of the derived classes. There can be more than one base pointer, of course.
So how does the memory layout look? That depends on the compiler. In your compiler it's probably something like
vptr | base pointer | B::b | vptr | A::a | C::c | vptr | A::a
\-----------------------------------------^
But other compilers may incorporate base pointers in the virtual table (by using offsets - that deserves another question).
You need a base pointer because when you derive in a virtual fashion, the derived class will appear only once in the memory layout (it may appear additional times if it's also derived normally, as in your example), so all its children must point to the exact same location.
EDIT: clarification - it all really depends on the compiler, the memory layout I showed can be different in different compilers.
All of this is completely implementation defined you realize. You can't count on any of it. There is no 'rule'.
In the inheritance example, here is how the virtual table for classes A and B might look:
class A
+-----------------+
| pointer to A::v |
+-----------------+
class B
+-----------------+
| pointer to A::v |
+-----------------+
| pointer to B::w |
+-----------------+
As you can see, if you have a pointer to class B's virtual table, it is also perfectly valid as class A's virtual table.
In your class C example, if you think about it, there is no way to make a virtual table that is both valid as a table for class C, class A, and class B. So the compiler makes two. One virtual table is valid for class A and C (mostly likely) and the other is valid for class A and B.
This obviously depends on the compiler implementation.
Anyway I think that I can sum up the following rules from the implementation given by a classic paper linked below and which gives the number of bytes you get in your examples (except for class D which would be 36 bytes and not 32!!!):
The size of an object of class T is:
The size of its fields PLUS the sum of the size of every object from which T inherits PLUS 4 bytes for every object from which T virtually inherits PLUS 4 bytes ONLY IF T needs ANOTHER v-table
Pay attention: if a class K is virtually inherited multiple times (at any level) you have to add the size of K only once
So we have to answer another question: When does a class need ANOTHER v-table?
A class that does not inherit from other classes needs a v-table only if it has one or more virtual methods
OTHERWISE, a class needs another v-table ONLY IF NONE of the classes from which it non virtually inherits does have a v-table
The End of the rules (which I think can be applied to match what Terry Mahaffey has explained in his answer) :)
Anyway my suggestion is to read the following paper by Bjarne Stroustrup (the creator of C++) which explains exactly these things: how many virtual tables are needed with virtual or non virtual inheritance... and why!
It's really a good reading:
http://www.hpc.unimelb.edu.au/nec/g1af05e/chap5.html
I am not sure but I think that it is because of pointer to Virtual method table

The size of base class object and derived class object in C++

#include <stdio.h>
class Base1
{
public:
virtual int virt1() { return 100; }
int data1;
};
class Derived : public Base1
{
public:
virtual int virt1() { return 150; }
int derivedData;
};
int Global1( Base1 * b1 )
{
return b1->virt1();
}
main1()
{
Derived * d = new Derived;
printf( "%d %d\n", d->virt1(), Global1( d ));
printf("size: Base1:%d\n", sizeof(Base1));
printf("size: Derived:%d\n", sizeof(Derived));
}
I used the above code to print out the size of the base class and derived class. I am running the code in a 64 bit machine. The output from my computer is
150 150
size: Base1:16
size: Derived:16
I also tried to print out the size of int by using sizeof(int), it's 4.
I have the following questions:
For the size of Base1 class, it should contain a vptr points to the
vtable and an integer data1. Why its size is 16 while sizeof(int)
is 4 in my machine.
For the derived class, it should have the data inherited from Base1 class and an additional integer. It should have bigger size
than Base1 class, why they are the same?
Besides the size of these, what's the vptr in Derived class? Will the vptr in Derived class override the vptr inherited from the Base1
class?
For the size of Base1 class, it should contain a vptr points to the vtable and an integer data1. Why its size is 16 while sizeof(int) is 4 in my machine.
Padding to ensure proper alignment of the 8-byte pointers when Base1s are in a necessarily-contiguous† array.
† the C++ Standard requires that array elements be contiguous in memory, and calculates the memory address of an element at index i and the array's base address plus i times the element size
For the derived class, it should have the data inherited from Base1 class and an additional integer. It should have bigger size than Base1 class, why they are the same?
The padding's been reclaimed for the extra int member. You can observe this by outputting d, &d->data1, &d->derivedData.
Besides the size of these, what's the vptr in Derived class? Will the vptr in Derived class override the vptr inherited from the Base1 class?
In implementations using pointers to virtual dispatch tables (every compiler I know of, but it's not mandated by the C++ Standard), the Derived class constructor(s) and destructor overwrites the "vptr" - the former writing a pointer to Derived's VDT after the constructor body runs, the latter restoring the pointer to the Base VDT before the destructor body runs (this ensures you don't invoke Derived member functions on an object before or after its official lifetime).
There are many factors that decide the size of an object of a class in C++.
These factors are:
Size of all non-static data members
Order of data members
Byte alignment or byte padding
Size of its immediate base class
The existence of virtual function(s) (Dynamic polymorphism using virtual functions).
Compiler being used
Mode of inheritance (virtual inheritance)
When I compile your code on MVSC 2010, x64 bits. This is the result of memory layout:
class Base1 size(16):
+---
0 | {vfptr}
8 | data1
| <alignment member> (size=4)
+---
Base1::$vftable#:
| &Base1_meta
| 0
0 | &Base1::virt1
Base1::virt1 this adjustor: 0
class Derived size(24):
+---
| +--- (base class Base1)
0 | | {vfptr}
8 | | data1
| | <alignment member> (size=4)
| +---
16 | derivedData
| <alignment member> (size=4)
+---
Derived::$vftable#:
| &Derived_meta
| 0
0 | &Derived::virt1
You should read more about vTable, virtual function, byte alignment of class/struct.
Here, some links can help you. Hope it helpful to you:
Determine the size of class object: http://www.cprogramming.com/tutorial/size_of_class_object.html
Memory layout: http://www.phpcompiler.org/articles/virtualinheritance.html

understanding vptr in multiple inheritance?

I am trying to make sense of the statement in book effective c++. Following is the inheritance diagram for multiple inheritance.
Now the book says separate memory in each class is required for vptr. Also it makes following statement
An oddity in the above diagram is that there are only three vptrs even though four classes are involved. Implementations are free to generate four vptrs if they like, but three suffice (it turns out that B and D can share a vptr), and most implementations take advantage of this opportunity to reduce the compiler-generated overhead.
I could not see any reason why there is requirement of separate memory in each class for vptr. I had an understanding that vptr is inherited from base class whatever may be the inheritance type. If we assume that it shown resultant memory structure with inherited vptr how can they make the statement that
B and D can share a vptr
Can somebody please clarify a bit about vptr in multiple inheritance?
Do we need separate vptr in each class ?
Also if above is true why B and D can share vptr ?
Your question is interesting, however I fear that you are aiming too big as a first question, so I will answer in several steps, if you don't mind :)
Disclaimer: I am no compiler writer, and though I have certainly studied the subject, my word should be taken with caution. There will me inaccuracies. And I am not that well versed in RTTI. Also, since this is not standard, what I describe are possibilities.
1. How to implement inheritance ?
Note: I will leave out alignment issues, they just mean that some padding could be included between the blocks
Let's leave it out virtual methods, for now, and concentrate on how inheritance is implemented, down below.
The truth is that inheritance and composition share a lot:
struct B { int t; int u; };
struct C { B b; int v; int w; };
struct D: B { int v; int w; };
Are going to look like:
B:
+-----+-----+
| t | u |
+-----+-----+
C:
+-----+-----+-----+-----+
| B | v | w |
+-----+-----+-----+-----+
D:
+-----+-----+-----+-----+
| B | v | w |
+-----+-----+-----+-----+
Shocking isn't it :) ?
This means, however, than multiple inheritance is quite simple to figure out:
struct A { int r; int s; };
struct M: A, B { int v; int w; };
M:
+-----+-----+-----+-----+-----+-----+
| A | B | v | w |
+-----+-----+-----+-----+-----+-----+
Using these diagrams, let's see what happens when casting a derived pointer to a base pointer:
M* pm = new M();
A* pa = pm; // points to the A subpart of M
B* pb = pm; // points to the B subpart of M
Using our previous diagram:
M:
+-----+-----+-----+-----+-----+-----+
| A | B | v | w |
+-----+-----+-----+-----+-----+-----+
^ ^
pm pb
pa
The fact that the address of pb is slightly different from that of pm is handled through pointer arithmetic automatically for you by the compiler.
2. How to implement virtual inheritance ?
Virtual inheritance is tricky: you need to ensure that a single V (for virtual) object will be shared by all the other subobjects. Let's define a simple diamond inheritance.
struct V { int t; };
struct B: virtual V { int u; };
struct C: virtual V { int v; };
struct D: B, C { int w; };
I'll leave out the representation, and concentrate on ensuring that in a D object, both the B and C subparts share the same subobject. How can it be done ?
Remember that a class size should be constant
Remember that when designed, neither B nor C can foresee whether they will be used together or not
The solution that has been found is therefore simple: B and C only reserve space for a pointer to V, and:
if you build a stand-alone B, the constructor will allocate a V on the heap, which will be handled automatically
if you build B as part of a D, the B subpart will expect the D constructor to pass the pointer to the location of V
And idem for C, obviously.
In D, an optimization allow the constructor to reserve space for V right in the object, because D does not inherit virtually from either B or C, giving the diagram you have shown (though we don't have yet virtual methods).
B: (and C is similar)
+-----+-----+
| V* | u |
+-----+-----+
D:
+-----+-----+-----+-----+-----+-----+
| B | C | w | A |
+-----+-----+-----+-----+-----+-----+
Remark now that casting from B to A is slightly trickier than simple pointer arithmetic: you need follow the pointer in B rather than simple pointer arithmetic.
There is a worse case though, up-casting. If I give you a pointer to A how do you know how to get back to B ?
In this case, the magic is performed by dynamic_cast, but this require some support (ie, information) stored somewhere. This is the so called RTTI (Run-Time Type Information). dynamic_cast will first determine that A is part of a D through some magic, then query D's runtime information to know where within D the B subobject is stored.
If we were in case where there is no B subobject, it would either return 0 (pointer form) or throw a bad_cast exception (reference form).
3. How to implement virtual methods ?
In general virtual methods are implemented through a v-table (ie, a table of pointer to functions) per class, and v-ptr to this table per-object. This is not the sole possible implementation, and it has been demonstrated that others could be faster, however it is both simple and with a predictable overhead (both in term of memory and dispatch speed).
If we take a simple base class object, with a virtual method:
struct B { virtual foo(); };
For the computer, there is no such things as member methods, so in fact you have:
struct B { VTable* vptr; };
void Bfoo(B* b);
struct BVTable { RTTI* rtti; void (*foo)(B*); };
When you derive from B:
struct D: B { virtual foo(); virtual bar(); };
You now have two virtual methods, one overrides B::foo, the other is brand new. The computer representation is akin to:
struct D { VTable* vptr; }; // single table, even for two methods
void Dfoo(D* d); void Dbar(D* d);
struct DVTable { RTTI* rtti; void (*foo)(D*); void (*foo)(B*); };
Note how BVTable and DVTable are so similar (since we put foo before bar) ? It's important!
D* d = /**/;
B* b = d; // noop, no needfor arithmetic
b->foo();
Let's translate the call to foo in machine language (somewhat):
// 1. get the vptr
void* vptr = b; // noop, it's stored at the first byte of B
// 2. get the pointer to foo function
void (*foo)(B*) = vptr[1]; // 0 is for RTTI
// 3. apply foo
(*foo)(b);
Those vptrs are initialized by the constructors of the objects, when executing the constructor of D, here is what happened:
D::D() calls B::B() first and foremost, to initiliaze its subparts
B::B() initialize vptr to point to its vtable, then returns
D::D() initialize vptr to point to its vtable, overriding B's
Therefore, vptr here pointed to D's vtable, and thus the foo applied was D's. For B it was completely transparent.
Here B and D share the same vptr!
4. Virtual tables in multi-inheritance
Unfortunately this sharing is not always possible.
First, as we have seen, in the case of virtual inheritance, the "shared" item is positionned oddly in the final complete object. It therefore has its own vptr. That's 1.
Second, in case of multi-inheritance, the first base is aligned with the complete object, but the second base cannot be (they both need space for their data), therefore it cannot share its vptr. That's 2.
Third, the first base is aligned with the complete object, thus offering us the same layout that in the case of simple inheritance (the same optimization opportunity). That's 3.
Quite simple, no ?
If a class has virtual members, one need to way to find their address. Those are collected in a constant table (the vtbl) whose address is stored in an hidden field for each object (vptr). A call to a virtual member is essentially:
obj->_vptr[member_idx](obj, params...);
A derived class which add virtual members to his base class also need a place for them. Thus a new vtbl and a new vptr for them. A call to an inherited virtual member is still
obj->_vptr[member_idx](obj, params...);
and a call to new virtual member is:
obj->_vptr2[member_idx](obj, params...);
If the base is not virtual, one can arrange for the second vtbl to be put immediately after the first one, effectively increasing the size of the vtbl. And the _vptr2 is no more needed. A call to a new virtual member is thus:
obj->_vptr[member_idx+num_inherited_members](obj, params...);
In the case of (non virtual) multiple inheritance, one inherit two vtbl and two vptr. They can't be merged, and calls must pay attention to add an offset to the object (in order for the inherited data members to be found at the correct place). Calls to the first base class members will be
obj->_vptr_base1[member_idx](obj, params...);
and for the second
obj->_vptr_base2[member_idx](obj+offset, params...);
New virtual members can again either be put in a new vtbl, or appended to the vtbl of the first base (so that no offsets are added in future calls).
If a base is virtual, one can not append the new vtbl to the inherited one as it could leads to conflicts (in the example you gave, if both B and C append their virtual functions, how D be able to build its version?).
Thus, A needs a vtbl. B and C need a vtbl and it can't be appended to A's one because A is a virtual base of both. D needs a vtbl but it can be appended to B one as B is not a virtual base class of D.
It all has to do with how compiler figures out the actual addresses of method functions. The compiler assumes that virtual table pointer is located at a known offset from the base of the object (typically at offset 0). The compiler also needs to know the structure of the virtual table for each class - in other words, how to lookup pointers to functions in the virtual table.
Class B and class C will have completely different structures of Virtual Tables since they have different methods. Virtual table for class D can look like a virtual table for class B followed by additional data for methods of class C.
When you generate an object of class D, you can cast it as a pointer to B or as a pointer to C or even as a pointer to class A. You may pass these pointers to modules that are not even aware of existence of class D, but can call methods of class B or C or A. These modules need to know how to locate the pointer to the virtual table of the class and they need to know how to locate pointers to methods of class B/C/A in the virtual table. That's why you need to have separate VPTRs for each class.
Class D is well aware of existence of class B and the structure of its virtual table and therefore can extend its structure and reuse the VPTR from object B.
When you cast a pointer to object D to a pointer to object B or C or A, it will actually update the pointer by some offset, so that it starts from vptr corresponding to that specific base class.
I could not see any reason why there
is requirement of separate memory in
each class for vptr
At runtime, when you invoke a (virtual) method via a pointer, the CPU has no knowledge about the actual object on which the method is dispatched. If you have B* b = ...; b->some_method(); then the variable b can potentially point at an object created via new B() or via new D() or
even new E() where E is some other class that inherits from (either) B or D. Each of these classes can supply its own implementation (override) for some_method(). Thus, the call b->some_method() should dispatch the implementation from either B, D or E depending on the object on which b is pointing.
The vptr of an object allows the CPU to find the address of the implementation of some_method that is in effect for that object. Each class defines it own vtbl (containing addresses of all virtual methods) and each object of the class starts with a vptr that points at that vtbl.
I think D needs 2 or 3 vptrs.
Here A may or may not require a vptr.
B needs one that should not be shared with A (because A is virtually inherited).
C needs one that should not be shared with A (ditto).
D can use B or C's vftable for its new virtual functions (if any), so it can share B's or C's.
My old paper "C++: Under the Hood" explains the Microsoft C++ implementation of virtual base classes. http://www.openrce.org/articles/files/jangrayhood.pdf
And (MS C++) you can compile with cl /d1reportAllClassLayout to get a text report of class memory layouts.
Happy hacking!