Order of calling virtual destructors in C++ - c++

I have been trying to understand OOP concepts through C++, but I don't fully understand some aspects of virtual destructors.
I have written a small snippet:
class A{
int x;
public:
virtual void show(){
cout << " In A\n";
}
virtual ~A(){
cout << "~A\n";
};
};
class B: public A{
int y;
public:
virtual void show(){
cout << " In B\n";
}
virtual ~B(){
cout << "~B\n";
};
};
class C: public A{
int z;
public:
virtual void show(){
cout << " In C\n";
}
virtual ~C(){
cout << "~C\n";
};
};
class E: public A{
int z;
public:
virtual void show(){
cout << " In E\n";
}
virtual ~E(){
cout << "~E\n";
};
};
class D: public B , public C , public E{
int z1;
public:
virtual void show(){
cout << " In D\n";
}
virtual ~D(){
cout << "~D\n";
};
};
signed main(){
// A * a = new A();
// B *b = new B();
D *d = new D();
B *b = d;
C *c = d;
E * e = d;
A * a = new A();
cout << d << "\n";
cout << b << "\n";
cout << c << "\n";
cout << e << "\n";
delete b;
// a -> show();
}
On running the code , i get the result as :
0x7f8c5e500000
0x7f8c5e500000
0x7f8c5e500018
0x7f8c5e500030
~D
~E
~A
~C
~A
~B
~A
Now three questions :
According to the wikipedia article , virtual_table , it was referred that object c gets an address +8 bytes than that of d and b , what happens in case of e.
When i call delete b instead of delete d , also get the same order sequence of virtual destructors , so why is the derived class destructor called
The virtual destructors are called only when i delete an object , then how are the vtable and vpointers gets deleted when the program ends ( when i run the code without the delete d the execution just stops without printing anything ).

Your questions in order:
(1) Yes, pointers to bases referring to objects of derived classes with multiple inheritance may change their numerical value compared to a pointer to the most derived type. The reason is that the base class is a part of the derived class, much like a member, residing at an offset. Only for the first base class in multiple inheritance can this offset be 0. This is the reason why such pointers cannot be converted with a simple reinterpret_cast().
(2) b points to a D, which also is-a B and is-an A.
Exactly that is what being virtual means for a member function: The code generated by the compiler inspects the object pointed to at run time and calls the function defined for the actual type of the object (which is a D), as opposed to the type of the expression used to access that object (which is B). The type of the expression is fully determined at compile time; the type of the actual complete object is not.
If you do not declare a destructor virtual, the program may behave as you perhaps expected: the compiler will generate code that simply calls the function defined for the type of the expression (B), without any run-time lookup. Non-virtual member function calls are slightly more efficient, but for destructors the behavior is undefined when an object of a derived class is deleted through a pointer to a base class whose destructor is not virtual. If your base class destructor is public, it should be virtual, because this scenario could happen.
Herb Sutter has written an article about virtual functions including virtual destructors that's worth reading.
(3) The memory, including dynamically allocated memory, is released and made available again for other uses by modern standard operating systems when the program has exited. (This may not be the case in old operating systems or freestanding implementations, if they offer dynamic allocation.) Destructors of dynamically allocated objects, however, will not be called, which may be a problem if they hold resources like database or network connections which should better be released.

Regarding the addresses of the objects. As already explained in another answer this is compiler dependent. However it can still be explained.
Address of objects in Multiple Inheritance
(a possible compiler implementation)
Here is a possible memory diagram, assuming that the pointer to the virtual table is 8 bytes and int is 4 bytes.
Class D first has its pointer to virtual table (vtbl_ptr or vptr) then comes class B without its own vtbl_ptr, as it can share the same vtbl as D.
Classes C and E must come with their own embedded vtbl_ptr. It will point to the vtbl of D (almost..., there is a thunk issue to handle but let's ignore it, you can read about thunk in the links below but this doesn't affect the need for additional vtbl_ptr).
The additional vptr for each additional base class is required so when we look at C or E, the position of the vptr is always at the same location, i.e. at the top of the object, regardless if it is actually a concrete C or it is a D that is held as C. And the same for E and any other base class that is not the first inherited base.
The addresses that we may see according to the above:
D d; // sitting at some address X
B* b = &d; // same address
C* c = &d; // jumps over vtbl_ptr (8 bytes) + B without vtbl_ptr (8 bytes)
// thus X + 16 -- or X + 10 in hexa
E* e = &d; // jumps in addition over C part including vtbl_ptr (16 bytes)
// thus X + 32 -- or X + 20 in hexa
Note that the math for the addresses that appear in the question might be a bit different, as said things are compiler dependent. Size of int may be different, padding might be different and the way to arrange the vtbl and vptr is also compiler dependent.
To read more about object layout and address calculations, see:
C++: Under the Hood by Jan Gray (old but still relevant)
And the following SO entries on the subject:
Object layout in case of virtual functions and multiple inheritance
Understanding virtual table in multiple inheritance

According to the wikipedia article , virtual_table , it was referred that object c gets an address +8 bytes than that of d and b , what happens in case of e.
Addresses are often compiler-dependent, and hence pretty dicey. I wouldn't rely on them being any particular value.
When i call delete b instead of delete d , also get the same order sequence of virtual destructors , so why is the derived class destructor called
The type of the pointer doesn't matter. The underlying object was created with new D() so those are the destructors that get called. This is because it might be difficult to delete objects properly otherwise -- if you have a factory that creates various subclasses, how would you know which type to delete it as?
(What's actually going on here is that (pointers to) the destructors are stored in the object's vtable.)
The virtual destructors are called only when i delete an object , then how are the vtable and vpointers gets deleted when the program ends ( when i run the code without the delete d the execution just stops without printing anything ).
If you never delete something, it never gets cleaned up. The program ends without freeing that memory from the heap. This is a "memory leak". When the program ends, the OS cleans up the whole program's heap in one go (without caring what's in it).

Related

Memory layout of multiple inheritance after upcasting [duplicate]

I've been using multiple inheritance in c++ for quite a long time, but only realised today that this could imply that the pointer addresses could be different when referencing them as one of the subclasses.
For example, if I have:
class ClassA{
public:
int x;
int y;
ClassA(){
cout << "ClassA : " << (unsigned int)this << endl;
}
};
class ClassC{
public:
int cc;
int xx;
ClassC(){
cout << "ClassC : " << (unsigned int)this << endl;
}
};
class ClassB : public ClassC, public ClassA{
public:
int z;
int v;
ClassB(){
cout << "ClassB : " << (unsigned int)this << endl;
}
};
int main(){
ClassB * b = new ClassB();
}
class A and class C have different addresses when printed on the constructor.
Yet, when I try to cast them back to each other, it just works automagically:
ClassA * the_a = (ClassA*)b;
cout << "The A, casted : " << (unsigned int)the_a << endl;
ClassB * the_b = (ClassB*)the_a;
cout << "The B, casted back : " << (unsigned int)the_b << endl;
I suppose this kind of information can be derived by the compiler from the code, but is it safe to assume that this works on all compilers?
Additional Question : is it possible to force the order in which the subclass locations go? For example, if I need classA to be located first (essentially, share the same pointer location) as ClassC which subclasses it, do I just need to put it first in the declaration of subclasses?
Update Okay, looks like it's not possible to force the order. Is it still possible to find out the "root" address of the structure, the start of the address allocated to the subclass, at the superclass level? For example, getting classB's address from ClassA.
That's a perfectly standard use of multiple inheritance, and it will work on any compliant compiler. You should however note that an explicit cast is unnecessary in the first case, and that the 'C style cast' could be replaced by a static_cast in the second.
Is it possible to force the order in which the subclass locations go.
No : the layout is implementation defined, and your code should not depend on this ordering.
Yes, you can assume that this is safe. A pointer typecast in C++ is guaranteed to correctly adjust the pointer to account for base-to-derived or vice-versa conversions.
That said, you have to be careful not to push the system too far. For example, the compiler won't get this conversion right:
ClassB* b = new ClassB;
ClassC* c = (ClassC*)(void*)b;
This breaks because the conversion from ClassB* to ClassC* gets funneled through a void*, and so the information about where the base subobject sits inside the ClassB object is lost.
Another case where a straight cast won't work is with virtual inheritance. Converting from a derived class to a virtual base works implicitly, but to go from a virtual base back to a derived class you have to use the dynamic_cast operator; a static_cast in that direction does not compile.
Yea, a naive implementation for virtual bases is to place at a known location in the derived object a pointer to the virtual base subobject.
If you have multiple virtual bases that's a bit expensive. A better representation (patented by Microsoft I think) uses self-relative offsets. Since these are invariant for each subobject (that is, they don't depend on the object address), they can be stored once in static memory, and all you need is a single pointer to them in each subobject.
Without some such data structure, cross casts would not work. You can cross cast from a virtual base A to a virtual base B inside the virtual base A even though B subobject isn't visible (provided only the two bases have a shared virtual base X if I recall correctly). That's pretty hairy navigation when you think about it: the navigation goes via the shape descriptor of the most derived class (which can see both A and B).
What's even more hairy is that the structure so represented is required "by law of the Standard" to change dynamically during construction and destruction, which is why construction and destruction of complex objects is slow. Once they're made, however, method calls, even cross calls, are quite fast.

pure virtual function with implementation in C++

I'm aware that one can make implementation for pure virtual function in base class, as a default implementation. But i don't quite understand the code below.
class A {
public:
virtual void f1() = 0;
virtual void f2() = 0;
};
void A::f1(){
cout << "base f1" << endl;
f2();
}
void A::f2(){
cout << "base f2" << endl;
}
class B: public A {
public:
void f1() { A::f1(); }
void f2() { cout << "derived f2" << endl; }
};
int main(){
B b;
b.f1();
}
Why does the B::f1() calls B::f2() instead of A::f2. I know it will behave this way, but why? what basic knowledge i have missed.
another question, did implementation for pure virtual function in base class make the pure(=0) unnecessary?
This is the behaviour the C++ standard defines for virtual functions: call the version of the most derived type available.
Sure, for normal objects, the most derived type is the one of the object itself:
B b;
b.f1(); // of course calls B's version
The interesting part is if you have pointers or references:
B b;
A& ar = b;
A* ap = &b;
// now both times, B's version will be called
ar.f1();
ap->f1();
The same occurs inside f1, actually, you do implicitly:
this->f2(); // 'this' is a POINTER of type A* (or A const* in const functions).
There's a case where this does not occur (the example below assumes that A is not abstract and is copyable):
B b;
A a = b; // notice: not a pointer or reference!
a.f1(); // now calls A's version
What actually happens here is that only the A part of b is copied into a and the B part is dropped, so a actually is a true, non-derived A object. This is called 'object slicing' and is the reason why you cannot store polymorphic objects by value in e.g. a std::vector of base objects, but need pointers (or reference wrappers) instead.
Back to virtual functions: If you are interested in the technical details, this is solved via virtual function tables, vtables for short. Be aware that this is only a de-facto standard; C++ does not require implementation via vtables (and other languages supporting polymorphism/inheritance, such as Java, typically rely on similar dispatch tables, too).
For each virtual function in a class there's an entry in its corresponding vtable.
Normal functions are called directly (i. e. an unconditional branch to the function's address is executed). For virtual function calls, in contrast, we first need to lookup the address in the vtable and only then we can jump to the address stored there.
Derived classes now copy the vtables of their base classes (so initially these contain the same addresses as the base class tables), but replace the appropriate addresses as soon as you override a function.
By the way: You can tell the compiler not to use the vtable, but explicitly call a specific variant:
B b;
A& a = b;
a.A::f1(); // calls A's version in spite of being virtual,
// because you explicitly asked for it
b.A::f1(); // alike, works even on derived type

behaviour of sizeof in c++

When I do sizeof in c++, will I be sure to get the "whole object"? I am asking because I am about to copy objects to other areas of memory using memcpy (probably a stupid idea from the start, right?).
What I am worried about is that I may not get the whole object, but only the parts belonging to the class it is casted to right now. Does it make any sense or am I being confused?
EDIT Examples
class A{ public: int a = 123; };
class B: public A{ public: int b = 321; };
class C : public B{ public: int c = 333; };
C c_ = C();
B b_ = C();
A a_ = C();
std::cout << sizeof(a_) << " , " << sizeof(b_) << " , " << sizeof(c_) << std::endl;
Seems to give me 4,8,12.
I guess I would need to do dynamic casting to figure out how to get the "whole" object which I constructed as a "C" class in each case?
sizeof will always return the static size of your object. Notice that in your example it will coincide with the true object size, as there is no polymorphism; when you do
A a = B();
a is of type A - you just happened to initialize a new A object with a new B object, which results in slicing (a gets initialized with the fields of B() that are common with A).
A better example would be:
B b;
A *a = &b;
In this case, *a will indeed be of dynamic type B, but sizeof(*a) will still return sizeof(A), not sizeof(B).
There are several ways to obtain the dynamic size of an object:
save it into a field at construction time;
in theory, you could define a virtual method that does return sizeof(*this); and redefine it in all derived classes.
That being said, this last method won't be particularly useful, as doing memcpy of types that are not trivially copyable, such as polymorphic classes, is undefined behavior (and the same applies to the first method, as I imagine that you'll want to do this with polymorphic types).
The common approach to the problem of copying polymorphic classes is to accept the fact that they'll have to live in the heap and define clone() method that does virtual A * clone() {return new B(*this);} (where B is the derived class) in each derived class, and invoke clone() whenever you need a copy.
Mind you, there are subtler tricks you can pull; once I had a class hierarchy which had a virtual method dispatching to the placement new for each derived class and one for the destructor, but you really have to know what you are doing (in my case I was invoking them over a union containing an instance for each derived class, so size and alignment was not a problem).

Segfault with Embedded Structs and virtual functions

I have structs like this:
struct A
{
int a;
virtual void do_stuff(A*a)
{
cout << "I'm just a boring A-struct: " << a << endl;
}
}
struct B
{
A a_part;
char * bstr;
void do_stuff(B*bptr)
{
cout << "I'm actually a B-struct! See? ..." << bptr->bstr << endl;
}
}
B * B_new(int n, char * str)
{
B * b = (B*) malloc(sizeof(struct B));
b->a_part.a = n;
b->bstr = strdup(str);
return b;
}
Now, when I do this:
char * blah = strdup("BLAAARGH");
A * b = (A*) B_new(5, blah);
free(blah);
b->do_stuff(b);
I get a segfault on the very last line when I call do_stuff and I have no idea why.
This is my first time working with virtual functions in structs like this so I'm quite lost. Any help would be greatly appreciated!
Note: the function calls MUST be in the same format as the last line in terms of argument type, which is why I'm not using classes or inheritance.
You're mixing a C idiom (embedded structs) with C++ concepts (virtual functions). In C++, the need for embedded structs is obviated by classes and inheritance. virtual functions only affect classes in the same inheritance hierarchy. In your case, there is no relationship between A and B, so B's do_stuff can never be reached through an A*.
Your segfault is probably caused because b is really a B that was allocated with malloc and then cast to an A*. When the compiler sees b->do_stuff, it reads the vptr stored in the object to look up which version of do_stuff to call. However, malloc does not run any constructor, so that vptr was never initialized, and your program crashes.
In C++, a class without virtual functions that doesn't inherit from any other classes is laid out exactly like a C struct.
class NormalClass
{
int a;
double b;
public:
NormalClass(int x, double y);
};
looks like this:
+------------------------------------+
| a (4 bytes) | b (8 bytes) |
+------------------------------------+
However, a class (or struct) with virtual functions also has a pointer to a vtable, which enables C++'s version of polymorphism. So a class like this:
class ClassWithVTable
{
int a;
double b;
public:
ClassWithVTable();
virtual void doSomething();
};
is laid out in memory like this:
+-----------------------------------------------------------+
| vptr (sizeof(void *)) | a (4 bytes) | b (8 bytes) |
+-----------------------------------------------------------+
and vptr points to an implementation-defined table called the vtable, which is essentially an array of function pointers.
Casting a B * to an A * and then attempting to dereference it via a member function call is undefined behaviour. One possibility is a seg-fault. I'm not saying that this is definitely the cause, but it's not a good start.
I don't understand why you're not using inheritance here!
For polymorphic objects, the pointer to the vtable is stored inside the object.
So at runtime, the method to be actually called is found via dereferencing and jumping into the vtable.
In your case you cast B * to A *.
Since A is polymorphic, the method call will be determined via the vtable, but since the object being used is actually a B, the vptr that gets read is garbage and you get the segfault.
