Segfault with Embedded Structs and virtual functions - c++

I have structs like this:
struct A
{
int a;
virtual void do_stuff(A*a)
{
cout << "I'm just a boring A-struct: " << a << endl;
}
};
struct B
{
A a_part;
char * bstr;
void do_stuff(B*bptr)
{
cout << "I'm actually a B-struct! See? ..." << bptr->bstr << endl;
}
};
B * B_new(int n, char * str)
{
B * b = (B*) malloc(sizeof(struct B));
b->a_part.a = n;
b->bstr = strdup(str);
return b;
}
Now, when I do this:
char * blah = strdup("BLAAARGH");
A * b = (A*) B_new(5, blah);
free(blah);
b->do_stuff(b);
I get a segfault on the very last line when I call do_stuff and I have no idea why.
This is my first time working with virtual functions in structs like this so I'm quite lost. Any help would be greatly appreciated!
Note: the function calls MUST be in the same format as the last line in terms of argument type, which is why I'm not using classes or inheritance.

You're mixing a C idiom (embedded structs) with C++ concepts (virtual functions). In C++, the need for embedded structs is obviated by classes and inheritance. Virtual functions only affect classes in the same inheritance hierarchy. In your case, there is no relationship between A and B, so A's do_stuff is always going to get called.
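Your segfault is probably caused because b is really a B, but assigned to an A*. When the compiler sees b->do_stuff, it tries to go to a vtable to look up which version of do_stuff to call. However, B doesn't have a vtable of its own, and because its memory came from malloc, no constructor ever ran to set up the vtable pointer of the embedded A either. The lookup reads garbage, so your program crashes.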
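For comparison, here is a minimal sketch of the inheritance-based version, keeping the question's call shape b->do_stuff(b) with an A* argument. It assumes do_stuff may return its message as a std::string instead of printing it (only so the result is easy to check), that B may own a std::string copy of the text, and that objects are created with new so their constructors initialize the vtable pointer. B_new here takes a const char*, which still accepts the question's char*.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct A {
    int a = 0;
    virtual ~A() = default;  // makes it safe to delete a B through an A*
    // Returns the message rather than printing it (assumption for checking).
    virtual std::string do_stuff(A* self) {
        std::ostringstream out;
        out << "I'm just a boring A-struct: " << self->a;
        return out.str();
    }
};

// B *is-an* A, so it overrides A's virtual function with the same signature.
struct B : A {
    std::string bstr;  // owns a copy, so the caller may free its buffer afterwards
    std::string do_stuff(A* self) override {
        // Only valid when self really points at a B, as in b->do_stuff(b).
        B* bptr = static_cast<B*>(self);
        return "I'm actually a B-struct! See? ..." + bptr->bstr;
    }
};

// Same role as the question's B_new, but new runs the constructors,
// which is what sets up the vtable pointer.
B* B_new(int n, const char* str) {
    B* b = new B;
    b->a = n;
    b->bstr = str;
    return b;
}
```

With this version, A * b = B_new(5, blah); free(blah); b->do_stuff(b); should dispatch to B::do_stuff, and the object is released with delete b rather than free.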
In C++, a class without virtual functions that doesn't inherit from any other classes is laid out exactly like a C struct.
class NormalClass
{
int a;
double b;
public:
NormalClass(int x, double y);
};
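typically looks like this on a 64-bit platform (4 bytes of padding keep b 8-byte aligned):
+--------------------------------------------------+
| a (4 bytes) | padding (4 bytes) | b (8 bytes)    |
+--------------------------------------------------+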
However, a class (or struct) with virtual functions also has a pointer to a vtable, which enables C++'s version of polymorphism. So a class like this:
class ClassWithVTable
{
int a;
double b;
public:
ClassWithVTable();
virtual void doSomething();
};
is typically laid out in memory like this on a 64-bit platform:
+-----------------------------------------------------------------------+
| vptr (sizeof(void *)) | a (4 bytes) | padding (4 bytes) | b (8 bytes) |
+-----------------------------------------------------------------------+
and vptr points to an implementation-defined table called the vtable, which is essentially an array of function pointers.
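To see the extra vptr on your own compiler, a sketch like the following compares the two sizes. The exact numbers are implementation-defined; on common 64-bit ABIs one would typically see 16 and 24 bytes. The constructor bodies are filled in here only so the sketch compiles and links.

```cpp
#include <cassert>
#include <cstddef>

class NormalClass {
    int a;
    double b;
public:
    NormalClass(int x, double y) : a(x), b(y) {}
};

class ClassWithVTable {
    int a = 0;
    double b = 0.0;
public:
    ClassWithVTable() = default;
    virtual void doSomething() {}  // presence of a virtual function adds a vptr
};

// Compile-time constants; the values depend on the compiler and target ABI.
constexpr std::size_t plain_size = sizeof(NormalClass);
constexpr std::size_t vtable_size = sizeof(ClassWithVTable);
```

On some 32-bit ABIs the vptr can occupy what would otherwise be padding, so the two sizes may even come out equal there.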

Casting a B * to an A * and then attempting to dereference it via a member function call is undefined behaviour. One possibility is a seg-fault. I'm not saying that this is definitely the cause, but it's not a good start.
I don't understand why you're not using inheritance here!

For polymorphic objects, the pointer to the vtable is stored inside the object.
So at runtime, the method to be actually called is found via dereferencing and jumping into the vtable.
In your case you cast B * to A *.
Since A is polymorphic, the method call will be resolved via the vtable, but since the object being used is actually a B, the vpointer read from it is garbage, and you get the segfault.
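Because the question starts from the C idiom, it may help to see the mechanism written out by hand. The sketch below is a hypothetical emulation, not how any particular compiler does it: each "class" gets a table of function pointers, every object stores a pointer to its table, and a "virtual call" goes through that pointer. The names Base, Derived, BaseVTable, init_base, init_derived and call_describe are all made up for illustration. The vptr has to be filled in explicitly; a bare malloc never does that, which leaves exactly the garbage pointer described above.

```cpp
#include <cassert>
#include <string>

struct Base;

// One function pointer per "virtual function".
struct BaseVTable {
    std::string (*describe)(const Base* self);
};

struct Base {
    const BaseVTable* vptr;  // must be set by whoever initializes the object
    int a;
};

struct Derived {
    Base base_part;  // first member, so a Derived can be viewed through a Base*
    const char* extra;
};

std::string base_describe(const Base* self) {
    return "Base with a = " + std::to_string(self->a);
}

std::string derived_describe(const Base* self) {
    // OK only because base_part is the first member of a standard-layout struct.
    const Derived* d = reinterpret_cast<const Derived*>(self);
    return std::string("Derived with extra = ") + d->extra;
}

const BaseVTable base_vtable{&base_describe};
const BaseVTable derived_vtable{&derived_describe};

void init_base(Base* b, int a) {
    b->vptr = &base_vtable;
    b->a = a;
}

void init_derived(Derived* d, int a, const char* extra) {
    d->base_part.vptr = &derived_vtable;  // the step a bare malloc never performs
    d->base_part.a = a;
    d->extra = extra;
}

// A hand-written "virtual call": fetch the table from the object, then call.
std::string call_describe(const Base* b) { return b->vptr->describe(b); }
```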

Related

Order of calling virtual destructors in C++

I have been trying to understand OOP concepts through C++, but I am not able to understand some parts of virtual destructors.
I have written a small snippet:
class A{
int x;
public:
virtual void show(){
cout << " In A\n";
}
virtual ~A(){
cout << "~A\n";
};
};
class B: public A{
int y;
public:
virtual void show(){
cout << " In B\n";
}
virtual ~B(){
cout << "~B\n";
};
};
class C: public A{
int z;
public:
virtual void show(){
cout << " In C\n";
}
virtual ~C(){
cout << "~C\n";
};
};
class E: public A{
int z;
public:
virtual void show(){
cout << " In E\n";
}
virtual ~E(){
cout << "~E\n";
};
};
class D: public B , public C , public E{
int z1;
public:
virtual void show(){
cout << " In D\n";
}
virtual ~D(){
cout << "~D\n";
};
};
signed main(){
// A * a = new A();
// B *b = new B();
D *d = new D();
B *b = d;
C *c = d;
E * e = d;
A * a = new A();
cout << d << "\n";
cout << b << "\n";
cout << c << "\n";
cout << e << "\n";
delete b;
// a -> show();
}
On running the code, I get this result:
0x7f8c5e500000
0x7f8c5e500000
0x7f8c5e500018
0x7f8c5e500030
~D
~E
~A
~C
~A
~B
~A
Now, three questions:
According to the Wikipedia article on virtual tables, c gets an address 8 bytes greater than that of d and b. What happens in the case of e?
When I call delete b instead of delete d, I also get the same sequence of virtual destructors. Why is the derived class destructor called?
The virtual destructors are called only when I delete an object. How do the vtable and vpointers get deleted when the program ends? (When I run the code without the delete, the execution just stops without printing anything.)
Your questions in order:
(1) Yes, pointers to bases referring to objects of derived classes with multiple inheritance may have a different numerical value from a pointer to the most derived type. The reason is that the base class is a part of the derived class, much like a member, residing at an offset. Only for the first base class in multiple inheritance can this offset be 0. This is why such pointers cannot be converted with a simple reinterpret_cast.
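A small sketch of the adjustment described in (1), using a trimmed version of the question's hierarchy (only B and C as bases of D). The helper names to_c, back_to_d and raw are hypothetical. The implicit conversion from D* to C* moves the address to the C subobject, static_cast undoes it, and a typed comparison between the two pointers still reports the same object. Which subobject lands at offset 0 is up to the compiler, so the sketch only relies on the B and C subobjects having different addresses.

```cpp
#include <cassert>

struct A { virtual ~A() = default; int x = 0; };
struct B : A { int y = 0; };
struct C : A { int z = 0; };
struct D : B, C { int z1 = 0; };

// Implicit derived-to-base conversion: the compiler adds C's offset inside D.
C* to_c(D* d) { return d; }

// static_cast back to the derived type subtracts the same offset again.
D* back_to_d(C* c) { return static_cast<D*>(c); }

// Erases the type so two pointers can be compared as plain addresses.
const void* raw(const void* p) { return p; }
```

A reinterpret_cast<C*>(&d), by contrast, would keep D's address unchanged and therefore not point at the C subobject.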
(2) b points to a D, which also is-a B.
That is exactly what being virtual means for a member function: the code generated by the compiler inspects the object pointed to at run time and calls the function defined for the actual type of the object (which is D), as opposed to the type of the expression used to access that object (which is B). The type of the expression is fully determined at compile time; the type of the actual complete object is not.
If you do not declare a destructor virtual, the program may behave as you perhaps expected: the compiler will generate code that simply calls the function defined for the type of the expression (for B), without any run-time lookup. Non-virtual member function calls are slightly more efficient, but in the case of destructors, as in your case, the behavior is undefined when destroying through a base class expression. If your destructor is public, it should be virtual because this scenario could happen.
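The order in the question's output is fixed by the language rather than by the compiler: a destructor's body runs first, then the destructors of its members, then those of its bases in reverse declaration order, each base repeating the same steps. The sketch below reproduces the question's hierarchy but records the calls in a vector (dtor_log, a name chosen here) instead of printing them, and deletes through a B* just as the question does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destructor calls so the order can be inspected.
std::vector<std::string> dtor_log;

struct A { virtual ~A() { dtor_log.push_back("~A"); } int x = 0; };
struct B : A { ~B() override { dtor_log.push_back("~B"); } int y = 0; };
struct C : A { ~C() override { dtor_log.push_back("~C"); } int z = 0; };
struct E : A { ~E() override { dtor_log.push_back("~E"); } int z = 0; };
struct D : B, C, E { ~D() override { dtor_log.push_back("~D"); } int z1 = 0; };

// Deletes a D through a B*; the virtual destructor makes ~D run first,
// then the bases E, C, B in reverse order, each followed by its own A.
void delete_through_base() {
    D* d = new D;
    B* b = d;
    delete b;
}
```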
Herb Sutter has written an article about virtual functions including virtual destructors that's worth reading.
(3) The memory, including dynamically allocated memory, is released and made available again for other uses by modern standard operating systems when the program has exited. (This may not be the case in old operating systems or freestanding implementations, if they offer dynamic allocation.) Destructors of dynamically allocated objects, however, will not be called, which may be a problem if they hold resources like database or network connections which should better be released.
Regarding the addresses of the objects: as already explained in another answer, this is compiler dependent. However, it can still be explained.
Address of objects in Multiple Inheritance
(a possible compiler implementation)
Here is a possible memory diagram, assuming that the pointer to the virtual table is 8 bytes and int is 4 bytes.
Class D first has its pointer to virtual table (vtbl_ptr or vptr) then comes class B without its own vtbl_ptr, as it can share the same vtbl as D.
Classes C and E must come with their own embedded vtbl_ptr. It will point to the vtbl of D (almost: there is a thunk issue to handle, but let's ignore it; you can read about thunks in the links below, and this doesn't affect the need for an additional vtbl_ptr).
The additional vptr for each additional base class is required so that when we look at a C or an E, the vptr is always at the same location, i.e. at the top of the object, regardless of whether it is actually a concrete C or a D that is held as a C. The same goes for E and any other base class that is not the first inherited base.
The addresses that we may see according to the above:
D d; // sitting at some address X
B* b = &d; // same address
C* c = &d; // jumps over vtbl_ptr (8 bytes) + B without vtbl_ptr (8 bytes)
// thus X + 16 -- or X + 10 in hex
E* e = &d; // jumps in addition over C part including vtbl_ptr (16 bytes)
// thus X + 32 -- or X + 20 in hex
Note that the math for the addresses that appear in the question might be a bit different, as said things are compiler dependent. Size of int may be different, padding might be different and the way to arrange the vtbl and vptr is also compiler dependent.
To read more about object layout and address calculations, see:
C++: Under the Hood by Jan Gray (old but still relevant)
And the following SO entries on the subject:
Object layout in case of virtual functions and multiple inheritance
Understanding virtual table in multiple inheritance
According to the Wikipedia article on virtual tables, c gets an address 8 bytes greater than that of d and b. What happens in the case of e?
Addresses are often compiler-dependent, and hence pretty dicey. I wouldn't rely on them being any particular value.
When I call delete b instead of delete d, I also get the same sequence of virtual destructors. Why is the derived class destructor called?
The type of the pointer doesn't matter. The underlying object was created with new D() so those are the destructors that get called. This is because it might be difficult to delete objects properly otherwise -- if you have a factory that creates various subclasses, how would you know which type to delete it as?
(What's actually going on here is that (pointers to) the destructors are stored in the object's vtable.)
The virtual destructors are called only when I delete an object. How do the vtable and vpointers get deleted when the program ends? (When I run the code without the delete, the execution just stops without printing anything.)
If you never delete something, it never gets cleaned up. The program ends without freeing that memory from the heap. This is a "memory leak". When the program ends, the OS cleans up the whole program's heap in one go (without caring what's in it).
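A common way to make sure the destructor does run is to let an owner such as std::unique_ptr call delete when it goes out of scope. In this sketch, Base, Tracked and the destroyed counter are illustrative names; because ~Base is virtual, deleting through the unique_ptr<Base> still runs ~Tracked.

```cpp
#include <cassert>
#include <memory>

int destroyed = 0;  // how many Tracked objects have been destroyed

struct Base {
    virtual ~Base() = default;
};

struct Tracked : Base {
    ~Tracked() override { ++destroyed; }
};

void scope_with_owner() {
    std::unique_ptr<Base> p = std::make_unique<Tracked>();
    // ... use p ...
}  // p is destroyed here: it calls delete, and the virtual ~Base dispatches to ~Tracked
```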

behaviour of sizeof in c++

When I use sizeof in C++, can I be sure to get the "whole object"? I am asking because I am about to copy objects to other areas of memory using memcpy (probably a stupid idea from the start, right?).
What I am worried about is that I may not get the whole object, but only the parts belonging to the class it is currently cast to. Does this make any sense, or am I confused?
EDIT: Examples
class A{ public: int a = 123; };
class B: public A{ public: int b = 321; };
class C : public B{ public: int c = 333; };
C c_ = C();
B b_ = C();
A a_ = C();
std::cout << sizeof(a_) << " , " << sizeof(b_) << " , " << sizeof(c_) << std::endl;
Seems to give me 4,8,12.
I guess I would need to do dynamic casting to figure out how to get the "whole" object which I constructed as a "C" class in each case?
sizeof will always return the static size of your object. Notice that in your example it will coincide with the true object size, as there is no polymorphism; when you do
A a = B();
a is of type A - you just happened to initialize a new A object with a new B object, which results in slicing (a gets initialized with the fields of B() that are common with A).
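The slicing can be made visible with a virtual function. In this sketch the which() member is a hypothetical addition to the question's classes: after A a = B(); the object is a plain A, so the call reports A's version even though it was initialized from a B, whereas a reference bound to a real B still reaches B's override.

```cpp
#include <cassert>

class A {
public:
    int a = 123;
    virtual ~A() = default;
    virtual int which() const { return 1; }  // 1 means "A's version ran"
};

class B : public A {
public:
    int b = 321;
    int which() const override { return 2; }  // 2 means "B's version ran"
};

// Copy-initializes an A from a temporary B: only the A part is copied.
int which_after_slicing() {
    A sliced = B();
    return sliced.which();
}

// Binding a reference keeps the complete B, so dispatch reaches B::which.
int which_through_reference() {
    B whole;
    const A& ref = whole;
    return ref.which();
}
```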
A better example would be:
B b;
A *a = &b;
In this case, *a will indeed be of dynamic type B, but sizeof(*a) will still return sizeof(A), not sizeof(B).
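To check that claim, the following sketch uses the question's A and B. sizeof is evaluated at compile time from the static type of its operand, so sizeof(*a) is sizeof(A) no matter what a actually points at. static_size_of is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

class A { public: int a = 123; };
class B : public A { public: int b = 321; };

// The parameter's static type is A, so this is always sizeof(A),
// even when the argument is really a B.
std::size_t static_size_of(const A& obj) { return sizeof(obj); }
```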
There are several ways to obtain the dynamic size of an object:
save it into a field at construction time;
in theory, you could define a virtual method that does return sizeof(*this); and redefine it in all derived classes.
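The second option might look like the sketch below; dynamic_size is a hypothetical name. Each class has to repeat the override, since sizeof(*this) inside A's version would still see only an A.

```cpp
#include <cassert>
#include <cstddef>

class A {
public:
    int a = 123;
    virtual ~A() = default;
    // Calling through an A* or A& dispatches to the most-derived override.
    virtual std::size_t dynamic_size() const { return sizeof(*this); }
};

class B : public A {
public:
    int b = 321;
    std::size_t dynamic_size() const override { return sizeof(*this); }
};

class C : public B {
public:
    int c = 333;
    std::size_t dynamic_size() const override { return sizeof(*this); }
};
```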
That being said, this last method won't be particularly useful, as doing memcpy of non-trivial types such as polymorphic classes is undefined behavior (and so even the first method as well, as I imagine that you'll want to do this with polymorphic types).
The common approach to the problem of copying polymorphic classes is to accept the fact that they'll have to live in the heap and define clone() method that does virtual A * clone() {return new B(*this);} (where B is the derived class) in each derived class, and invoke clone() whenever you need a copy.
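A sketch of that clone() idiom, using std::unique_ptr rather than the raw new from the answer so the copy is freed automatically (a deliberate substitution). The class names follow the question's A and B.

```cpp
#include <cassert>
#include <memory>

class A {
public:
    int a = 123;
    virtual ~A() = default;
    // Each class copies itself with its own copy constructor.
    virtual std::unique_ptr<A> clone() const { return std::make_unique<A>(*this); }
};

class B : public A {
public:
    int b = 321;
    std::unique_ptr<A> clone() const override { return std::make_unique<B>(*this); }
};
```

Because clone() is virtual, copying through an A& produces an object of the correct dynamic type, without the caller ever needing to know its size.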
Mind you, there are subtler tricks you can pull; once I had a class hierarchy which had a virtual method dispatching to the placement new for each derived class and one for the destructor, but you really have to know what you are doing (in my case I was invoking them over a union containing an instance for each derived class, so size and alignment were not a problem).

How is pointer to member function implemented in C++?

A pointer to member function in C++ has three parts:
Offset
Address/index
virtual?
The offset is used for pointer adjustment when a derived object is called through a base pointer.
How is this offset implemented? Is it pointer to some table, one table for each derived class and the table contains entries of the form (base X, offset)?
Also, where can I get more info about this?
First you should note that a C++ method can be implemented (and is normally implemented) just as a regular function that accepts an extra hidden parameter before all other parameters, named this.
In other words in
struct P2d {
double x, y;
void doIt(int a, double b) {
...
}
};
the machine code for doIt is the same that would be generated by a C compiler for
void P2d$vid$doIt(P2d *this, int a, double b) {
...
}
and a call like p->doIt(10, 3.14) is compiled to P2d$vid$doIt(p, 10, 3.14);
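To make the hidden parameter concrete, this sketch writes the member function and its "C-style" equivalent side by side. P2d_doIt is a hypothetical name standing in for a mangled one (a real mangled name would look different), the parameter is called self because this is a keyword, and the arithmetic in the bodies is invented just so there is something to compare.

```cpp
#include <cassert>

struct P2d {
    double x, y;
    // Member function: the object is reached through the hidden `this` pointer.
    double doIt(int a, double b) const { return x * a + y * b; }
};

// Hand-written equivalent with the object passed explicitly.
double P2d_doIt(const P2d* self, int a, double b) {
    return self->x * a + self->y * b;
}
```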
Given this, a method pointer for a simple class that has no virtual methods can be implemented as a regular pointer to the method code (NOTE: I'm using vid for "Void of Int+Double" as a toy example of the "name mangling" that C++ compilers do to handle overloads, i.e. different functions with the same name but different parameters).
If the class has virtual methods, however, this is no longer true.
Most C++ compilers implement virtual dispatch using a VMT (virtual method table), i.e. in
struct P2d {
...
virtual void doIt(int a, double b);
};
the code for a call like p->doIt(10, 3.14), where p is a P2d *, is the same as what a C compiler would generate for
(p->$VMTab.vid$doIt)(p, 10, 3.14);
i.e. the instance contains a hidden pointer to a virtual method table that for each member contains the effective code address (assuming the compiler cannot infer that the class of p is indeed P2d and not a derived, as in that case the call can be the same as for a non-virtual method).
Method pointers are required to respect virtual methods: calling doIt indirectly through a method pointer on an instance derived from P2d must call the derived version, while the same method pointer must call the base version when used on P2d instances. This means the selection of which code to call depends on both the pointer and the class instance.
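The requirement can be observed directly. In this sketch P3d is a hypothetical class derived from P2d, and the bodies return strings only so the result is easy to check: the same pointer-to-member &P2d::doIt calls different code depending on the object it is applied to.

```cpp
#include <cassert>
#include <string>

struct P2d {
    virtual ~P2d() = default;
    virtual std::string doIt(int a) const { return "P2d::doIt " + std::to_string(a); }
};

struct P3d : P2d {
    std::string doIt(int a) const override { return "P3d::doIt " + std::to_string(a); }
};

using DoItPtr = std::string (P2d::*)(int) const;

// One pointer-to-member; which body runs depends on the object's dynamic type.
std::string call_via_member_pointer(const P2d& obj, DoItPtr pm) {
    return (obj.*pm)(10);
}
```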
A possible implementation is using a trampoline:
void MethodPointerCallerForP2dDoit(P2d *p, int a, double b) {
p->doIt(a, b);
}
and in this case a method pointer is still just pointer to code (but to the trampoline, not to the final method).
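The trampoline idea can be sketched with an ordinary function pointer. Both P3d and trampoline_doIt are hypothetical names, and this is only an illustration of the technique, not what any specific compiler emits: a plain code pointer to the trampoline still ends up in the derived override, because the trampoline itself performs the virtual call.

```cpp
#include <cassert>
#include <string>

struct P2d {
    virtual ~P2d() = default;
    virtual std::string doIt(int a) const { return "P2d::doIt " + std::to_string(a); }
};

struct P3d : P2d {
    std::string doIt(int a) const override { return "P3d::doIt " + std::to_string(a); }
};

// The "method pointer" is just the address of this function, which makes
// an ordinary virtual call on whatever object it is given.
std::string trampoline_doIt(const P2d* p, int a) { return p->doIt(a); }

using FakeMethodPtr = std::string (*)(const P2d*, int);
```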
An alternative would be to store as method pointer the index of the method inside the VMT instead. This would be feasible because in C++ the method pointer is tied to a specific class and therefore the compiler knows if for that class there are virtual methods or not.
Multiple inheritance does not complicate things for method pointers because everything can be resolved to a single final VMT at compile time.
This is a comment regarding 6502's answer, but I lack reputation.
a method pointer for a simple class that has no virtual methods can be implemented as a regular pointer to the method
I believe this statement is incorrect because multiple inheritance complicates things. Consider this code:
struct A {
int a;
void f() {
// use a
}
};
struct B {
int b;
void g() {
// use b
}
};
// C objects may look like this in memory:
//
// |-----|
// | A |
// |-----|
// | B |
// |-----|
//
// Since only one of A and B can be at the start of C in terms of memory
// layout, at least one of the following can't work:
//
// * naively interpreting a pointer to C as a pointer to A
// * naively interpreting a pointer to C as a pointer to B
struct C : A, B {};
void call_ptm(void (C::*ptm)()) {
C c;
// `ptm` could be `A::f` or `B::g`. We don't know. At what offset
// relative to `this` do we expect to find data members then? It depends
// on whether `ptm` points to `A::f` or `B::g`, which isn't known at
// compile time.
(c.*ptm)();
}
One way or another, a pointer to a member function needs to store an offset that can be applied to this upon invocation.
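Here is a runnable variant of that example, with f and g changed to return their member so the result can be checked. Converting &A::f or &B::g to a pointer to member of C is allowed implicitly, and the call through the B::g version still reads the B subobject, which is only possible if the pointer carries (or implies) a this adjustment. On the Itanium C++ ABI used by GCC and Clang, for instance, a member function pointer is stored as a pair: a function address or vtable offset, plus that adjustment.

```cpp
#include <cassert>

struct A {
    int a = 1;
    int f() const { return a; }  // reads a data member of the A subobject
};

struct B {
    int b = 2;
    int g() const { return b; }  // reads a data member of the B subobject
};

struct C : A, B {};

using CMemFn = int (C::*)() const;

// The call must shift `this` to the right subobject before running f or g.
int call_ptm(const C& c, CMemFn ptm) {
    return (c.*ptm)();
}
```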

typecasting with virtual functions

In the code below, pC == pA:
class A
{
};
class B : public A
{
public:
int i;
};
class C : public B
{
public:
char c;
};
int main()
{
C* pC = new C;
A* pA = (A*)pC;
return 0;
}
But when I add a pure virtual function to B and implement it in C, pA != pC:
class A
{
};
class B : public A
{
public:
int i;
virtual void Func() = 0;
};
class C : public B
{
public:
char c;
void Func() {}
};
int main()
{
C* pC = new C;
A* pA = (A*)pC;
return 0;
}
Why is pA not equal to pC in this case? Don't they both still point to the same "C" object in memory?
You're seeing a different value for your pointer because the new virtual function is causing the injection of a vtable pointer into your object. VC++ is putting the vtable pointer at the beginning of the object (which is typical, but purely an internal detail).
Let's add a new field to A so that it's easier to explain.
class A {
public:
int a;
};
// other classes unchanged
Now, in memory, your pA and A look something like this:
pA --> | a | 0x0000004
Once you add B and C into the mix, you end up with this:
pC --> | vtable | 0x0000000
pA --> | a | 0x0000004
| i | 0x0000008
| c | 0x000000C
As you can see, pA is pointing to the data after the vtable, because it doesn't know anything about the vtable or how to use it, or even that it's there. pC does know about the vtable, so it points directly to the table, which simplifies its use.
A pointer to an object is convertible to a pointer to base object and vice versa, but the conversion doesn't have to be trivial. It's entirely possible, and often necessary, that the base pointer has a different value than the derived pointer. That's why you have a strong type system and conversions. If all pointers were the same, you wouldn't need either.
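A short sketch of that point using the question's classes (with the field a added to A, as in the earlier answer). The question's C-style cast (A*)pC performs the same conversion as the implicit one used here. A typed comparison pA == pC converts pC to A* first, applying the same adjustment, so it reports the same object even on compilers where the raw addresses differ. same_object is a hypothetical helper.

```cpp
#include <cassert>

class A { public: int a = 0; };

class B : public A {
public:
    int i = 0;
    virtual void Func() = 0;
};

class C : public B {
public:
    char c = 0;
    void Func() override {}
};

// Comparing A* with C* first converts the C* to A*, so any offset is applied.
bool same_object(const A* pA, const C* pC) { return pA == pC; }
```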
Here are my assumptions, based on the question.
1) You have a case where you cast from a C to an A and you get the expected behaviour.
2) You added a virtual function, and that cast no longer works (in that you can no longer pull data from A directly after the cast to A, you get data that makes no sense to you).
If these assumptions are true the hardship you are experiencing is the insertion of the virtual table in B. This means the data in the class is no longer perfectly lined up with the data in the base class (as in the class has added bytes, the virtual table, that are hidden from you). A fun test would be to check sizeof to observe the growth of unknown bytes.
To resolve this you should not cast directly from A to C to harvest data. You should add a getter function that is in A and inherited by B and C.
Given your update in the comments, I think you should read this; it explains virtual tables, the memory layout, and how it is compiler dependent. That link explains, in more detail, what I explained above, but gives examples of the pointers being different values. Really, I had WHY you were asking the question wrong, but it seems the information is still what you wanted. The cast from C to A takes into account the virtual table at this point (note C-8 is 4, which on a 32 bit system would be the size of the address needed for the virtual table, I believe).

Multiple inheritance of virtual classes

Suppose I have the following code:
class a {
public:
virtual void do_a() = 0;
};
class b {
public:
virtual void do_b() = 0;
};
class c: public a, public b {
public:
virtual void do_a() {};
virtual void do_b() {};
};
a *foo = new c();
b *bar = new c();
Will foo->do_a() and bar->do_b() work? What's the memory layout here?
Will a->do_a() and b->do_b() work?
Assuming you meant foo->do_a() and bar->do_b(), since a and b are types, not objects: yes, they will work. Did you try running it?
What's the memory layout here?
That is implementation-defined, mostly. Fortunately, you don't need to know about that unless you want to write non-portable code.
Why shouldn't they? The memory layout will typically be something like:
+----------+
| A part |
+----------+
| B part |
+----------+
| C part |
+----------+
If you convert your foo and bar to void* and display them, you'll get different addresses, but the compiler knows this and will arrange for the this pointer to be correctly fixed up when calling the function.
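The fix-up can be observed by having each override report the value of this. In the sketch below the return types are changed to const void* for that purpose (a hypothetical modification of the question's classes), and a single c object is used: foo and bar hold different addresses within it, yet inside both do_a and do_b, this is the address of the whole c.

```cpp
#include <cassert>

class a {
public:
    virtual ~a() = default;
    virtual const void* do_a() = 0;
};

class b {
public:
    virtual ~b() = default;
    virtual const void* do_b() = 0;
};

class c : public a, public b {
public:
    // Each override reports the address `this` holds while it runs.
    const void* do_a() override { return this; }
    const void* do_b() override { return this; }
};
```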
As others have mentioned, the following will work without any problems:
foo->do_a();
bar->do_b();
These, however, will not compile
bar->do_a();
foo->do_b();
Since bar is of type b* it has no knowledge of do_a. The same is true for foo and do_b. If you want to make those function calls you must downcast.
static_cast<c *>(foo)->do_b();
static_cast<c *>(bar)->do_a();
The other very important thing that is not shown in your example code is, when inheriting, and referring to the derived class through base class pointer, the base class MUST have a virtual destructor. If it doesn't then the following will produce undefined behavior.
a* foo = new c();
delete foo;
The fix is simple
class a {
public:
virtual void do_a() = 0;
virtual ~a() {}
};
Of course, this change needs to be made to b as well.
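A sketch with the virtual destructor added to both a and b; c_destroyed is a hypothetical counter used only to show that ~c runs. Deleting through either base pointer should work, including the b* whose address differs from the c object's, because the virtual destructor takes care of the adjustment.

```cpp
#include <cassert>

int c_destroyed = 0;  // counts ~c calls

class a {
public:
    virtual void do_a() = 0;
    virtual ~a() {}
};

class b {
public:
    virtual void do_b() = 0;
    virtual ~b() {}
};

class c : public a, public b {
public:
    void do_a() override {}
    void do_b() override {}
    ~c() override { ++c_destroyed; }
};
```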
Yes, of course it will work. The mechanics are a bit tricky though. The object will have two vtables, one for the class a parent and one for the class b parent. The pointers will be adjusted so that they point to the subset of the object that corresponds to the pointer type, leading to this surprising result:
c * baz = new c;
a * foo = baz;
b * bar = baz;
assert((void *)foo == (void *)bar); // assertion fails!
The compiler knows the types at the time of the assignment, and knows exactly how to adjust the pointers.
This is of course completely compiler dependent; nothing in the C++ standard says it has to work this way. Only that it has to work.
foo->do_a(); // will work
bar->do_b(); // will work
bar->do_a(); // compile error (do_a() is not a member of b)
foo->do_b(); // compile error (do_b() is not a member of a)
// If you really know the types are correct:
c* pc = static_cast<c*>(foo);
pc->do_a(); // will work
pc->do_b(); // will work
// If you don't know the types, you can try at runtime:
if (c* maybe_c = dynamic_cast<c*>(foo))
{
maybe_c->do_a(); // will work
maybe_c->do_b(); // will work
}
Will a->do_a() and b->do_b() work?
No.
Will foo->do_a() and bar->do_b() work?
Yes. Your code is the canonical example of virtual function dispatch.
Why didn't you just try it?
What's the memory layout here?
Who cares?
(i.e. this is implementation-defined, and abstracted from you. You should not need to nor want to know.)
They will work. In terms of memory, this is implementation dependent. You have created objects on the heap, and on many systems successive heap allocations tend to land at increasing addresses (cf. the stack, which typically grows downwards). So possibly, you will have:
Memory:
+foo+
-----
+bar+