I'm reading the paper "Multiple Inheritance for C++".
An example from the paper (page 377):
class A {virtual void f();};
class B {virtual void f(); virtual void g();};
class C: A, B {void f();};
A* pa = new C;
B* pb = new C;
C* pc = new C;
pa->f();
pb->f();
pc->f();
pc->g();
(1) Bjarne wrote: On entry to C::f, the this pointer must point to the beginning of the C object (and not to the B part). However, it is not in general known at compile time that the B pointed to by pb is part of a C so the compiler cannot subtract the constant delta(B). So we have to store the delta(B) for the runtime which is actually stored with the vtbl. So the vtbl entry now looks like:
struct vtbl_entry {
void (*fct)();
int delta;
};
An object of class C will look like this:
----------                vtbl:
| vptr ------------------>-----------------------
| A part |                | C::f | 0            |
----------                -----------------------
| vptr ------------------>-----------------------
| B part |                | C::f | -delta(B)    |
----------                | B::g | 0            |
| C part |                -----------------------
----------
Bjarne wrote:
pb->f() // call of C::f:
register vtbl_entry* vt = &pb->vtbl[index(f)];
(*vt->fct)((B*)((char*)pb+vt->delta)) //vt->delta is a negative number I guess
I'm totally confused here. Why is the cast (B*) and not (C*) in (*vt->fct)((B*)((char*)pb+vt->delta))? Based on my understanding, and on Bjarne's first sentence of section 5.1 on page 377, we should pass a C* as this here.
Followed by the above code snippet, Bjarne continued writing:
Note that the object pointer may have to be adjusted to point to the correct sub-object before looking for the member pointing to the vtbl.
I have no idea what Bjarne is trying to say here. Can you help explain it?
It is a C*, it's just not typed as such.
Frankly, that's a pretty terrible explanation and not really how it's done. It's a lot better and easier to store function pointers in the vtable.
struct vtbl {
void(*f)(B* b);
};
struct B {
vtbl* vtable;
};
// Invoke function:
B* p = init();
p->vtable->f(p);
// Function pointer points to:
void f_thunk(B* b) {
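C* c = (C*)((char*)b - delta(B)); // step back from the B part to the start of the C object
c->C::f(); // direct (non-virtual) call to C's implementation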
}
When the compiler generates the thunks, it knows the derived object they're thunking to, so they don't need to store the offset in the vtable. The compiler can simply offset the pointer inside the thunk and then delegate to the appropriate method with the pointer. Of course, this thunk is pretty much generated assembly only, without any C++ representation, so it would be invalid to state that the pointers within have any particular C++ type.
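The thunk idea can be emulated by hand with plain structs. This is only a sketch under stated assumptions: `APart`, `BPart`, `CObj`, `C_f`, `kVtblForBInC` and the helper functions are hypothetical names, the layout is written out manually, and a real compiler emits the thunk directly as machine code.

```cpp
#include <cstddef>

// Hand-written stand-ins for what a compiler might generate; every name is hypothetical.
struct BPart;
struct Vtbl { int (*f)(BPart* self); };   // a single slot, for f

struct APart { int a; };
struct BPart { const Vtbl* vtable; int b; };
struct CObj  { APart a_part; BPart b_part; int c; };  // a "C": A part, B part, own data

// Plays the role of C::f; it expects a pointer to the whole CObj.
int C_f(CObj* self) { return self->a_part.a + self->c; }

// The thunk: receives a pointer to the B part, steps back by the fixed
// offset of that part inside CObj, then delegates to C_f.
int f_thunk(BPart* b) {
    char* start = reinterpret_cast<char*>(b) - offsetof(CObj, b_part);
    return C_f(reinterpret_cast<CObj*>(start));
}

const Vtbl kVtblForBInC = { f_thunk };

// What a caller holding only a BPart* does: no offset appears at the call site.
int call_f(BPart* p) { return p->vtable->f(p); }

int thunk_demo() {
    CObj obj = { { 10 }, { &kVtblForBInC, 20 }, 30 };
    return call_f(&obj.b_part);  // reaches C_f with a pointer to obj
}
```

Here `thunk_demo()` returns 40 (the `a` and `c` fields added together), which shows that the callee saw the whole object even though the caller only had a pointer to the B part.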
You're getting lost in the weeds here.
Are you trying to write your own C++ compiler? If so, feel free to ignore me. But if you're just trying to understand how virtual functions and inheritance work in C++, which is what it sounds like, none of what you wrote matters much.
Only compiler writers need to fully understand how the vtbl works, in all the gory details, in order to actually implement C++. It is not needed to program effectively in C++. All that needs to be understood is how virtual functions work, purely from the class's viewpoint. As long as you understand that when invoking pb's method you actually end up invoking C's method, and why (with the why being simply "because it is actually an instance of C"), that's pretty much all that needs to be understood.
Oh, and your class should probably have a virtual destructor, but that's a different story.
The vtbl is typically not even accessible from C++ code. And the C++ standard does not even require a C++ compiler to implement anything called a "vtable". The standard only specifies how inheritance and virtual function calls must behave. Any actual implementation that produces the same results is compliant.
I'm totally confused here. Why (B*) not a (C*) in (*vt->fct)
At that level, the only known type is B. The actual object could be of type C, Foo, or Bar.
However, that paper is a bit dated. Actual implementations in modern compilers could be very different. @Puppy's answer shows how it can be done without adding delta(B) to the vtable.
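For comparison, the paper's scheme with a delta stored next to each function pointer can also be emulated by hand. This is a sketch only: `VtblEntry` mirrors the paper's `vtbl_entry`, while `APart`, `BPart`, `CObj`, `C_f`, `B_g` and the helper functions are hypothetical names. It also shows why the call site cannot name C: the caller only has a pointer to the B part and blindly adds whatever delta the table holds.

```cpp
#include <cstddef>

// Hand-written emulation of the paper's scheme; all names are hypothetical.
struct VtblEntry {
    int (*fct)(void* self);  // implementation to call
    std::ptrdiff_t delta;    // added to the caller's pointer before the call
};

struct APart { const VtblEntry* vtbl; int a; };
struct BPart { const VtblEntry* vtbl; int b; };
struct CObj  { APart a_part; BPart b_part; int c; };

// Plays the role of C::f: `self` must address the start of the CObj.
int C_f(void* self) { return static_cast<CObj*>(self)->c; }
// Plays the role of B::g: `self` must address the BPart.
int B_g(void* self) { return static_cast<BPart*>(self)->b; }

const std::ptrdiff_t kDeltaB = offsetof(CObj, b_part);  // delta(B)

const VtblEntry kVtblForA[] = { { C_f, 0 } };
const VtblEntry kVtblForB[] = { { C_f, -kDeltaB }, { B_g, 0 } };

// What pb->f() expands to: index 0 is f, and only BPart is known here.
int call_f_through_b(BPart* pb) {
    const VtblEntry* vt = &pb->vtbl[0];
    return vt->fct(reinterpret_cast<char*>(pb) + vt->delta);
}

// What pb->g() expands to: index 1 is g, and its delta is 0.
int call_g_through_b(BPart* pb) {
    const VtblEntry* vt = &pb->vtbl[1];
    return vt->fct(reinterpret_cast<char*>(pb) + vt->delta);
}

int delta_demo() {
    CObj obj = { { kVtblForA, 1 }, { kVtblForB, 2 }, 3 };
    return call_f_through_b(&obj.b_part) * 10 + call_g_through_b(&obj.b_part);
}
```

In this sketch `delta_demo()` returns 32: the call to f lands in `C_f` with a pointer to the whole object (reading `c`, which is 3), while the call to g lands in `B_g` with the unadjusted pointer to the B part (reading `b`, which is 2).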
Consider the following code
class B1 {
public:
void f0() {}
virtual void f1() {}
int int_in_b1;
};
class B2 {
public:
virtual void f2() {}
int int_in_b2;
};
class D : public B1, public B2 {
public:
void d() {}
void f2() {int temp=int_in_b1;} // override B2::f2()
int int_in_d;
};
and the following memory layout for the object d:
d:
+0: pointer to virtual method table of D (for B1)
+4: value of int_in_b1
+8: pointer to virtual method table of D (for B2)
+12: value of int_in_b2
+16: value of int_in_d
Total size: 20 Bytes.
virtual method table of D (for B1):
+0: B1::f1() // B1::f1() is not overridden
virtual method table of D (for B2):
+0: D::f2() // B2::f2() is overridden by D::f2()
D *d = new D();
d->f2();
When d->f2(); is invoked, D::f2 needs access to data from B1, but the modified this pointer
(*(*(d[+8]/*pointer to virtual method table of D (for B2)*/)[0]))(d+8) /* Call d->f2() */
is passed to D::f2, then how is D::f2 able to access it?
The code is taken (and modified) from: https://en.wikipedia.org/wiki/Virtual_method_table#Multiple_inheritance_and_thunks
Your case is actually too simple: The compiler can know that you have a pointer to a D object, so it can perform the lookup from the right table, passing the unmodified this pointer to the f2() implementation.
The interesting case is, when you have a pointer to B2:
B2* myD = new D();
myD->f2();
Now we start with an adjusted base pointer, and need to find the this pointer for the whole object. One way to achieve that would be to store an offset alongside the function pointer that is used to produce a valid this pointer from the B2 pointer used to access the vtable.
Thus, in your case, the code might implicitly be compiled like this
D* myD = new D();
((B2*)myD)->f2();
adjusting the pointer two times (once deriving the B2* from the D*, then the inverse using the offset from the vtable). Your compiler may be clever enough to avoid this, though.
In any case, this is firmly within the field of implementation. Your compiler can do anything, as long as it behaves the way the standard specifies.
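The observable behaviour can be checked without looking at any vtable. The sketch below reuses the question's class names, with two assumptions of mine: `f2()` returns an `int` so the result can be inspected, and virtual destructors are added. `call_f2` and `b2_view_is_shifted` are hypothetical helpers.

```cpp
class B1 {
public:
    virtual ~B1() {}
    virtual void f1() {}
    int int_in_b1 = 0;
};
class B2 {
public:
    virtual ~B2() {}
    virtual int f2() { return -1; }
    int int_in_b2 = 0;
};
class D : public B1, public B2 {
public:
    int f2() override { return int_in_b1; }  // reads B1 data, like the question's D::f2
    int int_in_d = 0;
};

// Only B2 is known here; which f2 runs is decided at run time.
int call_f2(B2* p) { return p->f2(); }

// True if the B2 view of a D has a different address than the D itself.
bool b2_view_is_shifted(D* d) {
    return static_cast<void*>(static_cast<B2*>(d)) != static_cast<void*>(d);
}
```

Calling `call_f2` with the address of a `D` whose `int_in_b1` is 42 returns 42, so `D::f2` ends up with a usable `this` for the whole object however the implementation arranges it. On common implementations `b2_view_is_shifted` is true, and `static_cast<D*>` applied to the `B2*` always yields the original `D*` again.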
Firstly, the effect you are describing as "modifying a this pointer" is an implementation detail of some particular compiler. There is no specific requirement that a compiler modify pointers like you describe.
There is also no requirement that an object have vtables, let alone that they are laid out like you describe. The actual requirement is that the correct override of a virtual function will be called at run time, and that it will be able to correctly access data members and call member functions. Now, in practice, compilers tend to use vtables, but that is an implementation detail, chosen because alternatives are less efficient by various measures.
Now, that said, the following discussion will assume every class with a virtual function has a vtable. Looking at your example, what does this do?
D *d = new D();
d->f2();
The first thing is that the compiler knows that d is a pointer to D, and knows that D has a function named f2(). It will also know that f2() is a virtual function inherited from B2 (which is one reason that it is not possible to call a class member function unless the compiler has visibility of the complete class definition).
In this case, we know what d and D are, so we know D::f2() should be called, with the this pointer equal in value to d. The compiler has the same information (it knows d is a D *) so it just does that. Now, okay, it might or might not look up D::f2() in the vtable, but that is the end of it.
The more interesting example, like cmaster said, is
B2* myD = new D();
myD->f2();
In this case, myD is a pointer to B2. The compiler knows that B2 has a virtual function named f2(), so it knows it has to call the correct override.
The thing is, in the statement myD->f2(), the compiler might not know that myD actually points to a D (e.g. the construction of the object and the call of the member function might be in different functions, in different compilation units). However, it does know that a B2 has a virtual function named f2(), and that it is required to call the actual overriding version correctly.
This means the compiler needs two bits of information. Firstly it needs information identifying the actual function (D::f2()) to be called. The second bit of information will be some adjustment of myD to make the call of D::f2() work correctly. This second bit of information is essentially what is needed to produce (what you are calling) the "modified this pointer" from myD.
If the compiler does all this with the help of vtables, it might include BOTH bits of information in the vtable for B2. So (assuming the second bit of information is an offset) the compiler turns
myD->f2();
into something like
(myD + myD->vtable->offset_for_f2)->(myD->vtable->entry_for_f2)();
The part (myD + myD->vtable->offset_for_f2) is essentially what you are describing as "the modified this pointer" which D::f2() will see when called. The part (myD->vtable->entry_for_f2) is essentially the address of D::f2() (say the address of the member function).
The next question to ask is how might the compiler populate the vtable? The short answer is that it does this when constructing the object.
B2* myD = new D();
The new expression (new D()) essentially expands to
void *temp = ::operator new(sizeof (D)); // assuming class does not supply its own operator new
// construct a `D` in the memory pointed to by temp
myD = (D *)temp; // temp now addresses a complete D; converting to B2* applies the compile-time offset of the B2 subobject
The process of turning the memory pointed to by temp into a D is the important thing. Firstly, it invokes the constructors of the base classes (B1 and B2), then constructs or initialises D's members, then it runs the body of D's constructor (the C++ standard actually describes the order of events in exquisite detail). The other thing is that the compiler does bookkeeping to ensure we actually get a valid D from the process. Part of that is populating the vtable.
Now, since the compiler has complete visibility of the definition of class D (i.e. complete definition of base classes, its members, etc), it has all the information needed to populate the vtable. In other words, it has all the information it needs to give sensible values to both myD->vtable->offset_for_f2 and myD->vtable->entry_for_f2
In the case of multiple inheritance, assuming one vtable per base class, the compiler has all the information it needs to populate all the vtables in a similar way. In other words, the compiler knows how it lays out objects in memory, including their vtables, and uses that knowledge appropriately.
But, then again, it might not. As I said, vtables are a technique commonly used in compilers to implement virtual function dispatch. There are other ways too.
Again, I can't comment, so I must answer here.
There is no problem in the code!
D::f2 needs access to data from B1
then how is D::f2 able to access it?
Just write B1::int_in_b1 inside D::f2, and you can access the int value.
In your example, when d->f2() is called, compiler knows that d is a pointer to class D. To call f2(), it would adjust the pointer of d to be "this" of B2 before passing it to virtual f2(), as you describe. Now, inside of the D::f2(), the compiler knows that this is D::f2() and it knows how D inherits from B2, and so it fixes the "this" of B2 to be "this" of D in the very beginning of the function, so when your code executes it would see that "this" is that of D. Therefore it can access any members of D inside of D::f2().
If you would have had
B2* b = d;
b->f2();
When b->f2() is called, the pointer being passed to f2() is "this" of B2. Inside D::f2(), the passed pointer is fixed to point to this of D.
I have a C++ lib that makes use of an object hierarchy like this:
class A { ... }
class B : public A { ... }
class C : public A { ... }
I expose functionality through a C API via typedefs and functions, like this:
#ifdef __cplusplus
typedef A* APtr;
#else
typedef struct A* APtr;
#endif
extern "C" void some_function(APtr obj);
However, say a user of the C API does something like this:
BPtr b = bptr_create();
some_function((APtr) b);
This is polymorphically valid, since B extends A, and my API depends on such functionality being possible, but I want to make sure that this will still interoperate properly with the C++ code, even if B overrides some of A's virtual methods.
More importantly, why or why not? How can C++ identify at runtime that the obj parameter of some_function is actually a pointer to B, and therefore call its overridden virtual methods instead?
The C code is not valid (nor would the equivalent C++ code in a context where the class definition is not visible) because what C does in this case is the equivalent of a reinterpret_cast. Note that in a simple situation like yours it will likely "work" because most compilers will put the single base object at the beginning of the derived object, so a pointer adjustment is not necessary. However, in the general case (especially when using multiple inheritance), the pointer will have to be adjusted to point to the correct subobject, and since C does not know how to do that, the cast is wrong.
So what is meant with "pointer adjustment"? Consider the following situation:
class A { virtual ~A(); int i; ... };
class B { virtual ~B(); int j; ... };
class C: public A, public B { ... };
Now the layout of C may be as follows:
+----------------------------+----------------------------+
| A subobject (containing i) | B subobject (containing j) |
+----------------------------+----------------------------+
where the virtual pointers (vptrs) of both the A and B subobjects point to vtables belonging to C.
Now imagine you've got a C* which you want to convert to a B*. Of course the code which receives the B* may not know about the existence of C; indeed, it may have been compiled before C was even written. Therefore the B* must point to the B subobject of the C object. In other words, on conversion from C* to B*, the size of the A subobject has to be added to the address stored into the pointer. If you do not do this, the B* will actually point to the A subobject, which clearly is wrong.
Now without access to the class definition of C, there's of course no way to know that there even is an A subobject, not to mention how large it is. Therefore it is impossible to do a correct conversion from C* to B* if the class definition of C is not available.
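The adjustment can be observed directly when the class definitions are visible. The following is a minimal sketch using the A, B and C from above with the elided members left out. `read_j` and `b_offset_in_c` are hypothetical helpers, and the exact offset is an implementation detail.

```cpp
#include <cstddef>

class A { public: virtual ~A() {} int i = 1; };
class B { public: virtual ~B() {} int j = 2; };
class C : public A, public B {};

// Could have been compiled before C existed: it needs a pointer to a B subobject.
int read_j(B* b) { return b->j; }

// Byte distance between a C object and its B subobject, as chosen by the compiler.
std::ptrdiff_t b_offset_in_c(C* c) {
    B* b = c;  // implicit conversion; the compiler applies the offset here
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(c);
}
```

Passing the address of a `C` to `read_j` yields 2, because the implicit conversion at the call site moves the pointer to the B subobject. With the layout drawn above, `b_offset_in_c` is nonzero, which is exactly the amount a C-side cast would fail to add.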
C++ typically uses a virtual function table, which exists in memory once per class. When an object of a particular derived class is created, its virtual table determines which function gets called.
So it's a bit of C++ compile-time plus runtime magic :)
http://en.wikipedia.org/wiki/Virtual_method_table
Short answer: Yes this will work.
Why: since A and some_function are implemented in C++, all virtual function calls will occur in C++ code as usual, where the class definition is included, and there is nothing magic about it. In C code only opaque pointers are passed around, and C code will never be able to call the virtual functions directly, because it could never compile the definition of A.
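One way to keep both concerns satisfied is to let C code hold only opaque pointers and to perform every conversion on the C++ side. The sketch below is an illustration, not the library's actual API: `kind()`, `bptr_as_aptr` and `aptr_destroy` are hypothetical additions, and `some_function` returns an `int` here only so its result can be inspected.

```cpp
// Hypothetical stand-ins for the library's classes.
class A {
public:
    virtual ~A() {}
    virtual int kind() const { return 1; }
};
class B : public A {
public:
    int kind() const override { return 2; }
};

typedef A* APtr;
typedef B* BPtr;

extern "C" {
    BPtr bptr_create(void) { return new B; }
    // The upcast is done in C++, where the class layout is known.
    APtr bptr_as_aptr(BPtr b) { return b; }
    // Virtual dispatch happens here, inside C++ code.
    int some_function(APtr obj) { return obj->kind(); }
    void aptr_destroy(APtr obj) { delete obj; }
}
```

A C caller would then write `some_function(bptr_as_aptr(b))` instead of casting the pointer itself. The call reaches `B::kind` because the object carries its dynamic type with it.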
Please ignore the #include parts, assuming they are done correctly. This could also be implementation-specific (but so is the concept of vtables), but I am just curious, as it helps me visualize multiple inheritance. (I'm using MinGW 4.4.0, by the way.)
initial code:
class A {
public:
A() : a(0) {}
int a;
};
//Edit: adding this definition instead
void f(void* ptrA) {
std::cout<<((A*)ptrA)->a;
}
//end of editing of original posted code
#if 0
//this was originally posted. Edited and replaced by the above f() definition
void f(A* ptrA) {
std::cout<<ptrA->a;
}
#endif
This is compiled and object code is generated.
In some other compilation unit I use (after including the header file for the above code):
class C : public B , public A {
public:
int c;
}objC;
f(&objC); // ################## Label 1
memory model for objC:
//<1> stuff from B
//<2> stuff from B
//<3> stuff from A : int a
//<4> stuff from C : int c
&objC will contain the starting address of <1> in the memory model assumed above.
How and when will the compiler shift it to <3>? Does it happen during the inspection of the call at Label 1?
EDIT:
Since Label 1 seems to be a giveaway, I'm making it a little more obscure for the compiler. Please see the edited code above. Now when and where does the compiler do the adjustment?
Yes, you are quite correct.
To fully understand the situation, you have to know what the compiler knows at two points:
At Label 1 (as you have already identified)
Inside function f()
(1) The compiler knows the exact binary layout of both C and A and how to convert from C* to A* and will do so at the call site (Label 1)
(2) Inside function f(), however, the compiler only (needs to) know(s) about A* and so restricts itself to members of A (int a in this case) and cannot be confused about whether the particular instance is part of anything else or not.
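For the edited `f(void*)` version, the adjustment only happens if the pointer is converted to `A*` before it becomes a `void*`, because a conversion to `void*` carries no information about subobjects. The sketch below assumes a simple `B` with two `int` members (the question does not show B) and makes `f` return the value instead of printing it. `call_f` is a hypothetical helper.

```cpp
struct B { int b1 = 1; int b2 = 2; };   // assumed contents; not shown in the question
struct A { A() : a(0) {} int a; };
struct C : B, A { int c = 3; };

// Same shape as the edited f(): it trusts that ptrA really addresses an A.
int f(void* ptrA) { return static_cast<A*>(ptrA)->a; }

// Correct call: convert to A* first (the offset is applied here), then to void*.
int call_f(C& obj) {
    A* pa = &obj;
    return f(pa);
}
```

Writing `f(&objC)` directly would convert the `C*` straight to `void*` without any shift, so `f` would treat the start of the B data as an `A`. That is undefined behaviour, and in practice it reads the wrong member.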
Short answer: Compiler will adjust pointer values during cast operations if it knows the relationship between the base and derived class.
Let's say your object instance of class C is at address 100, and let's say sizeof(B) == 4 and sizeof(A) == 4.
When a cast happens such as the following:
C c;
A* pA = &c; // implicit cast, preferred for upcasting
A* pA = (A*)&c; // explicit cast old style
A* pA = static_cast<A*>(&c); // static-cast, even better
The pointer value of pA will be the memory address of c plus the offset from where "A" begins in C. In this case, pA will reference memory address 104 assuming sizeof(B) is also 4.
All of this holds true for passing a derived class pointer into a function expecting a base class pointer. The implicit cast will occur as does the pointer offset adjustment.
Likewise, for downcasting:
C* pC = (C*)(&a);
The compiler will take care of adjusting the pointer value during the assignment.
The one "gotcha" to all of this is when a class is forward declared without a full definition:
// foo.h
class A; // same as above, base class for C
class C; // same as above, derived class from A and B
inline void foo(C* pC)
{
A* pA = (A*)pC; // oops, compiler doesn't know that C derives from A. It won't adjust the pointer value during assignment
SomeOtherFunction(pA); // bug! Function expecting A* parameter is getting garbage
}
That's a real bug!
My general rule: avoid the old "C-style" cast and favor the static_cast operator, or just rely on implicit conversion without an operator to do the right thing (for upcasts). The compiler will issue an error if the cast isn't valid.
Consider this simple situation:
A.h
class A {
public:
virtual void a() = 0;
};
B.h
#include <iostream>
class B {
public:
virtual void b() {std::cout << "b()." << std::endl;};
};
C.h
#include "A.h"
#include "B.h"
class C : public B, public A {
public:
void a() {std::cout << "a() in C." << std::endl;};
};
int main() {
B* b = new C();
((A*) b)->a(); // Output: b().
A* a = new C();
a->a(); // Output:: a() in C.
return 0;
}
In other words:
- A is a pure virtual class.
- B is a class with no super class and one non-pure virtual function.
- C is a subclass of A and B and overrides A's pure virtual function.
What surprises me is the first output i.e.
((A*) b)->a(); // Output: b().
Although I call a() in the code, b() is invoked. My guess is that it is related to the fact that the variable b is a pointer to class B which is not a subclass of class A. But still the runtime type is a pointer to a C instance.
What is the exact C++ rule to explain this, from a Java point of view, weird behaviour?
You are unconditionally casting b to an A* using a C-style cast. The compiler doesn't stop you from doing this; you said it's an A*, so it's an A*. So it treats the memory it points to like an instance of A. Since a() is the first method listed in A's vtable and b() is the first method listed in B's vtable, when you call a() on an object that is really a B, you get b().
You're getting lucky that the object layout is similar. This is not guaranteed to be the case.
First, you shouldn't use C-style casts. You should use C++ casting operators which have more safety (though you can still shoot yourself in the foot, so read the docs carefully).
Second, you shouldn't rely on this sort of behavior, unless you use dynamic_cast<>.
Don't use a C-style cast when casting across a multiple inheritance tree. If you use dynamic_cast instead you get the expected result:
B* b = new C();
dynamic_cast<A*>(b)->a();
You are starting with a B* and casting it to A*. Since the two are unrelated, you're delving into the sphere of undefined behavior.
((A*) b) is an explicit c-style cast, which is allowed no matter what the types pointed to are. However, if you try to dereference this pointer, it will be either a runtime error or unpredictable behavior. This is an instance of the latter. The output you observed is by no means safe or guaranteed.
A and B are not related to each other by means of inheritance, which means that a pointer to B cannot be transformed into a pointer to A by means of either upcast or downcast.
Since A and B are two different bases of C, what you are trying to do here is called a cross-cast. The only cast in C++ language that can perform a cross-cast is dynamic_cast. This is what you have to use in this case in case you really need it (do you?)
B* b = new C();
A* a = dynamic_cast<A*>(b);
assert(a != NULL);
a->a();
The following line is a reinterpret_cast, which points at the same memory but "pretends" it is a different kind of object:
((A*) b)->a();
What you really want is a dynamic_cast, which checks what kind of object b really is and adjusts what location in memory to point to:
dynamic_cast<A*>(b)->a();
As jeffamaphone mentioned, the similar layout of the two classes is what causes the wrong function to be called.
There is almost never an occasion in C++ where using a C-style cast (or its C++ equivalent reinterpret_cast<>) is justified or required. Whenever you find yourself tempted to use one of the two, suspect your code and/or your design.
I think you have a subtle bug in casting from B* to A*, and the behaviour is undefined. Avoid using C-style casts and prefer the C++ casts - in this case dynamic_cast. Due to the way your compiler has laid out the storage for the data types and vtable entries, you've ended up finding the address of a different function.
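A cross-cast with a null check can be sketched as follows. The classes mirror the question's A, B and C, with two assumptions of mine: the member functions return a `std::string` instead of printing, and virtual destructors are added. `call_a_via_b` is a hypothetical helper.

```cpp
#include <string>

class A {
public:
    virtual ~A() {}
    virtual std::string a() = 0;
};
class B {
public:
    virtual ~B() {}
    virtual std::string b() { return "b()."; }
};
class C : public B, public A {
public:
    std::string a() override { return "a() in C."; }
};

// Cross-cast from one base to a sibling base. dynamic_cast inspects the
// dynamic type and returns a null pointer when the object has no A part.
std::string call_a_via_b(B* b) {
    A* a = dynamic_cast<A*>(b);
    return a ? a->a() : "not an A";
}
```

Given the address of a `C`, `call_a_via_b` returns "a() in C.". Given a plain `B`, the cast yields a null pointer and the function returns "not an A" instead of calling through a bogus pointer.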
Single inheritance is easy to implement. For example, in C, the inheritance can be simulated as:
struct Base { int a; };
struct Descendant { struct Base parent; int b; };
But with multiple inheritance, the compiler has to arrange multiple parents inside newly constructed class. How is it done?
The problem I see arising is: should the parents be arranged in AB or BA, or maybe even other way? And then, if I do a cast:
SecondBase * base = (SecondBase *) &object_with_base1_and_base2_parents;
The compiler must consider whether to alter or not the original pointer. Similar tricky things are required with virtuals.
The following paper from the creator of C++ describes a possible implementation of multiple inheritance:
Multiple Inheritance for C++ - Bjarne Stroustrup
There was this pretty old MSDN article on how it was implemented in VC++.
And then, if I do a cast:
SecondBase * base = (SecondBase *) &object_with_base1_and_base2_parents;
The compiler must consider whether to alter or not the original pointer. Similar tricky things with virtuals.
With non-virtual inheritance this is less tricky than you might think: at the point where the cast is compiled, the compiler knows the exact layout of the derived class (after all, the compiler did the layout). Usually all that happens is that a fixed offset (which may be zero for one of the base classes) is added to or subtracted from the derived class pointer.
With virtual inheritance it may be a bit more complex: it may involve grabbing an offset from a vtbl (or similar).
Stan Lippman's book, "Inside the C++ Object Model" has very good descriptions of how this stuff might (and often actually does) work.
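The virtual inheritance case can be explored with a small diamond. This is only a sketch: `V`, `Left`, `Right`, `Most` and the helper functions are hypothetical names, and the actual offsets depend on the compiler.

```cpp
#include <cstddef>

struct V { virtual ~V() {} int v = 1; };
struct Left  : virtual V { int l = 2; };
struct Right : virtual V { int r = 3; };
struct Most  : Left, Right { int m = 4; };

// Distance from a Left subobject to its virtual base V. Because V is a
// virtual base, this may differ between a standalone Left and a Left
// embedded in a Most, so it cannot be a single compile-time constant.
std::ptrdiff_t v_offset_from_left(Left* left) {
    V* v = left;  // conversion to a virtual base: located at run time
    return reinterpret_cast<char*>(v) - reinterpret_cast<char*>(left);
}

// Both paths through Most lead to the same, shared V subobject.
bool v_is_shared(Most* most) {
    return static_cast<V*>(static_cast<Left*>(most)) ==
           static_cast<V*>(static_cast<Right*>(most));
}
```

Comparing `v_offset_from_left` for a standalone `Left` and for the `Left` part of a `Most` is a quick way to see whether a given compiler places the virtual base differently in the two cases. `v_is_shared` is true for any `Most` object, since there is only one V in it.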
Parents are typically arranged in the order that they're specified, although the standard does not mandate a particular layout:
class Derived : A, B {}; // A comes first, then B
class Derived : B, A {}; // B comes first, then A
Your second case is handled in a compiler-specific manner. One common method is using pointers that are larger than the platform's pointer size, to store extra data.
This is an interesting issue that really isn't C++ specific. Things get more complex also when you have a language with multiple dispatch as well as multiple inheritance (e.g. CLOS).
People have already noted that there are different ways to approach the problem. You might find reading a bit about Meta-Object Protocols (MOPs) interesting in this context...
It's entirely down to the compiler how it is done, but I believe it's generally done through a hierarchical structure of vtables.
I have performed a simple experiment:
#include <cstdio>
class BaseA { int a; };
class BaseB { int b; };
class Descendant : public BaseA, BaseB {};
int main() {
Descendant d;
BaseB * b = (BaseB*) &d;
Descendant *d2 = (Descendant *) b;
printf("Descendant: %p, casted BaseB: %p, casted back Descendant: %p\n", (void*)&d, (void*)b, (void*)d2);
}
Output is:
Descendant: 0xbfc0e3e0, casted BaseB: 0xbfc0e3e4, casted back Descendant: 0xbfc0e3e0
It's good to realise that static casting does not always mean "change the type without touching the content". (Well, when the data types do not fit each other, there will also be some interference with the content, but that's a different situation IMO.)