"Crossing the hierarchy" -- why not? - c++

We have a multiple inheritance hierarchy:
// A B
// \ /
// C
//
Both A and B are abstract classes. C is actually a templated class, so downcasting is near impossible if you have a B and you want to access a member of it as an A.
All A's and B's must be C's, since that is the only concrete class in the hierarchy. Therefore, all A's must be B's, and vice versa.
I'm trying to debug something quickly where I have a B that I need to access A::name of. I can't downcast to C because I don't know the templated type of it. So I'm writing code like below and surprisingly it doesn't work; and I'm wondering what gives.
#include <cstdio>
#include <string>
struct A { virtual void go() = 0; std::string name; };
struct B { virtual void go() = 0; };
struct C : A, B { void go() override { } };
int main()
{
C c;
c.name = "Pointer wonders";
puts(c.name.c_str()); // Fine.
B* b = (B*)&c;
//puts(b->name.c_str()); // X no from compiler.
A* a1 = (A*)&c;
puts(a1->name.c_str()); // As expected this is absolutely fine
// "Cross" the hierarchy, because the B really must be a __C__, because of the purely virtual functions.
// neither A nor B can be instantiated, so every instance of A or B must really be a C.
A* a2 = (A*)b;
puts(a2->name.c_str()); // Why not??
// If you downcast first, it works
C* c2 = (C*)b;
A* a3 = (A*)c2;
puts(a3->name.c_str()); // fine
}

First of all, stop using C-style casts. The compiler won't complain when you do something wrong, and in a multiple-inheritance hierarchy a C-style cast between two unrelated bases silently turns into a reinterpret_cast.
Any cast that causes a run-time error in your example would not compile with static_cast. It is a bit longer to type, but you get instant feedback when it is used improperly, instead of undefined behavior that may corrupt data and cause problems long afterward, when that data is used.
Since A and B contain virtual functions, you can use dynamic_cast without knowing C. If you do know C, and you know for sure that the object is a C, you can static_cast to C first. But why not use virtual functions and avoid crossing between siblings altogether?
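To illustrate the dynamic_cast route, here is a minimal sketch that reaches A::name from a B* without ever naming the template instantiation. The helper name_of and the class OnlyB are made up for this example; they are not part of the question's code.

```cpp
#include <cassert>
#include <string>

struct A { virtual void go() = 0; std::string name; };
struct B { virtual void go() = 0; };

// C is a template to mirror the question: the helper below never names it.
template <typename T>
struct C : A, B {
    T payload{};
    void go() override {}
};

// Hypothetical class with a B base but no A base, to show the failure case.
struct OnlyB : B { void go() override {} };

// Hypothetical helper: dynamic_cast uses RTTI to find the complete object and
// returns a correctly adjusted A*, or nullptr if the object has no A base.
const std::string* name_of(B* b) {
    assert(b != nullptr);
    A* a = dynamic_cast<A*>(b);
    return a ? &a->name : nullptr;
}
```

Calling name_of on a C<int> yields a pointer to its name, while calling it on an OnlyB yields nullptr, so the caller can always detect a failed cross-cast.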
The reason it does not work is that a C-style cast can perform any of the following casts:
static_cast
reinterpret_cast
const_cast
Also, a C-style cast will do a reinterpret_cast if the definition of a class is missing. You also need to be very careful with void *, as you must convert back to the original type.
As a simplified rule, you can imagine that a C-style cast does either a single static_cast (known child or parent class, or primitive types like int) or a reinterpret_cast (unknown type, not a parent/child class), followed by a const_cast if necessary.
C * --> void * --> B * won't work with any C or C++ cast.
The primary reason such casts don't work is that, when multiple inheritance is used, the compiler must adjust the pointer value during a cast. This is required because the A and B parts of the object start at distinct offsets.
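The adjustment can be observed directly. The sketch below reuses the question's classes; offset_in and c_style_cast_skips_adjustment are names invented for this illustration, and the actual offsets are implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct A { virtual void go() = 0; std::string name; };
struct B { virtual void go() = 0; };
struct C : A, B { void go() override {} };

// Byte distance from the start of the complete object to the Base subobject.
template <typename Base>
std::ptrdiff_t offset_in(const C& c) {
    const Base* base = &c;  // implicit upcast: the compiler applies the adjustment
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&c);
}

// Shows that a C-style cast between the unrelated bases leaves the address alone.
bool c_style_cast_skips_adjustment(C& c) {
    B* b = &c;
    A* wrong = (A*)b;  // unrelated types: acts as reinterpret_cast, address unchanged
    A* right = &c;     // implicit upcast: address of the real A subobject
    assert(wrong != right);  // the two base subobjects cannot share an address here
    return static_cast<void*>(wrong) == static_cast<void*>(b);
}
```

The pointer named wrong is only compared, never dereferenced; dereferencing it is exactly the undefined behavior the question ran into.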
Alternatively, you can add a virtual function A * GetA() = 0 to B and implement it in C to have your own way to navigate. That can be an option if C is unknown and RTTI must be disabled (e.g. on embedded systems).
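A rough sketch of that GetA() idea, assuming C is a template as in the question. The function name_via_b is a hypothetical caller added for illustration only.

```cpp
#include <cassert>
#include <string>

struct A { virtual void go() = 0; std::string name; };

struct B {
    virtual void go() = 0;
    virtual A* GetA() = 0;  // navigation hook, needs no RTTI
};

template <typename T>
struct C : A, B {
    void go() override {}
    // Inside C the full layout is known, so this implicit upcast is adjusted correctly.
    A* GetA() override { return this; }
};

// Hypothetical caller that only knows B and still reaches A::name safely.
const std::string& name_via_b(B& b) {
    A* a = b.GetA();
    assert(a != nullptr);
    return a->name;
}
```

The cost is that B now knows about A, which is the kind of coupling between siblings that the rest of this answer warns about.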
Honestly, you should avoid multiple inheritance and casting: both increase coupling between classes, which makes the code harder to maintain, and they can cause hard-to-find bugs, particularly when mixed together.

Related

Why is it allowed to static_cast a method of a derived class to a method of the base class?

example
struct B1{int x; void f(){x = 1;}};
struct D : B1{int x; void f(){B1::x = 2;}};
using Dmp = void(D::*)();
using B1mp = void(B1::*)();
int main()
{
Dmp dmp = &D::f;
D d;
(d.*dmp)(); // ok
B1mp b1mp = static_cast<B1mp>(dmp); // hm, well that's weird
B1 b1;
(b1.*b1mp)();
dmp = &B1::f; // ok
}
And this example will compile and run just fine, and no problem will arise. But wait, now I'm going to use D::x in D::f, and now -- anything can happen at runtime.
Yes, you can also static_cast a pointer to the base to a pointer to a derived.
static_cast<D*>( (B1*)0 )
But here you can use RTTI to check the types, or just use dynamic_cast if possible.
Yes, static_cast allows a number of things that might be used in "unsafe" ways, like converting void* to another object pointer type, converting Base* to Derived*, and this one.
Although static_cast can be thought of as "relatively safe" compared to reinterpret_cast and const_cast, it's still a cast. And like all casts, it represents a request to ignore some of the type system's safety requirements, with the programmer then responsible for using it carefully and correctly.
In
void f(B *b) {
static_cast<D*>(b)->d_method();
(b->*static_cast<void (B::*)()>(&D::d_method))();
}
you assume that b is a D to exactly the same degree in each case. Being able to cast the pointer-to-member allows a caller to nominate any function from any derived class when passed to a function that expects a pointer-to-member for the base.
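As a sketch of that last point, here is a callback runner that only knows the base class while the caller nominates a derived member function. Base, Derived, run and bump_through_base are names made up for this example.

```cpp
#include <cassert>

struct Base { int calls = 0; };

struct Derived : Base {
    int extra = 0;
    void bump() { ++calls; ++extra; }
};

using BaseFn = void (Base::*)();

// Hypothetical runner: it knows nothing about Derived.
void run(Base& target, BaseFn fn) { (target.*fn)(); }

int bump_through_base() {
    Derived d;
    // The caller vouches that every object paired with this pointer really is a Derived.
    BaseFn fn = static_cast<BaseFn>(&Derived::bump);
    run(d, fn);
    assert(d.extra == 1);
    return d.calls + d.extra;
}
```

This is well defined only because d really is a Derived. Passing a plain Base object to run with the same pointer would be undefined behavior, just like the static_cast<D*> in the snippet above.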

C++ casting oddness

Let's use this simple class hierarchy:
class A
{
public:
virtual void Af() {};
};
class IB
{
public:
virtual void Bf() = 0;
};
class C : public A, public IB
{
public:
virtual void Bf() {}
void Cf() { printf("Cf"); }
};
And now some tests I have done, trying to understand static_cast and dynamic_cast:
1) C* c = new C();
2) A* a = static_cast<A*>(c);
3) IB* ib = static_cast<IB*>(c); //ib gets a different pointer than c because ib vtable is assigned
4) A* correctA = static_cast<A*>(static_cast<C*>(ib)); //Correct, but I must cast first to C and then to A from the interface
5) A* incorrectA = static_cast<A*>(ib); //Compiler error
6) A* correctA2 = dynamic_cast<A*>(ib); //Correct result
Now, some questions:
1) I have started coding in C++ again after moving to C# about 5 years ago. I'm surprised by the value of the "ib" variable in number 3. I expected it to be the same pointer as "c", but instead the cast seems to assign the address of the IB vtable inside "c".
2) Why must I cast first to C* and then to A* in 4 to get a correct value? This makes polymorphism useless in this case, because I want to cast from the interface to the base type without knowing the real type. 5 shows that this is not possible with static_cast (I guess it checks the inheritance tree at compile time and concludes that IB is not related to A, although at runtime they really are).
3) 6 gets a correct value into correctA2. I guess it does this correctly, as I explain in question 2, because this can only be resolved at runtime.
Could you explain this kind of behaviour a bit and confirm my guesses? It is hard to come back from C# to C++ :D.
Cheers.
It looks like you may be trying to write C# in C++ in which case I suggest just sticking with C#. However I'll try to answer your questions:
1) (Note: these are implementation details, but they hold on most systems.) In a class with multiple polymorphic bases, an implementation typically stores one vtable pointer per base subobject, at the start of that subobject. In this case a C would contain first an A subobject with its vtable pointer and then an IB subobject with its own. If you used the derived pointer as an IB* without changing its address, the IB code would be using the A subobject's vtable, resulting in havoc. Thus, the compiler fixes up the address for you.
2) This is just the way the language tells us static_cast will work: converting between parent/child objects, and a few other relationships like different integral types. dynamic_cast is needed to traverse sibling relationships directly.
3) Correct, since dynamic_cast offers more flexibility for polymorphic conversions you can use it to convert between a sibling relationship.
I should make a closing remark that using multiple inheritance in C++ to provide an implementation to an interface is not a common pattern. There may be alternate approaches if you ask your real question.
A static_cast requires there to be a single compile time relationship between the types that is more direct than any other relationship.
Imagine you had also defined
class D : public IB, public A
The relationship between A and IB through D would be no more and no less direct than through C. A static_cast can use the fact that the most direct relationship between IB and C is IB as a base class of C, and can use the fact that the most direct relationship between C and A is A as a base class of C. But the relationship between IB and A through C cannot be known to be the most direct compile-time relationship, so static_cast can't use it (but dynamic_cast can use it as the only available run-time relationship).
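To make the C versus D argument concrete, here is a small sketch in which the same dynamic_cast expression serves both concrete types. require_a is a hypothetical helper, and virtual destructors were added to the question's classes for good measure.

```cpp
#include <cassert>

class A  { public: virtual void Af() {} virtual ~A() = default; };
class IB { public: virtual void Bf() = 0; virtual ~IB() = default; };

class C : public A, public IB { public: void Bf() override {} };
class D : public IB, public A { public: void Bf() override {} };  // bases in the opposite order

// Hypothetical helper: one cross-cast that works for C and D alike, because the
// adjustment is looked up at run time from the complete object's type.
A& require_a(IB& ib) {
    A* a = dynamic_cast<A*>(&ib);
    assert(a != nullptr && "object has no A base");
    return *a;
}
```

With typical layouts the A subobject sits before IB in a C and after it in a D, so no single compile-time offset could be correct for both, which is why static_cast<A*>(ib) is rejected.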

Will polymorphism hold for C++ object references passed around in C?

I have a C++ lib that makes use of a object hierarchy like this:
class A { ... }
class B : public A { ... }
class C : public A { ... }
I expose functionality through a C API via typedefs and functions, like this:
#ifdef __cplusplus
typedef A* APtr;
#else
typedef struct A* APtr;
#endif
extern "C" void some_function(APtr obj);
However, say a user of the C API does something like this:
BPtr b = bptr_create();
some_function((APtr) b);
This is polymorphically valid, since B extends A, and my API depends on such functionality being possible, but I want to make sure that this will still interoperate properly with the C++ code, even if B overrides some of A's virtual methods.
More importantly, why or why not? How can C++ identify at runtime that the obj parameter of some_function is actually a pointer to B, and therefore call its overridden virtual methods instead?
The C code is not valid (nor would the equivalent C++ code in a context where the class definition is not visible) because what C does in this case is the equivalent of a reinterpret_cast. Note that in a simple situation like yours it will likely "work" because most compilers will put the single base object at the beginning of the derived object, so a pointer adjustment is not necessary. However, in the general case (especially when using multiple inheritance), the pointer will have to be adjusted to point to the correct subobject, and since C does not know how to do that, the cast is wrong.
So what is meant with "pointer adjustment"? Consider the following situation:
class A { virtual ~A(); int i; ... };
class B { virtual ~B(); int j; ... };
class C: public A, public B { ... };
Now the layout of C may be as follows:
+----------------------------+----------------------------+
| A subobject (containing i) | B subobject (containing j) |
+----------------------------+----------------------------+
where the virtual table pointers of both the A and B subobjects point to vtables belonging to C.
Now imagine you've got a C* which you want to convert to a B*. Of course the code which receives the B* may not know about the existence of C; indeed, it may have been compiled before C was even written. Therefore the B* must point to the B subobject of the C object. In other words, on conversion from C* to B*, the size of the A subobject has to be added to the address stored into the pointer. If you do not do this, the B* will actually point to the A subobject, which clearly is wrong.
Now without access to the class definition of C, there's of course no way to know that there even is an A subobject, not to mention how large it is. Therefore it is impossible to do a correct conversion from C* to B* if the class definition of C is not available.
C++ uses a virtual function table, which is stored in memory once per class. When an object of a particular derived class is created, its virtual table decides which function gets called.
So it's a bit of C++ compile-time plus run-time magic :)
http://en.wikipedia.org/wiki/Virtual_method_table
Short answer: Yes this will work.
Why: since A and some_function are implemented in C++, all virtual function calls will occur in C++ code as usual, where the class definition is included, and there is nothing magic about it. In C code only opaque pointers are passed around, and C code will never be able to call the virtual functions directly, because it could never compile the definition of A.
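One way to sidestep the question of whether the C-side cast is valid is to let the C++ side perform the upcast. A minimal sketch, assuming the API may grow two extra functions: bptr_as_aptr and aptr_destroy are hypothetical additions, id() is an invented virtual function, and some_function returns int here (instead of void) only so the result can be observed.

```cpp
#include <cassert>

class A { public: virtual ~A() = default; virtual int id() const { return 1; } };
class B : public A { public: int id() const override { return 2; } };

typedef A* APtr;
typedef B* BPtr;

extern "C" {
    BPtr bptr_create() { return new B(); }

    // Hypothetical addition: the upcast is compiled as C++, so any pointer
    // adjustment is applied by a compiler that sees both class definitions.
    APtr bptr_as_aptr(BPtr b) { return b; }

    // Virtual dispatch happens here, inside C++ code.
    int some_function(APtr obj) {
        assert(obj != nullptr);
        return obj->id();
    }

    // Hypothetical addition: destruction through the base pointer.
    void aptr_destroy(APtr obj) { delete obj; }
}
```

C callers would then write some_function(bptr_as_aptr(b)) instead of some_function((APtr) b), and the API no longer depends on how a particular compiler lays out B.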

C++ Multiple Inheritance Memory Addressing issue

Please ignore the #include parts, assuming they are done correctly. Also, this could be implementation-specific (but so is the concept of vtables); I am just curious, as it helps me visualize multiple inheritance. (I'm using MinGW 4.4.0, by the way.)
initial code:
class A {
public:
A() : a(0) {}
int a;
};
//Edit: adding this definition instead
void f(void* ptrA) {
std::cout<<((A*)ptrA)->a;
}
//end of editing of original posted code
#if 0
//this was originally posted. Edited and replaced by the above f() definition
void f(A* ptrA) {
std::cout<<ptrA->a;
}
#endif
This is compiled and object code is generated.
In some other compilation unit I use (after including the header file for the above code):
class C : public B , public A {
public:
int c;
}objC;
f(&objC); // ################## Label 1
memory model for objC:
//<1> stuff from B
//<2> stuff from B
//<3> stuff from A : int a
//<4> stuff from C : int c
&objC will contain the starting address of <1> in the memory model assumed above.
How and when will the compiler shift it to <3>? Does it happen during the inspection of the call at Label 1?
EDIT:
Since Label 1 seems to be a giveaway, I am making it a little more obscure for the compiler. Please see the edited code above. Now when and where does the compiler do it?
Yes, you are quite correct.
To fully understand the situation, you have to know what the compiler knows at two points:
At Label 1 (as you have already identified)
Inside function f()
(1) The compiler knows the exact binary layout of both C and A and how to convert from C* to A* and will do so at the call site (Label 1)
(2) Inside function f(), however, the compiler only (needs to) know(s) about A* and so restricts itself to members of A (int a in this case) and cannot be confused about whether the particular instance is part of anything else or not.
Short answer: Compiler will adjust pointer values during cast operations if it knows the relationship between the base and derived class.
Let's say your object instance of class C is at address 100, and let's say sizeof(B) == 4 and sizeof(A) == 4.
When a cast happens such as the following:
C c;
A* pA = &c; // implicit cast, preferred for upcasting
A* pA = (A*)&c; // explicit cast old style
A* pA = static_cast<A*>(&c); // static-cast, even better
The pointer value of pA will be the memory address of c plus the offset from where "A" begins in C. In this case, pA will reference memory address 104 assuming sizeof(B) is also 4.
All of this holds true for passing a derived class pointer into a function expecting a base class pointer. The implicit cast will occur as does the pointer offset adjustment.
Likewise, for downcasting:
C* pC = (C*)(&a);
The compiler will take care of adjusting the pointer value during the assignment.
The one "gotcha" in all of this is when a class is only forward-declared, without a full definition:
// foo.h
class A; // same as above, base class for C
class C; // same as above, derived class from A and B
inline void foo(C* pC)
{
A* pA = (A*)pC; // oops, compiler doesn't know that C derives from A. It won't adjust the pointer value during assignment
SomeOtherFunction(pA); // bug! Function expecting A* parameter is getting garbage
}
That's a real bug!
My general rule: avoid the old "C-style" cast and favor the static_cast operator, or just rely on implicit conversion without an operator to do the right thing (for upcasts). The compiler will issue an error if the cast isn't valid.
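For the edited f(void*) variant from the question, a sketch of the safe pattern is to convert to A* while C is still known and only then erase the type. The layout of B (two ints) is an assumption based on the question's memory model, and read_a and read_a_from are hypothetical names.

```cpp
#include <cassert>

class A { public: A() : a(0) {} int a; };
class B { public: B() : b1(0), b2(0) {} int b1; int b2; };  // assumed: two ints
class C : public B, public A { public: C() : c(0) {} int c; };

// Mirrors the edited f(): the callee trusts that the void* really holds an A*.
int read_a(void* ptrA) {
    assert(ptrA != nullptr);
    return static_cast<A*>(ptrA)->a;
}

// Convert to A* first (the adjustment happens here), then pass it as void*.
int read_a_from(C& obj) {
    A* asA = &obj;       // implicit upcast, pointer value adjusted past the B part
    return read_a(asA);  // A* -> void* -> A* round-trips safely
}
```

Calling read_a(&obj) directly would hand the callee the address of the B part, since a conversion to void* carries no type information from which the offset could be recovered.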

C++ virtual function not called in subclass

Consider this simple situation:
A.h
class A {
public:
virtual void a() = 0;
};
B.h
#include <iostream>
class B {
public:
virtual void b() {std::cout << "b()." << std::endl;};
};
C.h
#include "A.h"
#include "B.h"
class C : public B, public A {
public:
void a() {std::cout << "a() in C." << std::endl;};
};
int main() {
B* b = new C();
((A*) b)->a(); // Output: b().
A* a = new C();
a->a(); // Output:: a() in C.
return 0;
}
In other words:
- A is a pure virtual class.
- B is a class with no super class and one non-pure virtual function.
- C is a subclass of A and B and overrides A's pure virtual function.
What surprises me is the first output i.e.
((A*) b)->a(); // Output: b().
Although I call a() in the code, b() is invoked. My guess is that it is related to the fact that the variable b is a pointer to class B which is not a subclass of class A. But still the runtime type is a pointer to a C instance.
What is the exact C++ rule to explain this, from a Java point of view, weird behaviour?
You are unconditionally casting b to an A* using a C-style cast. The compiler doesn't stop you from doing this; you said it's an A*, so it's an A*. It therefore treats the memory it points to as an instance of A. Since a() is the first entry in A's vtable and b() is the first entry in B's vtable, when you call a() on an object that is really a B, you get b().
You're getting lucky that the object layout is similar. This is not guaranteed to be the case.
First, you shouldn't use C-style casts. You should use C++ casting operators which have more safety (though you can still shoot yourself in the foot, so read the docs carefully).
Second, you shouldn't rely on this sort of behavior, unless you use dynamic_cast<>.
Don't use a C-style cast when casting across a multiple inheritance tree. If you use dynamic_cast instead you get the expected result:
B* b = new C();
dynamic_cast<A*>(b)->a();
You are starting with a B* and casting it to A*. Since the two are unrelated, you're delving into the sphere of undefined behavior.
((A*) b) is an explicit c-style cast, which is allowed no matter what the types pointed to are. However, if you try to dereference this pointer, it will be either a runtime error or unpredictable behavior. This is an instance of the latter. The output you observed is by no means safe or guaranteed.
A and B are not related to each other by inheritance, which means that a pointer to B cannot be converted into a pointer to A by either an upcast or a downcast.
Since A and B are two different bases of C, what you are trying to do here is called a cross-cast. The only cast in C++ language that can perform a cross-cast is dynamic_cast. This is what you have to use in this case in case you really need it (do you?)
B* b = new C();
A* a = dynamic_cast<A*>(b);
assert(a != NULL);
a->a();
The following line is a reinterpret_cast, which points at the same memory but "pretends" it is a different kind of object:
((A*) b)->a();
What you really want is a dynamic_cast, which checks what kind of object b really is and adjusts the memory location the result points to:
dynamic_cast<A*>(b)->a()
As jeffamaphone mentioned, the similar layout of the two classes is what causes the wrong function to be called.
There is almost never an occasion in C++ where using a C-style cast (or its C++ equivalent reinterpret_cast<>) is justified or required. Whenever you find yourself tempted to use one of the two, suspect your code and/or your design.
I think you have a subtle bug in casting from B* to A*, and the behaviour is undefined. Avoid using C-style casts and prefer the C++ casts - in this case dynamic_cast. Due to the way your compiler has laid out the storage for the data types and vtable entries, you've ended up finding the address of a different function.