Consider this simple situation:
A.h
class A {
public:
virtual void a() = 0;
};
B.h
#include <iostream>
class B {
public:
virtual void b() {std::cout << "b()." << std::endl;};
};
C.h
#include "A.h"
#include "B.h"
class C : public B, public A {
public:
void a() {std::cout << "a() in C." << std::endl;};
};
int main() {
B* b = new C();
((A*) b)->a(); // Output: b().
A* a = new C();
a->a(); // Output:: a() in C.
return 0;
}
In other words:
- A is a pure virtual class.
- B is a class with no super class and one non-pure virtual function.
- C is a subclass of A and B and overrides A's pure virtual function.
What surprises me is the first output i.e.
((A*) b)->a(); // Output: b().
Although I call a() in the code, b() is invoked. My guess is that it is related to the fact that the variable b is a pointer to class B which is not a subclass of class A. But still the runtime type is a pointer to a C instance.
What is the exact C++ rule to explain this, from a Java point of view, weird behaviour?
You are unconditionally casting b to an A* using a C-style cast. The Compiler doesn't stop you from doing this; you said it's an A* so it's an A*. So it treats the memory it points to like an instance of A. Since a() is the first method listed in A's vtable and b() is the first method listed in B's vtable, when you call a() on an object that is really a B, you get b().
You're getting lucky that the object layout is similar. This is not guaranteed to be the case.
First, you shouldn't use C-style casts. You should use C++ casting operators which have more safety (though you can still shoot yourself in the foot, so read the docs carefully).
Second, you shouldn't rely on this sort of behavior, unless you use dynamic_cast<>.
Don't use a C-style cast when casting across a multiple inheritance tree. If you use dynamic_cast instead you get the expected result:
B* b = new C();
dynamic_cast<A*>(b)->a();
You are starting with a B* and casting it to A*. Since the two are unrelated, you're delving into the sphere of undefined behavior.
((A*) b) is an explicit c-style cast, which is allowed no matter what the types pointed to are. However, if you try to dereference this pointer, it will be either a runtime error or unpredictable behavior. This is an instance of the latter. The output you observed is by no means safe or guaranteed.
A and B are no related to each other by means of inheritance, which means that a pointer to B cannot be transformed into a pointer to A by means of either upcast or downcast.
Since A and B are two different bases of C, what you are trying to do here is called a cross-cast. The only cast in C++ language that can perform a cross-cast is dynamic_cast. This is what you have to use in this case in case you really need it (do you?)
B* b = new C();
A* a = dynamic_cast<A*>(b);
assert(a != NULL);
a->a();
The following line is a reinterpret_cast, which points at the same memory but "pretends" it is a different kind of object:
((A*) b)->a();
What you really want is a dynamic_cast, which checks what kind of object b really is and adjust what location in memory to point to:
dynamic_cast<A*>(b)->a()
As jeffamaphone mentioned, the similar layout of the two classes is what causes the wrong function to be called.
There is almost never an occasion in C++ where using a C-style cast (or its C++ equivalent reinterpret_cast<>) is justified or required. Whenever you find yourself tempted to use one of the two, suspect your code and/or your design.
I think you have a subtle bug in casting from B* to A*, and the behaviour is undefined. Avoid using C-style casts and prefer the C++ casts - in this case dynamic_cast. Due to the way your compiler has laid out the storage for the data types and vtable entries, you've ended up finding the address of a different function.
Related
We have a multiple inheritance hierarchy:
// A B
// \ /
// C
//
Both A and B are abstract classes. C is actually a templated class, so downcasting is near impossible if you have a B and you want to access a member of it as an A.
All A's and B's must be C's, since that is the only concrete class in the hierarchy. Therefore, all A's must be B's, and vice versa.
I'm trying to debug something quickly where I have a B that I need to access A::name of. I can't downcast to C because I don't know the templated type of it. So I'm writing code like below and surprisingly it doesn't work; and I'm wondering what gives.
struct A { virtual void go() = 0; std::string name; };
struct B { virtual void go() = 0; };
struct C : A, B { void go() override { } };
int main()
{
C c;
c.name = "Pointer wonders";
puts(c.name.c_str()); // Fine.
B* b = (B*)&c;
//puts(b->name.c_str()); // X no from compiler.
A* a1 = (A*)&c;
puts(a1->name.c_str()); // As expected this is absolutely fine
// "Cross" the hierarchy, because the B really must be a __C__, because of the purely virtual functions.
// neither A nor B can be instantiated, so every instance of A or B must really be a C.
A* a2 = (A*)b;
puts(a2->name.c_str()); // Why not??
// If you downcast first, it works
C* c2 = (C*)b;
A* a3 = (A*)c2;
puts(a3->name.c_str()); // fine
}
First of all, stop using C style cast. The compiler won't complain if you do something wrong (C style cast usually do not works in multiple inheritance).
Any cast that cause run-time error in you example would not compile with a static_cast. While it is a bit longer to type, you get instant feedback when used improperly instead of undefined behavior that will sometime corrupt data and cause problem long afterward when that data is use.
As A and B contains virtual function, you can easily use dynamic_cast without knowing C. If you know C, you could use static_cast to C if you know there is a derived C for sure. But why not use virtual functions and not do any crossing between siblings?
The reason it does not works is because C-style cast can do any of the following cast:
static_cast
reinterperet_cast
const_cast
Also, C style cast will do a reinterpret_cast if the definition of a class is missing. You also need to be very careful with void *as you must convert back to original type.
As a simplified rule, you can imagine that C cast is like doing either a single static_cast (known child or parent class or primitive types like int) or reinterpret_cast (unknown type, not a parent/child class) followed by a const_cast if necessary.
C * --> void * --> B * won't work with any C or C++ cast.
Th primary reason that such cast don't works is that the compiler must adjust this pointer when doing a cast and multiple inheritance is used. This is required to take into account that the A and B part start at a distinct offset.
Alternatively, you can add a virtual function A * GetA() = 0 in B and implemente it in C to have your own way to navigate. That can be an option if is unknown and RTTI must be disabled (for ex. on embedded systems).
Honestly, you should avoid multiple inheritance and casting as it make the code harder to maintain as it increase coupling between classes and it can cause hard to find bug particularily when mixing both together.
I have a C++ lib that makes use of a object hierarchy like this:
class A { ... }
class B : public A { ... }
class C : public A { ... }
I expose functionality through a C API via typedefs and functions, like this:
#ifdef __cplusplus
typedef A* APtr;
#else
typedef struct A* APtr;
#endif
extern "C" void some_function(APtr obj);
However, say a use of the C API does something like this:
BPtr b = bptr_create();
some_function((APtr) b);
This is polymorphically valid, since B extends A, and my API depends on such functionality being possible, but I want to make sure that this will still interoperate properly with the C++ code, even if B overrides some of A's virtual methods.
More importantly, why or why not? How can C++ identify at runtime that the obj parameter of some_function is actually a pointer to B, and therefore call its overridden virtual methods instead?
The C code is not valid (nor would the equivalent C++ code in a context where the class definition is not visible) because what C does in this case is the equivalent of a reinterpret_cast. Note that in a simple situation like yours it will likely "work" because most compilers will put the single base object at the beginning of the derived object, so a pointer adjustment is not necessary. However, in the general case (especially when using multiple inheritance), the pointer will have to be adjusted to point to the correct subobject, and since C does not know how to do that, the cast is wrong.
So what is meant with "pointer adjustment"? Consider the following situation:
class A { virtual ~A(); int i; ... };
class B { virtual ~B(); int j; ... };
class C: public A, public B { ... };
Now the layout of C may be as follows:
+----------------------------+----------------------------+
| A subobject (containing i) | B subobject (containing j) |
+----------------------------+----------------------------+
where the virtual pointers of both the A and B subobjects point to C.
Now imagine you've got a C* which you want to convert to a B*. Of course the code which receives the B* may not know about the existence of C; indeed, it may have been compiled before C was even written. Therefore the B* must point to the B subobject of the C object. In other words, on conversion from C* to B*, the size of the A subobject has to be added to the address stored into the pointer. If you do not do this, the B* will actually point to the A subobject, which clearly is wrong.
Now without access to the class definition of C, there's of course no way to know that there even is an A subobject, not to mention how large it is. Therefore it is impossible to do a correct conversion from C* to B* if the class definition of C is not available.
C++ uses the virtual function table which is in memory per class ,
and when an object is created of that particular derived class its
virtual table decides which function gets called.
So its bit c++ compile time Plus Runtime magic :)
http://en.wikipedia.org/wiki/Virtual_method_table
Short answer: Yes this will work.
Why: since A and some_function is implemented in C++, all virtual function calls will occur in C++ code as usual, where the class definition is included, and there is nothing magic about it. In C code only opaque pointers are passed around, and C code never will be able to call the virtual functions directly, because it never could compile the definition of A.
Ok, I was reading through this entry in the FQA dealing about the issue of converting a Derived** to a Base** and why it is forbidden, and I got that the problem is that you could assign to a Base* something which is not a Derived*, so we forbid that.
So far, so good.
But, if we apply that principle in depth, why aren't we forbidding such example?
void nasty_function(Base *b)
{
*b = Base(3); // Ouch!
}
int main(int argc, char **argv)
{
Derived *d = new Derived;
nasty_function(d); // Ooops, now *d points to a Base. What would happen now?
}
I agree that nasty_function does something idiotic, so we could say that letting that kind of conversion is fine because we enable interesting designs, but we could say that also for the double-indirection: you got a Base **, but you shouldn't assign anything to its deference because you really don't know where that Base ** comes, just like the Base *.
So, the question: what's special about that extra-level-of-indirection? Maybe the point is that, with just one level of indirection, we could play with virtual operator= to avoid that, while the same machinery isn't available on plain pointers?
nasty_function(d); // Ooops, now *d points to a Base. What would happen now?
No, it doesn't. It points to a Derived. The function simply changed the Base subobject in the existing Derived object. Consider:
#include <cassert>
struct Base {
Base(int x) : x(x) {}
int x;
};
struct Derived : Base {
Derived(int x, int y) : Base(x), y(y) {}
int y;
};
int main(int argc, char **argv)
{
Derived d(1,2); // seriously, WTF is it with people and new?
// You don't need new to use pointers
// Stop it already
assert(d.x == 1);
assert(d.y == 2);
nasty_function(&d);
assert(d.x == 3);
assert(d.y == 2);
}
d doesn't magically become a Base, does it? It's still a Derived, but the Base part of it changed.
In pictures :)
This is what Base and Derived objects look like:
When we have two levels of indirection it doesn't work because the things being assigned are pointers:
Notice how neither of the Base or Derived objects in question are attempted to be changed: only the middle pointer is.
But, when you only have one level of indirection, the code modifies the object itself, in a way that the object allows (it can forbid it by making private, hiding, or deleting the assignment operator from a Base):
Notice how no pointers are changed here. This is just like any other operation that changes part of an object, like d.y = 42;.
No, nasty_function() isn't as nasty as it sounds. As the pointer b points to something that is-a Base, it's perfectly legal to assign a Base-value to it.
Take care: your "Ooops" comment is not correct: d still points to the same Derived as before the call! Only, the Base part of it was reassigned (by value!). If that gets your whole Derived out of consistency, you need to redesign by making Base::operator=() virtual. Then, in the nasty_function(), in fact the Derived assignment operator will be called (if defined).
So, I think, your example does not have that much to do with the pointer-to-pointer case.
*b = Base(3) calls Base::operator=(const Base&), which is actually present in Derived as member functions (inc. operators) are inherited.
What would happen then (calling Derived::operator=(const Base&)) is sometimes called "slicing", and yes, it's bad (usually). It's a sad consequence of the sheer omnipresence of the "become-like" operator (the =) in C++.
(Note that the "become-like" operator doesn't exist in most OO languages like Java, C# or Python; = in object contexts there means reference assignment, which is like pointer assignment in C++;).
Summing up:
Casts Derived** -> Base** are forbidden, because they can cause a type error, because then you could end up with a pointer of type Derived* pointing to an object of type Base.
The problem you mentioned isn't a type error; it's a different type of error: mis-use of the interface of the derived object, rooting from the sorry fact that it has inherited the "become-like" operator of its parent class.
(Yes, I call op= in objects contexts "become-like" deliberately, as I feel that "assignment" isn't a good name to show what's happening here.)
Well the code you gave makes sense. Indeed the assignement operator cannot overrite data specific to Derived but only base. Virtual functions are still from Derived and not from Base.
*b = Base(3); // Ouch!
Here the object at *b really is a B, it's the base sub-object of *d. Only that base sub-object gets modified, the rest of the derived object isn't changed and d still points to the same object of the derived type.
You might not want to allow the base to be modified, but in terms if the type system it's correct. A Derived is a Base.
That's not true for the illegal pointer case. A Derived* is convertible to Base* but is not the same type. It violates the type system.
Allowing the conversion you're asking about would be no different to this:
Derived* d;
Base b;
d = &b;
d->x;
Reading through the good answers of my question I think I got the point of the issue, which comes from first principles in OO and has nothing to do with sub-objects and operator overloading.
The point is that you can use a Derived whenever a Base is required (substitution principle), but you cannot use a Derived* whenever a Base* is needed due to the possibility of assigning pointers of instances of derived classes to it.
Take a function with this prototype:
void f(Base **b)
f can do a bunch of things with b, dereferencing it among other things:
void f(Base **b)
{
Base *pb = *b;
...
}
If we passed to f a Derived**, it means that we are using a Derived* as a Base*, which is incorrect since we may assign a OtherDerived* to Base* but not to Derived*.
On the other way, take this function:
void f(Base *b)
If f dereferences b, then we would use a Derived in place of a Base, which is entirerly fine (provided that you give a correct implementation of your class hierarchies):
void f(Base *b)
{
Base pb = *b; // *b is a Derived? No problem!
}
To say it in another way: the substitution principles (use a derived class instead of the base one) works on instances, not on pointers, because the "concept" of a pointer to A is "points to an instance of whichever class inherits A", and the set of classes which inherits Base strictly contains the set of classes that inherits Derived.
Please ignore the #include parts assuming they are done correctly. Also this could be implementation specific (but so is the concept of vtables) but i am just curious as it enhances me to visualize multiple inheritance. (I'm using MinGW 4.4.0 by the way)
initial code:
class A {
public:
A() : a(0) {}
int a;
};
//Edit: adding this definition instead
void f(void* ptrA) {
std::cout<<((A*)ptrA)->a;
}
//end of editing of original posted code
#if 0
//this was originally posted. Edited and replaced by the above f() definition
void f(A* ptrA) {
std::cout<<ptrA->a;
}
#endif
this is compiled and Object code is generated.
in some other compilation unit i use (after inclusion of header file for above code):
class C : public B , public A {
public:
int c;
}objC;
f(&objC); // ################## Label 1
memory model for objC:
//<1> stuff from B
//<2> stuff from B
//<3> stuff from A : int a
//<4> stuff from C : int c
&objC will contain starting address of <1> in memory model assumed above
how/when will the compiler shift it to <3>? Does it happen during the inspection of call at Label 1 ?
EDIT::
since Lable 1 seems to be a give away, just making it a little more obscure for the compiler. Pls see the Edited code above. Now when does the compiler do and where?
Yes, you are quite correct.
To fully understand the situation, you have to know what the compiler knows at two points:
At Label 1 (as you have already identified)
Inside function f()
(1) The compiler knows the exact binary layout of both C and A and how to convert from C* to A* and will do so at the call site (Label 1)
(2) Inside function f(), however, the compiler only (needs to) know(s) about A* and so restricts itself to members of A (int a in this case) and cannot be confused about whether the particular instance is part of anything else or not.
Short answer: Compiler will adjust pointer values during cast operations if it knows the relationship between the base and derived class.
Let's say the address of your object instance of class C was at address 100. And let's say sizeof(C) == 4. As does sizeof(B) and sizeof(A).
When a cast happens such as the following:
C c;
A* pA = &c; // implicit cast, preferred for upcasting
A* pA = (A*)&c; // explicit cast old style
A* pA = static_cast<A*>(&c); // static-cast, even better
The pointer value of pA will be the memory address of c plus the offset from where "A" begins in C. In this case, pA will reference memory address 104 assuming sizeof(B) is also 4.
All of this holds true for passing a derived class pointer into a function expecting a base class pointer. The implicit cast will occur as does the pointer offset adjustment.
Likewise, for downcasting:
C* pC = (C*)(&a);
The compiler will take care of adjusting the pointer value during the assigment.
The one "gotcha" to all of this is when a class is forward declared without a full declaration:
// foo.h
class A; // same as above, base class for C
class C; // same as above, derived class from A and B
inline void foo(C* pC)
{
A* pA = (A*)pC; // oops, compiler doesn't know that C derives from A. It won't adjust the pointer value during assigment
SomeOtherFunction(pA); // bug! Function expecting A* parameter is getting garbage
}
That's a real bug!
My general rule. Avoid the old "C-style" cast and favor using the static_cast operator or just rely on implicit casting without an operator to do the right thing (for upcasts). The compiler will issue an error if the casting isn't valid.
I have following classes.
class A
{
public:
void fun();
}
class B: public A
{
}
class C: public A
{
}
A * ptr = new C;
Is it ok to do something like below? Will i have some problems if introduce some virtual functions in the baseclass?
((B *)ptr)->fun();
This may look stupid, but i have a function that calls A's function through B and i don't want to change that.
You can't cast an A* pointing to Class C as a B* because Class C doesn't have any relation with Class B. You'll get undefined behavior which will probably be the wrong function called and stack corruption.
If you intended for class C to derive from class B then you could. However, you wouldn't need to. If class C doesn't have fun() defined, it will inherit A's. You didn't declare fun() virtual though so you'll get strange behavior if you even implement C::fun() or B::fun(). You almost certainly want fun() to be declared virtual.
I'm guessing here but I suspect the behavior of this might depend on the compiler you use and how it decides to organize the vf pointer table.
I'm also going to note that I think what you are doing is a bad idea and could lead to all kinds of nightmarish problems (use of things like static_cast and dynamic_cast are generally a good idea). The other thing is because fun() is defined in the base class (and it is not virtual) ptr->fun() will always call A::fun() without having to cast it to B*.
You don't have to do the casting (B*) ptr->fun(); since the fun() is already in the base class. both objects of class B or C will invoke the same fun() function in your example.
I'm not sure what happens when u override the fun() function in class B...
But trying to invoke function from another class (not the base class) is bad OO, in my opinion.
You can cast from A * to B *, and it should work if the original pointer was B *.
A* p = new B;
B* q = static_cast<B*>(p); // Just work (tm)
But in this case it is a C *, and it is not guaranteed to work, you will end with a dangling pointer, if you are lucky you will get an access violation, if not you man end up silently corrupting your memory.
A* p = new C;
B* q = static_cast<B*>(p); // Owned (tm)