Please ignore the #include parts, assuming they are done correctly. Also, this could be implementation specific (but so is the concept of vtables), but I am just curious, as it helps me visualize multiple inheritance. (I'm using MinGW 4.4.0, by the way.)
initial code:
class A {
public:
A() : a(0) {}
int a;
};
//Edit: adding this definition instead
void f(void* ptrA) {
std::cout<<((A*)ptrA)->a;
}
//end of editing of original posted code
#if 0
//this was originally posted. Edited and replaced by the above f() definition
void f(A* ptrA) {
std::cout<<ptrA->a;
}
#endif
This is compiled and object code is generated.
In some other compilation unit I use (after including the header file for the above code):
class C : public B , public A {
public:
int c;
}objC;
f(&objC); // ################## Label 1
memory model for objC:
//<1> stuff from B
//<2> stuff from B
//<3> stuff from A : int a
//<4> stuff from C : int c
&objC will contain starting address of <1> in memory model assumed above
how/when will the compiler shift it to <3>? Does it happen during the inspection of call at Label 1 ?
EDIT::
Since Label 1 seems to be a giveaway, I am making it a little more obscure for the compiler. Please see the edited code above. Now when and where does the compiler do it?
Yes, you are quite correct.
To fully understand the situation, you have to know what the compiler knows at two points:
At Label 1 (as you have already identified)
Inside function f()
(1) The compiler knows the exact binary layout of both C and A and how to convert from C* to A* and will do so at the call site (Label 1)
(2) Inside function f(), however, the compiler only (needs to) know(s) about A* and so restricts itself to members of A (int a in this case) and cannot be confused about whether the particular instance is part of anything else or not.
Short answer: Compiler will adjust pointer values during cast operations if it knows the relationship between the base and derived class.
Let's say your object instance of class C is at address 100. And let's say sizeof(B) == 4 and sizeof(A) == 4, with the B part laid out first (so sizeof(C) must be larger than 4, since it contains both).
When a cast happens such as the following:
C c;
A* pA = &c; // implicit cast, preferred for upcasting
A* pA = (A*)&c; // explicit cast old style
A* pA = static_cast<A*>(&c); // static-cast, even better
The pointer value of pA will be the memory address of c plus the offset from where "A" begins in C. In this case, pA will reference memory address 104 assuming sizeof(B) is also 4.
All of this holds true for passing a derived class pointer into a function expecting a base class pointer. The implicit cast will occur as does the pointer offset adjustment.
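The adjustment described above can be observed directly. The following is a minimal sketch, not the asker's actual code: the layout of B is never shown in the question, so the B below is an assumed stand-in, and offsetOfA and readA are hypothetical helper names. The second function mirrors the edited f(void*) and is only correct when the caller converts to A* before the pointer becomes a void*.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-ins for the classes in the question (B is never shown there).
struct B { int b1 = 1; int b2 = 2; };
struct A { int a = 3; };
struct C : B, A { int c = 4; };

// Byte distance from the start of a C object to its A subobject.
std::ptrdiff_t offsetOfA(const C& obj) {
    const A* pA = &obj;  // implicit upcast: the compiler adds the offset here
    return reinterpret_cast<const char*>(pA) -
           reinterpret_cast<const char*>(&obj);
}

// Mirrors the edited f(void*): the compiler cannot adjust anything here,
// so the caller must have converted to A* before passing the pointer.
int readA(const void* ptrA) {
    return static_cast<const A*>(ptrA)->a;
}
```

A caller would write readA(static_cast<A*>(&objC)) rather than readA(&objC); the latter hands over the address of the B part, and the cast inside the function has no way to correct it.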
Likewise, for downcasting:
C* pC = (C*)(&a);
The compiler will take care of adjusting the pointer value during the assignment.
The one "gotcha" to all of this is when a class is forward declared without a full declaration:
// foo.h
class A; // same as above, base class for C
class C; // same as above, derived class from A and B
inline void foo(C* pC)
{
A* pA = (A*)pC; // oops, compiler doesn't know that C derives from A. It won't adjust the pointer value during assignment
SomeOtherFunction(pA); // bug! Function expecting A* parameter is getting garbage
}
That's a real bug!
My general rule: avoid the old "C-style" cast and favor the static_cast operator, or just rely on implicit conversion without an operator to do the right thing (for upcasts). The compiler will issue an error if the cast isn't valid.
Related
We have a multiple inheritance hierarchy:
// A B
// \ /
// C
//
Both A and B are abstract classes. C is actually a templated class, so downcasting is near impossible if you have a B and you want to access a member of it as an A.
All A's and B's must be C's, since that is the only concrete class in the hierarchy. Therefore, all A's must be B's, and vice versa.
I'm trying to debug something quickly where I have a B that I need to access A::name of. I can't downcast to C because I don't know the templated type of it. So I'm writing code like below and surprisingly it doesn't work; and I'm wondering what gives.
struct A { virtual void go() = 0; std::string name; };
struct B { virtual void go() = 0; };
struct C : A, B { void go() override { } };
int main()
{
C c;
c.name = "Pointer wonders";
puts(c.name.c_str()); // Fine.
B* b = (B*)&c;
//puts(b->name.c_str()); // X no from compiler.
A* a1 = (A*)&c;
puts(a1->name.c_str()); // As expected this is absolutely fine
// "Cross" the hierarchy, because the B really must be a __C__, because of the purely virtual functions.
// neither A nor B can be instantiated, so every instance of A or B must really be a C.
A* a2 = (A*)b;
puts(a2->name.c_str()); // Why not??
// If you downcast first, it works
C* c2 = (C*)b;
A* a3 = (A*)c2;
puts(a3->name.c_str()); // fine
}
First of all, stop using C-style casts. The compiler won't complain if you do something wrong (C-style casts usually do not work with multiple inheritance).
Any cast that causes a run-time error in your example would not compile with a static_cast. While it is a bit longer to type, you get instant feedback when it is used improperly, instead of undefined behavior that will sometimes corrupt data and cause problems long afterward when that data is used.
As A and B contain virtual functions, you can easily use dynamic_cast without knowing C. If you do know C, you could use static_cast to C, provided you are sure the object really is a C. But why not use virtual functions and avoid crossing between siblings altogether?
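A minimal sketch of the dynamic_cast route, assuming RTTI is enabled. The template parameter T, the member value and the helper nameOf are hypothetical additions for illustration; the point is that nameOf never needs to know which C<T> it was given.

```cpp
#include <cassert>
#include <string>

struct A { virtual ~A() = default; virtual void go() = 0; std::string name; };
struct B { virtual ~B() = default; virtual void go() = 0; };

// Stand-in for the templated concrete class from the question.
template <typename T>
struct C : A, B {
    void go() override {}
    T value{};
};

// Cross-cast from B to A without naming C<T>.
// Returns nullptr if the object behind b has no A part.
const std::string* nameOf(B* b) {
    A* a = dynamic_cast<A*>(b);  // run-time lookup; adjusts the address
    return a ? &a->name : nullptr;
}
```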
The reason it does not work is that a C-style cast can perform any of the following casts:
static_cast
reinterpret_cast
const_cast
Also, a C-style cast will do a reinterpret_cast if the definition of a class is missing. You also need to be very careful with void *, as you must convert back to the original type.
As a simplified rule, you can imagine that C cast is like doing either a single static_cast (known child or parent class or primitive types like int) or reinterpret_cast (unknown type, not a parent/child class) followed by a const_cast if necessary.
C * --> void * --> B * won't work with any C or C++ cast.
The primary reason that such casts don't work is that the compiler must adjust the pointer value when doing a cast where multiple inheritance is used. This is required to take into account that the A and B parts start at distinct offsets.
Alternatively, you can add a virtual function virtual A * GetA() = 0 in B and implement it in C to have your own way to navigate. That can be an option if the concrete type is unknown and RTTI must be disabled (e.g. on embedded systems).
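A minimal sketch of that RTTI-free navigation hook, under the assumption that B may be modified. The lower-case getA, the template parameter T and the helper nameViaB are illustrative names only.

```cpp
#include <cassert>
#include <string>

struct A;  // forward declaration is enough for a pointer return type

struct B {
    virtual ~B() = default;
    virtual void go() = 0;
    virtual A* getA() = 0;  // navigation hook, implemented by the concrete class
};

struct A {
    virtual ~A() = default;
    virtual void go() = 0;
    std::string name;
};

template <typename T>
struct C : A, B {
    void go() override {}
    // C sees its full layout, so this implicit upcast is adjusted correctly.
    A* getA() override { return this; }
    T payload{};
};

// Works on any B without knowing T and without dynamic_cast.
std::string nameViaB(B& b) { return b.getA()->name; }
```

The trade-off is that B now knows about A, which is extra coupling of the kind this answer warns about.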
Honestly, you should avoid multiple inheritance and casting, as they make the code harder to maintain by increasing coupling between classes, and they can cause hard-to-find bugs, particularly when both are mixed together.
If we compose calls to dynamic_cast and reach back to the original type on a pointer value, does C++ guarantee to preserve the value, assuming no dynamic_cast fails, i.e., does not return a nullptr?
I looked at the "Memory Layout in Virtual Inheritance" section in https://www.cprogramming.com/tutorial/virtual_inheritance.html. This gave me a decent idea of how it can be implemented. From the looks of it, it seems impossible to get a different address. However, it is only one such implementation, I just want to be sure that no conforming implementation can return a different address than the original.
#include <cassert>
struct A { virtual ~A() {} };
struct B : virtual public A {};
struct C : virtual public A {};
struct D : public B, public C {};
int main()
{
B* b = new D();
C* c = dynamic_cast<C*>(b);
if (c != nullptr)
{
B* bp = dynamic_cast<B*>(c);
assert(bp == b); // Will this assert ever fire?
}
return 0;
}
A more general case:
A* a = ...
dynamic_cast<A*>(dynamic_cast<B*>(...(a)...)) is `a` or nullptr
I ran the above code with gcc8 and clang9 and they don't fire the assert.
Does side-casting to and from sibling types preserve addresses in C++?
Dynamic casting produces a pointer to the base / derived object, and that object has exactly one address. This applies to side casting as much as down casting.
Note that a side cast may fail if the target base is ambiguous because the object has multiple non-virtual bases of the same type; in that case, a null pointer is returned. Your example does not have such ambiguity.
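Both points can be sketched in a few lines. The first hierarchy is the one from the question; the second (Base, Left, Right, Other, Join) is a hypothetical hierarchy added only to show the ambiguous case, and the function names are illustrative.

```cpp
#include <cassert>

// Hierarchy from the question: one shared virtual A.
struct A { virtual ~A() {} };
struct B : virtual public A {};
struct C : virtual public A {};
struct D : public B, public C {};

// Side-cast B* -> C* -> B* and check that we are back where we started.
bool roundTripPreservesAddress() {
    D d;
    B* b = &d;
    C* c = dynamic_cast<C*>(b);
    B* back = dynamic_cast<B*>(c);
    return c != nullptr && back == b;
}

// Hypothetical hierarchy with two separate (non-virtual) Base subobjects.
struct Base { virtual ~Base() {} };
struct Left : Base {};
struct Right : Base {};
struct Other { virtual ~Other() {} };
struct Join : Left, Right, Other {};

// The target Base is ambiguous inside Join, so the side cast yields nullptr.
bool ambiguousSideCastFails() {
    Join j;
    Other* o = &j;
    return dynamic_cast<Base*>(o) == nullptr;
}
```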
Can any operation change the layout of the object during run-time?
No. Layout of all objects is set in stone at their creation. In standard C++, the layout of all types is set in stone already on compilation, although there are language extensions such as flexible array member which can relax this.
Here is some code that illustrates the question:
#include <iostream>
class Base {
};
class Derived : public Base {
};
void doThings(Base* bases[], int length)
{
for (int i = 0; i < length; ++i)
std::cout << "Do ALL the things\n";
}
int main(int argc, const char * argv[])
{
Derived* arrayOfDerived[2] = { new Derived(), new Derived() };
doThings(arrayOfDerived, 2); // Candidate function not viable: no known conversion from 'Derived *[2]' to 'Base **' for 1st argument
// Attempts to work out the correct cast
Derived** deriveds = &arrayOfDerived[0];
Base** bases = dynamic_cast<Base**>(deriveds); // 'Base *' is not a class
Base** bases = dynamic_cast<Base**>(arrayOfDerived); // 'Base *' is not a class
// Just pretend that it should work
doThings(reinterpret_cast<Base**>(arrayOfDerived), 2);
return 0;
}
Clang produces the errors given in the comments. The question is: "Is there a correct way to cast arrayOfDerived to something that doThings can take?"
Bonus marks:
Why does clang produce the errors "'Base *' is not a class" on the given lines? I know that Base* isn't a class it's a pointer. What is the error trying to tell me? Why has dynamic_cast been designed so that in dynamic_cast<T*> the thing T must be a class?
What are the dangers of using the reinterpret_cast to force everything to work?
Thanks as always :)
No. What you’re asking for here is a covariant array type, which is not a feature of C++.
The risk with reinterpret_cast, or a C-style cast, is that while this will work for simple types, it will fail miserably if you use multiple or virtual inheritance, and may also break in the presence of virtual functions (depending on the implementation). Why? Because in those cases a static_cast or dynamic_cast may actually change the pointer value. That is, given
class A {
int a;
};
class B {
string s;
};
class C : public A, B {
double f[4];
};
C *ptr = new C();
unless the classes are empty, the address stored in ptr cannot equal both the address stored in (A *)ptr and the address stored in (B *)ptr. (Compare them as void * to see this; writing ptr == (A *)ptr directly converts ptr to A * first and so always succeeds.) It only takes a moment to work out why that must be the case: if you're using single inheritance and there is no vtable, typical implementations lay out a subclass as the superclass followed by the member variables defined in the subclass. If, however, you have multiple inheritance, the compiler must choose to lay the class out in some order or other; I'm not sure what the C++ standard has to say about it (you could check; it might define or constrain the layout order somehow), but whatever order it picks, it should be apparent that one of the casts must result in a change in the pointer value.
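A minimal sketch of that argument. It uses struct instead of class (and public inheritance of B) purely to keep the example short, and rawAddressWorksForBothBases is a hypothetical helper name. A reinterpret_cast keeps the raw address, so it can only be correct for a base whose static_cast result equals that raw address.

```cpp
#include <cassert>
#include <string>

struct A { int a = 0; };
struct B { std::string s; };
struct C : A, B { double f[4] = {}; };

// True only if keeping the raw address (what reinterpret_cast does)
// would land on the right subobject for both bases at once.
bool rawAddressWorksForBothBases(C* ptr) {
    const void* raw = ptr;
    const void* asA = static_cast<A*>(ptr);  // adjusted if necessary
    const void* asB = static_cast<B*>(ptr);  // adjusted if necessary
    return raw == asA && raw == asB;
}
```

Because the A and B subobjects are both non-empty and cannot overlap, the function returns false whichever order the compiler picks.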
The first error in your code is that you are using a syntax that should be avoided:
void doThings(Base* bases[], int length)
if you look at the error message, it tells you that bases is actually a Base**, and that is also what you should write if you really want to pass such a pointer.
The second problem has to do with type safety. Imagine this code, where I reduced the array of pointers to a single pointer:
class Base {...};
class Derived1: public Base {...};
class Derived2: public Base {...};
Derived1* p1 = 0;
Base*& pb = p1; // reference to a pointer
pb = new Derived2(); // implicit conversion Derived2* -> Base*
If this code compiled, p1 would now suddenly point to a Derived2! The important difference to a simple conversion of a pointer-to-derived to a pointer-to-base is that we have a reference here, and the pointer in your case is no different.
You can use two variations that work:
Base* pb = p1; // pointer value conversion
Base* const& pb = p1; // reference-to-const pointer
Applied to your code, this involves copying the array of pointers-to-derived to an array of pointers-to-base, which is probably what you want to avoid. Unfortunately, the C++ type system doesn't provide any different means to achieve what you want directly. I'm not exactly sure about the reason, but I think that at least according to the standard pointers can have different sizes.
There are two things I would consider:
Convert doThings() to a function template taking two iterators. This follows the spirit of the STL and allows calls with different pointer types or others that provide an iterator interface.
Just write a loop. If the body of the loop is large, you could also consider extracting just that as a function.
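The first suggestion could look roughly like the sketch below. The virtual id() function and the int return value are assumptions added so that the effect is observable; the original doThings only prints a message.

```cpp
#include <cassert>
#include <iterator>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};
struct Derived : Base {
    int id() const override { return 2; }
};

// Accepts any iterator range whose elements convert to const Base*.
template <typename Iter>
int doThings(Iter first, Iter last) {
    int sum = 0;
    for (; first != last; ++first) {
        const Base* b = *first;  // per-element Derived* -> Base* conversion
        sum += b->id();
    }
    return sum;
}
```

With Derived* arrayOfDerived[2], the call becomes doThings(std::begin(arrayOfDerived), std::end(arrayOfDerived)); no array-level cast is involved, because each pointer is converted individually.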
In answer to your main question, the answer is not really,
because there's no way of reasonably implementing it. There's
no guarantee that the actual addresses in the Base* will be
the same as those in the Derived*, and in fact, there are many
cases where they aren't. In the case of multiple inheritance,
it's impossible that both bases have the same address as the
derived, because they must have different addresses from each
other. If you have an array of Derived*, whether it be
std::vector<Derived*> or a Derived** pointing to the first
element of an array, the only way to get an array of Base* is
by copying: using std::vector<Derived*>, this would look
something like:
std::vector<Base*> vectBase( vectDerived.size() );
std::transform(
vectDerived.cbegin(),
vectDerived.cend(),
vectBase.begin(),
[]( Derived* ptr ) { return static_cast<Base*>( ptr ); } );
(You can do exactly the same thing with Derived** and
Base**, but you'll have to tweak it with the known lengths.)
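For the raw-array case, a minimal sketch might look like this. toBasePointers is a hypothetical helper name; it returns a std::vector so that the copy owns its own storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Copies an array of Derived* into a new array of Base*,
// converting (and, if needed, adjusting) each pointer individually.
std::vector<Base*> toBasePointers(Derived* const* deriveds, std::size_t length) {
    std::vector<Base*> bases(length);
    std::transform(deriveds, deriveds + length, bases.begin(),
                   [](Derived* ptr) { return static_cast<Base*>(ptr); });
    return bases;
}
```

The result can then be handed to the original function as doThings(bases.data(), static_cast<int>(bases.size())).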
As for your "bonus" questions:
Clang (and I suspect every other compiler in existence),
produces the error Base* is not a class because it isn't
a class. You're trying to dynamic_cast between Base**
and Derived**, the pointed to types are Base* and
Derived*, and dynamic_cast requires the pointed to types
to be classes.
The only danger with reinterpret_cast is that it won't
work. You'll get undefined behavior, which will possibly
work in a few simple cases, but won't work generally. You'll
end up with something that the compiler thinks is a Base*,
but which doesn't physically point to a Base.
I have a C++ lib that makes use of a object hierarchy like this:
class A { ... };
class B : public A { ... };
class C : public A { ... };
I expose functionality through a C API via typedefs and functions, like this:
#ifdef __cplusplus
typedef A* APtr;
#else
typedef struct A* APtr;
#endif
extern "C" void some_function(APtr obj);
However, say a user of the C API does something like this:
BPtr b = bptr_create();
some_function((APtr) b);
This is polymorphically valid, since B extends A, and my API depends on such functionality being possible, but I want to make sure that this will still interoperate properly with the C++ code, even if B overrides some of A's virtual methods.
More importantly, why or why not? How can C++ identify at runtime that the obj parameter of some_function is actually a pointer to B, and therefore call its overridden virtual methods instead?
The C code is not valid (nor would the equivalent C++ code be in a context where the class definition is not visible) because what C does in this case is the equivalent of a reinterpret_cast. Note that in a simple situation like yours it will likely "work" because most compilers will put the single base object at the beginning of the derived object, so a pointer adjustment is not necessary. However, in the general case (especially when using multiple inheritance), the pointer will have to be adjusted to point to the correct subobject, and since C does not know how to do that, the cast is wrong.
So what is meant with "pointer adjustment"? Consider the following situation:
class A { virtual ~A(); int i; ... };
class B { virtual ~B(); int j; ... };
class C: public A, public B { ... };
Now the layout of C may be as follows:
+----------------------------+----------------------------+
| A subobject (containing i) | B subobject (containing j) |
+----------------------------+----------------------------+
where the virtual pointers of both the A and B subobjects point into C's virtual tables.
Now imagine you've got a C* which you want to convert to a B*. Of course the code which receives the B* may not know about the existence of C; indeed, it may have been compiled before C was even written. Therefore the B* must point to the B subobject of the C object. In other words, on conversion from C* to B*, the size of the A subobject has to be added to the address stored into the pointer. If you do not do this, the B* will actually point to the A subobject, which clearly is wrong.
Now without access to the class definition of C, there's of course no way to know that there even is an A subobject, not to mention how large it is. Therefore it is impossible to do a correct conversion from C* to B* if the class definition of C is not available.
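One way to keep the conversion on the C++ side is to export it as a function, so that C code never casts between handle types itself. This is only a sketch under assumptions: bptr_create comes from the question, while bptr_as_aptr, aptr_kind, aptr_destroy and the virtual kind() are hypothetical names invented for illustration.

```cpp
#include <cassert>

class A {
public:
    virtual ~A() = default;
    virtual int kind() const { return 0; }
};
class B : public A {
public:
    int kind() const override { return 1; }
};

typedef A* APtr;
typedef B* BPtr;

extern "C" {
// Factory named in the question.
BPtr bptr_create() { return new B(); }
// Hypothetical helper: the upcast happens in C++, where the layout is known.
APtr bptr_as_aptr(BPtr b) { return b; }
// Hypothetical API call: virtual dispatch picks B::kind for a B object.
int aptr_kind(APtr obj) { return obj->kind(); }
// Hypothetical cleanup; the virtual destructor makes deleting via A* safe.
void aptr_destroy(APtr obj) { delete obj; }
}
```

C code would then write aptr_kind(bptr_as_aptr(b)) instead of casting with (APtr) b.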
C++ uses a virtual function table, which is stored in memory once per class.
When an object of a particular derived class is created, it refers to that
class's virtual table, and the table decides which function gets called.
So it's a bit of C++ compile-time plus run-time magic :)
http://en.wikipedia.org/wiki/Virtual_method_table
Short answer: Yes this will work.
Why: since A and some_function are implemented in C++, all virtual function calls will occur in C++ code as usual, where the class definition is included, and there is nothing magic about it. In C code only opaque pointers are passed around, and C code will never be able to call the virtual functions directly, because it could never compile the definition of A.
Consider this simple situation:
A.h
class A {
public:
virtual void a() = 0;
};
B.h
#include <iostream>
class B {
public:
virtual void b() {std::cout << "b()." << std::endl;};
};
C.h
#include "A.h"
#include "B.h"
class C : public B, public A {
public:
void a() {std::cout << "a() in C." << std::endl;};
};
int main() {
B* b = new C();
((A*) b)->a(); // Output: b().
A* a = new C();
a->a(); // Output:: a() in C.
return 0;
}
In other words:
- A is a pure virtual class.
- B is a class with no super class and one non-pure virtual function.
- C is a subclass of A and B and overrides A's pure virtual function.
What surprises me is the first output i.e.
((A*) b)->a(); // Output: b().
Although I call a() in the code, b() is invoked. My guess is that it is related to the fact that the variable b is a pointer to class B which is not a subclass of class A. But still the runtime type is a pointer to a C instance.
What is the exact C++ rule to explain this, from a Java point of view, weird behaviour?
You are unconditionally casting b to an A* using a C-style cast. The Compiler doesn't stop you from doing this; you said it's an A* so it's an A*. So it treats the memory it points to like an instance of A. Since a() is the first method listed in A's vtable and b() is the first method listed in B's vtable, when you call a() on an object that is really a B, you get b().
You're getting lucky that the object layout is similar. This is not guaranteed to be the case.
First, you shouldn't use C-style casts. You should use C++ casting operators which have more safety (though you can still shoot yourself in the foot, so read the docs carefully).
Second, you shouldn't rely on this sort of behavior, unless you use dynamic_cast<>.
Don't use a C-style cast when casting across a multiple inheritance tree. If you use dynamic_cast instead you get the expected result:
B* b = new C();
dynamic_cast<A*>(b)->a();
You are starting with a B* and casting it to A*. Since the two are unrelated, you're delving into the sphere of undefined behavior.
((A*) b) is an explicit c-style cast, which is allowed no matter what the types pointed to are. However, if you try to dereference this pointer, it will be either a runtime error or unpredictable behavior. This is an instance of the latter. The output you observed is by no means safe or guaranteed.
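By contrast, static_cast refuses the same conversion at compile time. The sketch below checks this with a small detection trait; can_static_cast is a hypothetical helper written for this illustration, not a standard library facility, and virtual destructors are added to the classes from the question.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct A { virtual ~A() = default; virtual void a() = 0; };
struct B { virtual ~B() = default; virtual void b() {} };
struct C : B, A { void a() override {} };

// Primary template: assume static_cast<To>(From) is not well-formed.
template <typename To, typename From, typename = void>
struct can_static_cast : std::false_type {};

// Chosen only when static_cast<To>(From) actually compiles.
template <typename To, typename From>
struct can_static_cast<To, From,
    decltype(void(static_cast<To>(std::declval<From>())))> : std::true_type {};

static_assert(can_static_cast<A*, C*>::value, "upcast C* -> A* is allowed");
static_assert(can_static_cast<C*, B*>::value, "downcast B* -> C* is allowed");
static_assert(!can_static_cast<A*, B*>::value, "B* -> A* is rejected");
```

Writing static_cast<A*>(b) with a B* would therefore be a compile error, whereas (A*) b silently compiles.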
A and B are not related to each other by means of inheritance, which means that a pointer to B cannot be transformed into a pointer to A by means of either upcast or downcast.
Since A and B are two different bases of C, what you are trying to do here is called a cross-cast. The only cast in C++ language that can perform a cross-cast is dynamic_cast. This is what you have to use in this case in case you really need it (do you?)
B* b = new C();
A* a = dynamic_cast<A*>(b);
assert(a != NULL);
a->a();
The following line is a reinterpret_cast, which points at the same memory but "pretends" it is a different kind of object:
((A*) b)->a();
What you really want is a dynamic_cast, which checks what kind of object b really is and adjusts what location in memory to point to:
dynamic_cast<A*>(b)->a()
As jeffamaphone mentioned, the similar layout of the two classes is what causes the wrong function to be called.
There is almost never an occasion in C++ where using a C-style cast (or its C++ equivalent reinterpret_cast<>) is justified or required. Whenever you find yourself tempted to use one of the two, suspect your code and/or your design.
I think you have a subtle bug in casting from B* to A*, and the behaviour is undefined. Avoid using C-style casts and prefer the C++ casts - in this case dynamic_cast. Due to the way your compiler has laid out the storage for the data types and vtable entries, you've ended up finding the address of a different function.