I was reading through some of Effective C++ and I realized I may have been incorrect in my thinking along the way.
class A
{
public:
void laka()
{
const void * raw = dynamic_cast<const void*>(this);
cout << raw << endl;
}
virtual ~A() = 0;
};
A::~A() {}
class B : public A
{
public:
void ditka() {}
};
int _tmain(int argc, _TCHAR* argv[])
{
B b;
cout << &b << endl;
b.laka();
return 0;
}
The book stated that by using dynamic_cast with void*, I would get the starting address of an object; however, all of the addresses output are the same.
1. When I just output the address of the plain old &b above, is the address displayed the starting address of the derived object or the base object within b?
2. If I was wrong about #1, how would I get the starting addresses of each subobject within b? Do I have to offset manually, and how does dynamic_cast work with this? Or could someone just clarify what the author meant?
Most implementations of inheritance put the first base class subobject at the beginning of the derived class, so you really need two base classes, both with data members, to be able to see this. Consider:
#include <iostream>
struct B1 {
int x;
virtual ~B1() { }
};
struct B2 {
int y;
virtual ~B2() { }
};
struct D : B1, B2 { };
int main() {
D x;
B1* b1_ptr = &x;
B2* b2_ptr = &x;
std::cout << "original address: " << &x << "\n";
std::cout << "b1_ptr: " << b1_ptr << "\n";
std::cout << "dynamic_cast b1_ptr: " << dynamic_cast<void*>(b1_ptr) << "\n";
std::cout << "b2_ptr: " << b2_ptr << "\n";
std::cout << "dynamic_cast b2_ptr: " << dynamic_cast<void*>(b2_ptr) << "\n";
}
Example output (from my machine; your results will be similar):
original address: 0030FB88
b1_ptr: 0030FB88
dynamic_cast b1_ptr: 0030FB88
b2_ptr: 0030FB90
dynamic_cast b2_ptr: 0030FB88
This tells us that the B1 subobject of D is located at the beginning, so it has the same address as the D object of which it is a subobject.
The B2 subobject is located at a different address, but when you use dynamic_cast<void*> on the pointer to the B2 subobject, it gives you the address of the D object of which it is a subobject.
The book was correct: a dynamic_cast to cv-qualified void* converts the pointer to a pointer to the most derived object pointed to by the pointer that you supply, so you get the starting address of the derived object. Both your output statements should print the same address (assuming there isn't a specific operator<< overload for std::ostream and B*), as b is the most derived object.
There is no reason that a base class subobject can't have the same starting address as a derived object, and this is what happens in many implementations, at least for the first base class subobject in a derived class.
This is all compiler and implementation dependent. In your case a B is an A plus something, so it stores the A members and then the B-specific members. So the address of &b and the one displayed by your dynamic_cast ought to be the same.
When I just output the address of the plain old &b above, is the address
displayed the starting address of the
derived object or the base object
within b?
You could say "yes", it's the starting address of the base object class A within b (which is the same as the starting address of the derived object b itself) ... but the derived object is not really a "separate" object from the base object. The derived object is also not something that will necessarily start at a fixed offset from the base object, especially if it's a non-POD class (plain-old-data type) with virtual functions, since the first address of both the base and derived objects is typically a pointer to a v-table specific to either the base or derived object. So you can't really "slice" apart a derived object into a "base object" and a derived object, other than the fact that for most compilers, the derived object's non-static data members will come at an offset after the non-static data members of the base object. But again, arbitrary "slicing" will cause issues with the v-table pointer, and also for non-POD classes, any private non-static member objects may be allocated in an "optimized" fashion that may make the memory layout between the base and derived objects something that is not exactly a clean "slice".
I came across a problem which can be reduced to the following example:
#include <iostream>
struct A {
char a;
};
struct B : A {
virtual void f() = 0;
};
struct C : B {
void f() override {}
};
void f(A* fpa) {
std::cout << fpa << '\n' << reinterpret_cast<C*>(fpa) << std::endl;
}
int main() {
C c {};
A* pa {&c};
std::cout << &c << '\n' << pa << '\n' << reinterpret_cast<C*>(pa) << std::endl;
f(&c);
}
Neither pa nor fpa keeps pointing to the address of c, although both are initialized with &c. All addresses printed after that of &c itself are offset by +8 (tested with g++ and clang++). Removing either A::a or B::f() and C::f(), or initializing pa and fpa with reinterpret_cast<A*>(&c) instead of just &c, fixes the addresses.
But why do I have to do that? Shouldn't any A* be able to hold the address to any A, B, or C in this case since all inheritance is public? Why does the value change implicitly? And are there warning flags I can pass to g++ or clang++ that warn about this kind of behavior?
or initializing pa and fpa with reinterpret_cast<A*>(&c) instead of just &c fixes the addresses.
That doesn't "fix" the address. That breaks the address. It yields an invalid pointer.
But why do I have to do that?
You don't have to do that. The offset address is the correct address of the base sub object.
Why doesn't a pointer to derived and a pointer to base point to the same address if abstract classes are involved?
Because there is something stored in the object before the base sub object.
Shouldn't any A* be able to hold the address to any A, B, or C
No. The value of a valid pointer to A is always the address of an A object. If the dynamic type is derived, then that A object is a base subobject. The base can be stored at an offset from the beginning of the derived object.
since all inheritance is public
Accessibility of the inheritance is irrelevant in this regard.
And are there warning flags I can pass to g++ or clang++ that warn about this kind of behavior?
I highly doubt that there would be. I also don't see why you'd want a warning in such case.
We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class. Multiple inheritance makes that untrue.
Does virtual inheritance also make that untrue? In other words, is the following code correct:
struct A {};
struct B : virtual A
{
int i;
};
int main()
{
A* a = new B; // implicit upcast
B* b = reinterpret_cast<B*>(a); // fishy?
b->i = 0;
return 0;
}
We all know that when using simple single inheritance, the address of
a derived class is the same as the address of the base class.
I think the claim is not true. In the below code, we have a simple (not virtual) single (non multiple) inheritance, but the addresses are different.
#include <iostream>
class A
{
public:
int getX()
{
return 0;
}
};
class B : public A
{
public:
virtual int getY()
{
return 0;
}
};
int main()
{
B b;
B* pB = &b;
A* pA = static_cast<A*>(pB);
std::cout << "The address of pA is: " << pA << std::endl;
std::cout << "The address of pB is: " << pB << std::endl;
return 0;
}
and the output for VS2015 is:
The address of pA is: 006FF8F0
The address of pB is: 006FF8EC
Does virtual inheritance also make that untrue?
If you change the inheritance in the above code to virtual, the result will be the same. So, even in the case of virtual inheritance, the addresses of the base and derived objects can be different.
The result of reinterpret_cast<B*>(a); is only guaranteed to point to the enclosing B object of a if the A subobject and the enclosing B object are pointer-interconvertible; see [expr.static.cast]/13 of the C++17 standard.
The derived class object is pointer-interconvertible with the base class object only if the derived object is standard-layout, does not have direct non-static data members and the base class object is its first base class subobject. [basic.compound]/4.3
Having a virtual base class disqualifies a class from being standard-layout. [class]/7.2.
Therefore, because B has a virtual base class and a non-static data member, b will not point to the enclosing B object, but instead b's pointer value will remain unchanged from a's.
Accessing the i member as if it was pointing to the B object then has undefined behavior.
Any other guarantees would come from your specific ABI or other specification.
Multiple inheritance makes that untrue.
That is not entirely correct. Consider this example:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};
When creating an instance of D, B and C are instantiated each with their respective instance of A. However, there would be no problem if the instance of D had the same address of its instance of B and its respective instance of A. Although not required, this is exactly what happens when compiling with clang 11 and gcc 10:
D: 0x7fffe08b4758 // address of instance of D
B: 0x7fffe08b4758 and A: 0x7fffe08b4758 // same address for B and A
C: 0x7fffe08b4760 and A: 0x7fffe08b4760 // other address for C and A
Does virtual inheritance also make that untrue
Let's consider a modified version of the above example:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Virtual inheritance is typically used to avoid ambiguity when the same base class is reachable through more than one path. Therefore, when using virtual inheritance, the B and C subobjects must share a common A instance. When instantiating D, we get the following addresses:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // again, address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // A has the same address as before (common instance)
Is the following code correct
There is no reason to use reinterpret_cast here; worse, it results in undefined behavior. Use static_cast instead:
A* pA = static_cast<A*>(pB);
The two casts behave differently in this example. The reinterpret_cast would reinterpret pB as a pointer to A without adjusting its value, but the A subobject may live at a different address, as in the above example (C vs. A). The pointer will be upcast correctly if you use static_cast.
The reason a and b are different in your case is that A has no virtual methods, so A does not maintain a vtable pointer. B, on the other hand, does maintain one.
When you upcast to A, the compiler is smart enough to skip the vtable pointer meant for B, hence the difference in addresses. You should not reinterpret_cast back to B; it wouldn't work.
To verify my claim, try adding a virtual method, say virtual void foo() {}, to class A. Now A will also maintain a vtable pointer, so on typical implementations a downcast (reinterpret_cast) to B will give you back the original b.
Similar questions I found were more based on what this does; I understand the assignment of a base class pointer to a derived class, e.g Base* obj = new Derived() to be that the right side gets upcasted to a Base* type, but I would like to understand the mechanism for how this happens and how it allows for virtual to access derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
protected: // protected rather than private, so that Derived::bar() below can read a and b
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
[Figure: storage layout and vtable for an object of type Base]
Now let's look at the equivalent structure for a derived class, Derived:
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
For an object of type Derived, we have a similar structure:
[Figure: storage layout and vtable for an object of type Derived]
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical values of the two pointers are the same, but they can differ, for example when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought it did.
I wanted to know if it's possible to get the original type of an object when held through a pointer to base class.
For example:
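class Base {
public:
virtual void f() = 0;
};
class Derived: public Base {
public:
void f() override {} // without an override, Derived would be abstract and new Derived would not compile
};
Base * ptr=new Derived;
//if I use
cout << typeid(ptr).name(); //prints Base*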
I want it to print the original type "Derived". Is there a way to do it?
Yes, the static and dynamic type of ptr are both Base *. For *ptr, however, the situation is different: its static type is Base, but its dynamic type is Derived. So that's the expression you want to test:
cout << typeid(*ptr).name();
You can try dynamic_cast:
if(Derived* d = dynamic_cast<Derived*>(ptr))
{
std::cout << "downcast from ptr to Derived* successful\n";
// d can safely be used as a Derived* here
}
Be careful, as this is often seen as bad practice, and you usually shouldn't do it. You rarely need the derived class type; try to think in terms of interfaces.
I would like to know how C++ lays out these classes in memory to support inheritance.
for example:
class Base1
{
public:
void function1(){cout<<"Base1"};
};
class Base2
{
public:
void function2(){cout<<"Base2"};
};
class MDerived: Base1,Base2
{
public:
void function1(){cout<<"MDerived"};
};
void function(Base1 *b1)
{
b1->function1();
}
So when I pass function an object of derived type the function should offset into the base1 class function and call it. How does C++ ensure such a layout.
When a MDerived* needs to be converted to a Base1*, the compiler adjusts the pointer to point to the correct memory address, where the members of this base class are located. This means that a MDerived* that is cast to a Base1* might point to a different memory address than the original MDerived* (depending on the memory layout of the derived class).
The compiler can do this because it knows the memory layout of all the classes, and when a cast occurs it can add code that adjusts the address of the pointer.
For example this might print different addresses:
int main() {
MDerived *d = new MDerived;
std::cout << "derived: " << d << std::endl;
std::cout << "base1: " << (Base1*)d << std::endl;
std::cout << "base2: " << (Base2*)d << std::endl;
}
In your example such adjustments might not be necessary since the classes don't contain any member variables that would use any memory in the sub-objects representing the base classes. If you have a pointer pointing to "nothing" (no member variables), it doesn't really matter if that nothing is called Base1 or Base2 or MDerived.
The non-virtual methods of the classes are not stored with each object, they are stored only once. The compiler then statically, at compile time, uses those global addresses when a member function is called, according to the type of the variable used.
The layout of a class in memory includes its members and base class subobjects (§10/2). Members are also subobjects. A pointer to a base subobject is a pointer to an object (but not a most-derived object).
When you convert an MDerived * to a Base2 *, the compiler looks up the offset of the Base2 object inside the MDerived object and uses it to generate this for the inherited method.
I think you're asking why, when you call b1->function1(), Base1::function1() fires?
If so, then the reason is because b1 is a Base1 pointer, not a MDerived pointer. The object it points to may in fact "be" an MDerived object, but function(Base1*) has no way of knowing this, so it calls the only thing it does know -- Base1::function1().
Now, if you had marked the base class function as virtual, things change:
#include <iostream>
#include <string>
using namespace std;
class Base1
{
public: virtual void function1() { cout<<"Base1"; }
};
class Base2
{
public: void function2(){cout<<"Base2";}
};
class MDerived: public Base1, public Base2
{
public: void function1(){cout<<"MDerived";}
};
void function(Base1 *b1)
{
b1->function1();
}
int main()
{
MDerived d;
function(&d);
}
The output of the program is:
"MDerived"
void function(Base1 *b1) still doesn't know that the object being pointed to is actually an MDerived, but now it doesn't have to. When you call virtual functions through a base class pointer, you get polymorphic behavior, which in this case means MDerived::function1() is called because MDerived is the most derived type of the object.
A few things in your code:
The multiple inheritance should be public, else the compiler will complain that there is no access to the base classes.
There is no ; at the end of the cout statements.
It does not include <iostream> (OK, maybe I am being too pedantic).
To answer your question: the compiler knows that it is the base class because the type of the argument that the function takes is Base1*. The compiler converts the passed pointer (assuming you passed the address of a derived object, it is adjusted to point at the Base1 subobject; nothing is sliced, because only a pointer is passed) and then calls function1() on it. Since function1() is not virtual, that call is resolved at compile time from the pointer's static type.