Consider the following program
#include<iostream>
using namespace std;
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
class ClassC : public ClassA,public ClassB
{
};
void main()
{
ClassC aObject;
ClassA* pA = &aObject;
ClassB* pB = &aObject;
ClassC* pC = &aObject;
cout<<"pA = "<<pA<<endl;
cout<<"pB = "<<pB<<endl;
cout<<"pC = "<<pC<<endl;
}
pA,pB,pC are supposed to equal,but the result is
pA = 0031FD90
pB = 0031FD94
pC = 0031FD90
why pB = pA + 4?
and when i change
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
to
class ClassA
{
};
class ClassB
{
};
the result is
pA = 0030FAA3
pB = 0030FAA4
pC = 0030FAA3
pB = pA + 1?
The multiply inherited object has two merged sub-objects. I would guess the compiler is pointing one of the pointers to an internal object.
C has two inherited subobjects, therefore is the concatenation of a A object and a B object. When you have an object C, it is composed of an object A followed by an object B. They're not located at the same address, that's why. All three pointers point to the same object, but as different superclasses. The compiler makes the shift for you, so you don't have to worry about that.
Now. Why is there a difference of 4 in one case and 1 in another? Well, in the first case, you have virtual functions for both A and B, therefore each subobject has to have a pointer to its vtable (the table containing the addresses of the resolved virtual function calls). So in this case, sizeof(A) is 4. In the second case, you have no virtual functions, so no vtable. But each subobject must be addressable independently, so the compiler still has to allocate for a different addresses for subobject of class A and subobject of class B. The minimum of difference between two addresses is 1. But I wonder if EBO (empty base-class optimization) should not have kicked in in this case.
That's the implementation detail of compiler.
The reason you hit this case is because you have MI in your code.
Think about how the computer access the member in ClassB, it using the offset to access the member. So let's say you have two int in class B, it using following statement to access the second int member.
*((int*)pb + 1) // this actually will be assembly generate by compiler
But if the pb point to the start of the aObject in your class, this will not work anymore, so the compiler need to generate multiple assembly version to access the same member base on the class inherit structure, and have run-time cost.
That's why the compiler adjust the pb not equal as pa, which will make the above code works, it is the most simply and effect way to implement.
And that's also explain why pa == pc but not equals with pb.
Related
We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class. Multiple inheritance makes that untrue.
Does virtual inheritance also make that untrue? In other words, is the following code correct:
struct A {};
struct B : virtual A
{
int i;
};
int main()
{
A* a = new B; // implicit upcast
B* b = reinterpret_cast<B*>(a); // fishy?
b->i = 0;
return 0;
}
We all know that when using simple single inheritance, the address of
a derived class is the same as the address of the base class.
I think the claim is not true. In the below code, we have a simple (not virtual) single (non multiple) inheritance, but the addresses are different.
class A
{
public:
int getX()
{
return 0;
}
};
class B : public A
{
public:
virtual int getY()
{
return 0;
}
};
int main()
{
B b;
B* pB = &b;
A* pA = static_cast<A*>(pB);
std::cout << "The address of pA is: " << pA << std::endl;
std::cout << "The address of pB is: " << pB << std::endl;
return 0;
}
and the output for VS2015 is:
The address of pA is: 006FF8F0
The address of pB is: 006FF8EC
Does virtual inheritance also make that untrue?
If you change the inheritance in the above code into virtual, the result will be the same. so, even in the case of virtual inheritance, the addresses of base and derived objects can be different.
The result of reinterpret_cast<B*>(a); is only guaranteed to point to the enclosing B object of a if the a subobject and the enclosing B object are pointer-interconvertible, see [expr.static.cast]/3 of the C++17 standard.
The derived class object is pointer-interconvertible with the base class object only if the derived object is standard-layout, does not have direct non-static data members and the base class object is its first base class subobject. [basic.compound]/4.3
Having a virtual base class disqualifies a class from being standard-layout. [class]/7.2.
Therefore, because B has a virtual base class and a non-static data member, b will not point to the enclosing B object, but instead b's pointer value will remain unchanged from a's.
Accessing the i member as if it was pointing to the B object then has undefined behavior.
Any other guarantees would come from your specific ABI or other specification.
Multiple inheritance makes that untrue.
That is not entirely correct. Consider this example:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};
When creating an instance of D, B and C are instantiated each with their respective instance of A. However, there would be no problem if the instance of D had the same address of its instance of B and its respective instance of A. Although not required, this is exactly what happens when compiling with clang 11 and gcc 10:
D: 0x7fffe08b4758 // address of instance of D
B: 0x7fffe08b4758 and A: 0x7fffe08b4758 // same address for B and A
C: 0x7fffe08b4760 and A: 0x7fffe08b4760 // other address for C and A
Does virtual inheritance also make that untrue
Let's consider a modified version of the above example:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Using the virtual function specifier is typically used to avoid ambiguous function calls. Therefore, when using virtual inheritance, both B and C instances must create a common A instance. When instantiating D, we get the following addresses:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // again, address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // A has the same address as before (common instance)
Is the following code correct
There is no reason here to use reinterpret_cast, even more, it results in undefined behavior. Use static_cast instead:
A* pA = static_cast<A*>(pB);
Both casts behave differently in this example. The reinterpret_cast will reinterpret pB as a pointer to A, but the pointer pA may point to a different address, as in the above example (C vs A). The pointer will be upcasted correctly if you use static_cast.
The reason a and b are different in your case is because, since A is not having any virtual method, A is not maintaining a vtable. On the other hand, B does maintain a vtable.
When you upcast to A, the compiler is smart enough to skip the vtable meant for B. And hence the difference in addresses. You should not reinterpret_cast back to B, it wouldn't work.
To verify my claim, try adding a virtual method, say virtual void foo() {} in class A. Now A will also maintain a vtable. Thus downcast(reinterpret_cast) to B will give you back the original b.
class Base {
public:
virtual void test() {};
virtual int get() {return 123;}
private:
int bob = 0;
};
class Derived: public Base{
public:
virtual void test() { alex++; }
virtual int get() { return alex;}
private:
int alex = 0;
};
Base* b = new Derived();
b->test();
When test and get are called, the implicit this pointer is passed in. Is it because Derived classes having a sub memory layout that is identical to what a pure base object would be, then this pointer works for both as a base pointer and derived pointer?
Another way to put it is, the memory layout for Derived is like
vptr <-- this
bob
alex
That is why it can use alex in b->test(), right?
Inside of Derived's methods, the implicit this pointer is always a Derived* pointer (more generically, the this pointer always matches the class type being called). That is why Derived::test() and Derived::get() can access the Derived::alex member. That has nothing to do with Base.
The memory layout of a Derived object begins with the data members of Base, followed by optional padding, followed by the data members of Derived. That allows you to use a Derived object wherever a Base object is expected. When you pass a Derived* pointer to a Base* pointer, or a Derived& reference to a Base& reference, the compiler will adjust the pointer/reference accordingly at compile-time to point at the Base portion of the Derived object.
When you call b->test() at runtime, where b is a Base* pointer, the compiler knows test() is virtual and will generate code that accesses the appropriate slot in b's vtable and call the method being pointed at. But, the compiler doesn't know what derived object type b is actually pointing at in runtime (that is the whole magic of polymorphism), so it can't automatically adjust the implicit this pointer to the correct derived pointer type at compile-time.
In the case where b is pointing at a Derived object, b's vtable is pointing at Derived's vtable. The compiler knows the exact offset of the start of Derived from the start of Base. So, the slot for test() in Derived's vtable will point to a private stub generated by the compiler to adjust the implicit Base *this pointer into a Derived *this pointer before then jumping into the actual implementation code for Derived::test().
Behind the scenes, it is roughly (not exactly) implemented like the following pseudo-code:
void Derived_test_stub(Base *this)
{
Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
Derived::test(adjusted_this);
}
int Derived_get_stub(Base *this)
{
Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
return Derived::get(adjusted_this);
}
struct vtable_Base
{
void* funcs[2] = {&Base::test, &Base::get};
};
struct vtable_Derived
{
void* funcs[2] = {&Derived_test_stub, &Derived_get_stub};
};
Base::Base()
{
this->vtable = &vtable_Base;
bob = 0;
}
Derived::Derived() : Base()
{
Base::vtable = &vtable_Derived;
this->vtable = &vtable_Derived;
alex = 0;
}
...
Base *b = new Derived;
//b->test(); // calls Derived::test()...
typedef void (*test_type)(Base*);
static_cast<test_type>(b->vtable[0])(b); // calls Derived_test_stub()...
//int i = b->get(); // calls Derived::get()...
typedef int (*get_type)(Base*);
int i = static_cast<get_type>(b->vtable[1])(b); // calls Derived_get_stub()...
The actual details are a bit more involved, but that should give you a basic idea of how polymorphism is able to dispatch virtual methods at runtime.
What you've shown is reasonably accurate, at least for a typical implementation. It's not guaranteed to be precisely as you've shown it (e.g., the compiler might easily insert some padding between bob and alex, but either way it "knows" that alex is at some predefined offset from this, so it can take a pointer to Base, calculate the correct offset from it, and use what's there.
Not what you asked about, so I won't try to get into detail, but just a fair warning: computing such offsets can/does get a bit more complex when/if multiple inheritance gets involved. Not so much for accessing a member of the most derived class, but if you access a member of a base class, it has to basically compute an offset to the beginning of that base class, then add an offset to get to the correct offset within that base class.
A derived class is not a seperate class but an extension. If something is allocated as derived then a pointer (which is just an address in memory) will be able to find everything from the derived class. Classes don't exist in assembly, the compiler keeps track of everything according to how it is allocated in memory and provides appropriate checking accordingly.
Similar questions I found were more based on what this does; I understand the assignment of a base class pointer to a derived class, e.g Base* obj = new Derived() to be that the right side gets upcasted to a Base* type, but I would like to understand the mechanism for how this happens and how it allows for virtual to access derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
This figure gives an object of type Base
Now lets look as the equivalent structure for a derived class, Derived:
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
For an object of type Derived, we have a similar structure:
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical value of the two pointers are same but they are not same when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought they did.
In the code below, pC == pA:
class A
{
};
class B : public A
{
public:
int i;
};
class C : public B
{
public:
char c;
};
int main()
{
C* pC = new C;
A* pA = (A*)pC;
return 0;
}
But when I add a pure virtual function to B and implement it in C, pA != pC:
class A
{
};
class B : public A
{
public:
int i;
virtual void Func() = 0;
};
class C : public B
{
public:
char c;
void Func() {}
};
int main()
{
C* pC = new C;
A* pA = (A*)pC;
return 0;
}
Why is pA not equal to pC in this case? Don't they both still point to the same "C" object in memory?
You're seeing a different value for your pointer because the new virtual function is causing the injection of a vtable pointer into your object. VC++ is putting the vtable pointer at the beginning of the object (which is typical, but purely an internal detail).
Let's add a new field to A so that it's easier to explain.
class A {
public:
int a;
};
// other classes unchanged
Now, in memory, your pA and A look something like this:
pA --> | a | 0x0000004
Once you add B and C into the mix, you end up with this:
pC --> | vtable | 0x0000000
pA --> | a | 0x0000004
| i | 0x0000008
| c | 0x000000C
As you can see, pA is pointing to the data after the vtable, because it doesn't know anything about the vtable or how to use it, or even that it's there. pC does know about the vtable, so it points directly to the table, which simplifies its use.
A pointer to an object is convertible to a pointer to base object and vice versa, but the conversion doesn't have to be trivial. It's entirely possible, and often necessary, that the base pointer has a different value than the derived pointer. That's why you have a strong type system and conversions. If all pointers were the same, you wouldn't need either.
Here are my assumptions, based on the question.
1) You have a case where you cast from a C to an A and you get the expected behaviour.
2) You added a virtual function, and that cast no longer works (in that you can no longer pull data from A directly after the cast to A, you get data that makes no sense to you).
If these assumptions are true the hardship you are experiencing is the insertion of the virtual table in B. This means the data in the class is no longer perfectly lined up with the data in the base class (as in the class has added bytes, the virtual table, that are hidden from you). A fun test would be to check sizeof to observe the growth of unknown bytes.
To resolve this you should not cast directly from A to C to harvest data. You should add a getter function that is in A and inherited by B and C.
Given your update in the comments, I think you should read this, it explains virtual tables and the memory layout, and how it is compiler dependent. That link explains, in more detail, what I explained above, but gives examples of the pointers being different values. Really, I had WHY you were asking the question wrong, but it seems the information is still what you wanted. The cast from C to A takes into account the virtual table at this point (note C-8 is 4, which on a 32 bit system would be the size of the address needed for the virtual table, I believe).
I'm having trouble understanding what the purpose of the virtual keyword in C++. I know C and Java very well but I'm new to C++
From wikipedia
In object-oriented programming, a
virtual function or virtual method is
a function or method whose behavior
can be overridden within an inheriting
class by a function with the same
signature.
However I can override a method as seen below without using the virtual keyword
#include <iostream>
using namespace std;
class A {
public:
int a();
};
int A::a() {
return 1;
}
class B : A {
public:
int a();
};
int B::a() {
return 2;
}
int main() {
B b;
cout << b.a() << endl;
return 0;
}
//output: 2
As you can see below, the function A::a is successfully overridden with B::a without requiring virtual
Compounding my confusion is this statement about virtual destructors, also from wikipedia
as illustrated in the following example,
it is important for a C++ base class
to have a virtual destructor to ensure
that the destructor from the most
derived class will always be called.
So virtual also tells the compiler to call up the parent's destructors? This seems to be very different from my original understanding of virtual as "make the function overridable"
Make the following changes and you will see why:
#include <iostream>
using namespace std;
class A {
public:
int a();
};
int A::a() {
return 1;
}
class B : public A { // Notice public added here
public:
int a();
};
int B::a() {
return 2;
}
int main() {
A* b = new B(); // Notice we are using a base class pointer here
cout << b->a() << endl; // This will print 1 instead of 2
delete b; // Added delete to free b
return 0;
}
Now, to make it work like you intended:
#include <iostream>
using namespace std;
class A {
public:
virtual int a(); // Notice virtual added here
};
int A::a() {
return 1;
}
class B : public A { // Notice public added here
public:
virtual int a(); // Notice virtual added here, but not necessary in C++
};
int B::a() {
return 2;
}
int main() {
A* b = new B(); // Notice we are using a base class pointer here
cout << b->a() << endl; // This will print 2 as intended
delete b; // Added delete to free b
return 0;
}
The note that you've included about virtual destructors is exactly right. In your sample there is nothing that needs to be cleaned-up, but say that both A and B had destructors. If they aren't marked virtual, which one is going to get called with the base class pointer? Hint: It will work exactly the same as the a() method did when it was not marked virtual.
You could think of it as follows.
All functions in Java are virtual. If you have a class with a function, and you override that function in a derived class, it will be called, no matter the declared type of the variable you use to call it.
In C++, on the other hand, it won't necessarily be called.
If you have a base class Base and a derived class Derived, and they both have a non-virtual function in them named 'foo', then
Base * base;
Derived *derived;
base->foo(); // calls Base::foo
derived->foo(); // calls Derived::foo
If foo is virtual, then both call Derived::foo.
virtual means that the actual method is determined runtime based on what class was instantiated not what type you used to declare your variable.
In your case this is a static override it will go for the method defined for class B no matter what was the actual type of the object created
So virtual also tells the compiler to call up the parent's destructors? This seems to be very different from my original understanding of virtual as "make the function overridable"
Your original and your new understanding are both wrong.
Methods (you call them functions) are always overridable. No matter if virtual, pure, nonvirtual or something.
Parent destructors are always called. As are the constructors.
"Virtual" does only make a difference if you call a method trough a pointer of type pointer-to-baseclass. Since in your example you don't use pointers at all, virtual doesn't make a difference at all.
If you use a variable a of type pointer-to-A, that is A* a;, you can not only assign other variables of type pointer-to-A to it, but also variables of type pointer-to-B, because B is derived from A.
A* a;
B* b;
b = new B(); // create a object of type B.
a = b; // this is valid code. a has still the type pointer-to-A,
// but the value it holds is b, a pointer to a B object.
a.a(); // now here is the difference. If a() is non-virtual, A::a()
// will be called, because a is of type pointer-to-A.
// Whether the object it points to is of type A, B or
// something entirely different doesn't matter, what gets called
// is determined during compile time from the type of a.
a.a(); // now if a() is virtual, B::a() will be called, the compiler
// looks during runtime at the value of a, sees that it points
// to a B object and uses B::a(). What gets called is determined
// from the type of the __value__ of a.
As you can see below, the function A::a is successfully overridden with B::a without requiring virtual
It may, or it may not work. In your example it works, but it's because you create and use an B object directly, and not through pointer to A. See C++ FAQ Lite, 20.3.
So virtual also tells the compiler to call up the parent's destructors?
A virtual destructor is needed if you delete a pointer of base class pointing to an object of derived class, and expect both base and derived destructors to run. See C++ FAQ Lite, 20.7.
You need the virtual if you use a base class pointer as consultutah (and others while I'm typing ;) ) says it.
The lack of virtuals allows to save a check to know wich method it need to call (the one of the base class or of some derived). However, at this point don't worry about performances, just on correct behaviour.
The virtual destructor is particulary important because derived classes might declare other variables on the heap (i.e. using the keyword 'new') and you need to be able to delete it.
However, you might notice, that in C++, you tend to use less deriving than in java for example (you often use templates for a similar use), and maybe you don't even need to bother about that. Also, if you never declare your objects on the heap ("A a;" instead of "A * a = new A();") then you don't need to worry about it either. Of course, this will heavily depend on what/how you develop and if you plan that someone else will derive your class or not.
Try ((A*)&b).a() and see what gets called then.
The virtual keyword lets you treat an object in an abstract way (I.E. through a base class pointer) and yet still call descendant code...
Put another way, the virtual keyword "lets old code call new code". You may have written code to operate on A's, but through virtual functions, that code can call B's newer a().
Say you instantiated B but held it as an instance of an A:
A *a = new B();
and called function a() whose implementation of a() will be called?
If a() isn't virtual A's will be called. If a() was virtual the instantiated sub class version of a() would be called regardless of how you're holding it.
If B's constructor allocated tons of memory for arrays or opened files, calling
delete a;
would ensure B's destructor was called regardless as to how it was being held, be it by a base class or interface or whatever.
Good question by the way.
I always think about it like chess pieces (my first experiment with OO).
A chessboard holds pointers to all the pieces. Empty squares are NULL pointers. But all it knows is that each pointer points a a chess piece. The board does not need to know more information. But when a piece is moved the board does not know it is a valid move as each pice has different characteristica about how it moves. So the board needs to check with the piece if the move is valid.
Piece* board[8][8];
CheckMove(Point const& from,Point const& too)
{
Piece* piece = board[from.x][from.y];
if (piece != NULL)
{
if (!piece->checkValidMove(from,too))
{ throw std::exception("Bad Move");
}
// Other checks.
}
}
class Piece
{
virtual bool checkValidMove(Point const& from,Point const& too) = 0;
};
class Queen: public Piece
{
virtual bool checkValidMove(Point const& from,Point const& too)
{
if (CheckHorizontalMove(from,too) || CheckVerticalMoce(from,too) || CheckDiagonalMove(from,too))
{
.....
}
}
}