how C++ ensures the concept of base class and derived class - c++

I would like to know how c++ ensures the concept layout in memory of these classes to support inheritance.
for example:
class Base1
{
public:
void function1(){cout<<"Base1"};
};
class Base2
{
public:
void function2(){cout<<"Base2"};
};
class MDerived: Base1,Base2
{
public:
void function1(){cout<<"MDerived"};
};
void function(Base1 *b1)
{
b1->function1();
}
So when I pass function an object of derived type the function should offset into the base1 class function and call it. How does C++ ensure such a layout.

When a MDerived* needs to be converted to a Base1*, the compiler adjusts the pointer to point to the correct memory address, where the members of this base class are located. This means that a MDerived* that is cast to a Base1* might point to a different memory address than the original MDerived* (depending on the memory layout of the derived class).
The compiler can do this because it knows the memory layout of all the classes, and when a cast occurs it can add code that adjusts the address of the pointer.
For example this might print different addresses:
int main() {
MDerived *d = new MDerived;
std::cout << "derived: " << d << std::endl;
std::cout << "base1: " << (base1*)d << std::endl;
std::cout << "base2: " << (base2*)d << std::endl;
}
In your example such adjustments might not be necessary since the classes don't contain any member variables that would use any memory in the sub-objects representing the base classes. If you have a pointer pointing to "nothing" (no member variables), it doesn't really matter if that nothing is called Base1 or Base2 or MDerived.
The non-virtual methods of the classes are not stored with each object, they are stored only once. The compiler then statically, at compile time, uses those global addresses when a member function is called, according to the type of the variable used.

The layout of a class in memory includes its members and base class subobjects (ยง10/2). Members are also subobjects. A pointer to a base subobject is a pointer to an object (but not a most-derived object).
When you convert an MDerived * to a Base2 *, the compiler looks up the offset of the Base2 object inside the MDerived object and uses it to generate this for the inherited method.

I think you're asking why, when you call b1->function(), does Base1::function1() fire?
If so, then the reason is because b1 is a Base1 pointer, not a MDerived pointer. The object it points to may in fact "be" an MDerived object, but function(Base1*) has no way of knowing this, so it calls the only thing it does know -- Base1::function1().
Now, if you had marked the base class function as virtual, things change:
#include <iostream>
#include <string>
using namespace std;
class Base1
{
public: virtual void function1() { cout<<"Base1"; }
};
class Base2
{
public: void function2(){cout<<"Base2";}
};
class MDerived: public Base1, public Base2
{
public: void function1(){cout<<"MDerived";}
};
void function(Base1 *b1)
{
b1->function1();
}
int main()
{
MDerived d;
function(&d);
}
The output of the program is:
"MDerived"
void function(Base1 *b1) still doesn't know that the object being pointed to is actually an MDerived, but now it doesn't have to. WHen you call virtual functions through a base class pointer, you get polymorphic behavior. Which in this case means MDerived::function1() is called because that is the most-derived type available.

Few things in your code
The multiple inheritance should be public, else the compiler will complain that there is no access to the base classes
No ; at the end of the cout statements
Did not include the <iostream> (ok may be I am being too pedantic)
To answer your question - the compiler knows that it is the base class because the type of the argument that the function is taking is Base1. The compiler converts the type of the passed data (assuming you passed a derived object, does object slicing) and then calls the function1() on it (which is a simple offset from the base pointer calculation).

Related

What is the underlying mechanism of a base class pointer assigned to derived class

Similar questions I found were more based on what this does; I understand the assignment of a base class pointer to a derived class, e.g Base* obj = new Derived() to be that the right side gets upcasted to a Base* type, but I would like to understand the mechanism for how this happens and how it allows for virtual to access derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
This figure gives an object of type Base
Now lets look as the equivalent structure for a derived class, Derived:
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
For an object of type Derived, we have a similar structure:
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical value of the two pointers are same but they are not same when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought they did.

How does the compiler call a Derived member function, when I use a member function pointer defined in terms of the Base class?

The code below works, but I'm not quite sure I understand why the member function pointer memfunc_ptr ends up pointing to the correct function Derived::member_func() (see example here). I know that a member function pointer defines an offset into the class of the object, for which it's defined, in this case class Base. So, to justify the result obtained by the code, i.e., that the member function Derived::member_func is called, instead of Base::member_func, I have to conclude that this offset is applied to the vtable of the class Derived, as member_func() is virtual, and the object d is of class Derived. Does that make sense?
#include <iostream>
class Base {
public:
virtual void member_func() {
std::cout << "Base" << '\n'; };
};
class Derived : public Base {
public:
virtual void member_func() { std::cout << "Derived" << '\n'; };
};
int main() {
typedef void (Base::*MFP)();
MFP memfunc_ptr;
memfunc_ptr = &Base::member_func;
Derived d;
(d.*memfunc_ptr)();
}
member_func() is a virtual function. The compiler will hence always call the most appropriate function for the real type of the object that is pointed to. The way this is ensured is implementation specific.
The most frequently way to do it is to use a vtable, a table of funtion pointers. A pointer to the vtable is initialised during the construction of the object. Every time you refer to such a virtual function, the compiler will generate code to find the function pointer in the vtable (using an ofset in the vtable). And when you use a pointer to a virtual function, the compiler will generate code to find the right pointer in the vtable.
This article will show you how this work with more details.

Virtual function hiding in derived class

I have two classes related by inheritance:-
class Base
{
public:
virtual void f(int x)
{
cout << "BASE::int" << endl;
}
virtual void f(double x)
{
cout << "BASE::double" << endl;
}
};
class Derived : public Base
{
public:
virtual void f(str::string s)
{
cout << "DERIVED::string" << endl;
}
};
I have provided same method in derived class with different parameters. That means rather than overriding I am hiding base class versions of this function. So, below calls are expected and clear to me.
std::string str("Hello");
Base b;
b.f(1); //calls base class version.
b.f(str); //error.
Derived d;
d.f(1); //error.
d.f(str); //calls derived class version.
But I am not able get clarification for this last scenario.
Base *b = new Derived;
b->f(str); //results in error.
Would compiler not bind this call to derived version of f using vtables and vptrs. But instead it's doing something else. Can anyone provide me complete path how compiler would try to resolve this call as per language mechanisms.
If your pointer is of type Base* then you can only "see" members that are defined in class Base. The compiler doesn't (or pretends not to) "know" that the variable really points to an instance of Derived, even if you just assigned one to it on the previous line.
When you declare a variable to be of type Base*, you're telling the compiler: treat this as something that could point to a Base or to any class derived from it. So you can't access members that are defined in a particular derived class, because there's no guarantee that the pointer actually points to an instance of that derived class.
The vtable only enters the picture at runtime. The generated assembly would have a lookup of the vptr value for a function and a jump to that address. This also means that the polymorphism is "restricted" to functions that Base knows about. Note that this is what makes more sense as well - the definition of a class should only depend on itself and its parents. If you wanted to make Base* b aware of the virtual functions implemented by Derived, you would end up with the number of vtable entries in Bases depending on its children.

Low level details of inheritance and polymorphism

This question is one of the big doubts that looms around my head and is also hard to describe it in terms of words . Some times it seems obvious and sometimes a tough one to crack.So the question goes like this::
class Base{
public:
int a_number;
Base(){}
virtual void function1() {}
virtual void function2() {}
void function3() {}
};
class Derived:public Base{
public:
Derived():Base() {}
void function1() {cout &lt&lt "Derived from Base" &lt&lt endl;
virtual void function4() {cout &lt&lt "Only in derived" &lt&lt endl;}
};
int main(){
Derived *der_ptr = new Derived();
Base *b_ptr = der_ptr; // As just address is being passed , b_ptr points to derived object
b_ptr -> function4(); // Will Give Compilation ERROR!!
b_ptr -> function1(); // Calls the Derived class overridden method
return 0;
}
Q1. Though b_ptr is pointing to Derived object, to which VTABLE it accesses and HOW ? as b_ptr -> function4() gives compilation error. Or is it that b_ptr can access only upto that size of Base class VTABLE in Derived VTABLE?
Q2. Since the memory layout of the Derived must be (Base,Derived) , is the VTABLE of the Base class also included in the memory layout of the Derived class?
Q3. Since the function1 and function2 of base class Vtable points to the Base class implementation and function2 of Derived class points to function2 of Base class, Is there really a need of VTABLE in the Base class?? (This might be the dumbest question I can ever ask, but still I am in doubt about this in my present state and the answer must be related to answer of Q1 :) )
Please Comment.
Thanks for the patience.
As a further illustration, here is a C version of your C++ program, showing vtables and all.
#include <stdlib.h>
#include <stdio.h>
typedef struct Base Base;
struct Base_vtable_layout{
void (*function1)(Base*);
void (*function2)(Base*);
};
struct Base{
struct Base_vtable_layout* vtable_ptr;
int a_number;
};
void Base_function1(Base* this){}
void Base_function2(Base* this){}
void Base_function3(Base* this){}
struct Base_vtable_layout Base_vtable = {
&Base_function1,
&Base_function2
};
void Base_Base(Base* this){
this->vtable_ptr = &Base_vtable;
};
Base* new_Base(){
Base *res = (Base*)malloc(sizeof(Base));
Base_Base(res);
return res;
}
typedef struct Derived Derived;
struct Derived_vtable_layout{
struct Base_vtable_layout base;
void (*function4)(Derived*);
};
struct Derived{
struct Base base;
};
void Derived_function1(Base* _this){
Derived *this = (Derived*)_this;
printf("Derived from Base\n");
}
void Derived_function4(Derived* this){
printf("Only in derived\n");
}
struct Derived_vtable_layout Derived_vtable =
{
{ &Derived_function1,
&Base_function2},
&Derived_function4
};
void Derived_Derived(Derived* this)
{
Base_Base((Base*)this);
this->base.vtable_ptr = (struct Base_vtable_layout*)&Derived_vtable;
}
Derived* new_Derived(){
Derived *res = (Derived*)malloc(sizeof(Derived));
Derived_Derived(res);
return res;
}
int main(){
Derived *der_ptr = new_Derived();
Base *b_ptr = &der_ptr->base;
/* b_ptr->vtable_ptr->function4(b_ptr); Will Give Compilation ERROR!! */
b_ptr->vtable_ptr->function1(b_ptr);
return 0;
}
Q1 - name resolution is static. Since b_ptr is of type Base*, the compiler can't see any of the names unique to Derived in order to access their entries in the v_table.
Q2 - Maybe, maybe not. You have to remember that the vtable itself is simply a very common method of implementing runtime polymorphism and is actually not part of the standard anywhere. No definitive statement can be made about where it resides. The vtable could actually be some static table somewhere in the program that is pointed to from within the object description of instances.
Q3 - If there's a virtual entry in one place there must be in all places otherwise a bunch of difficult/impossible checks would be necessary to provide override capability. If the compiler KNOWS that you have a Base and are calling an overridden function though, it is not required to access the vtable but could simply use the function directly; it can even inline it if it wants.
A1. The vtable pointer is pointing to a Derived vtable, but the compiler doesn't know that. You told it to treat it as a Base pointer, so it can only call methods that are valid for the Base class, no matter what the pointer points to.
A2. The vtable layout is not specified by the standard, it isn't even officially part of the class. It's just the 99.99% most common implementation method. The vtable isn't part of the object layout, but there's a pointer to the vtable that's a hidden member of the object. It will always be in the same relative location in the object so that the compiler can always generate code to access it, no matter which class pointer it has. Things get more complicated with multiple inheritance, but lets not go there yet.
A3. Vtables exist once per class, not once per object. The compiler needs to generate one even if it never gets used, because it doesn't know that ahead of time.
b_ptr points to the Derived vtable- but the compiler can't guarantee that the derived class has a function_4 in it, as that's not contained within the base vtable, so the compiler doesn't know how to make the call and throws an error.
No, the vtable is a static constant somewhere else in the program. The base class merely holds a pointer to it. A Derived class may hold two vtable pointers, but it might not.
In the context of these two classes, then Base needs a vtable to find Derived's function1, which actually is virtual even though you didn't mark it as so, because it overrode a base class virtual function. However, even if this wasn't the case, I'm pretty sure that the compiler is required to produce the vtables anyway, as it has no idea what other classes you have in other translation units that may or may not inherit from these classes and override their virtual functions in unknowable ways.
First, and most important, remember that C++ doesn't do a lot of run-time introspection of any kind. Basically, it needs to know everything about the objects at compile time.
Q1 - b_ptr is a pointer to a Base. Therefore it can only access things that are present in a Base object. No exceptions. Now, the actual implementation may change depending on the actual type of the object, but there's no getting around the method having to be defined in Base if you want to call it through a Base pointer.
Q2 - The simple answer is 'yes, the vtable for a base has to be present in a derived', but there are a LOT of possible strategies for how to layout a vtable, so don't get hung up on it's exact structure.
Q3 - Yes, there must be a vtable in the Base class. Everything that calls virtual functions in a class will go through the vtable so that if the underlying object is actually a Derived, everything can work.
Now that's not an absolute, because if the compiler can be ABSOLUTELY sure that it knows what it's got (as might be the case of a Base object that's declared on the local stack), then the compiler is allowed to optimize out the vtable lookups, and might even be allowed to inline the function.
All of this depends on the implementation. But here are the answers for the usual simplest way using "vtables".
The Base class has a vtable pointer, so the underlying representation is something like this pseudo-code:
struct Base {
void** __vtable;
int a_number;
};
void* __Base_vtable[] = { &Base::function1, &Base::function2 };
void __Base_ctor( struct Base* this_ptr ) { this_ptr->__vtable = __Base_vtable; }
The Derived class includes a Base class subobject. Since that has a place for a vtable, Derived doesn't need to add another one.
struct Derived {
struct Base __base;
};
void* __Derived_vtable[] =
{ &Derived::function1, &Base::function2, &Derived::function4 };
void __Derived_ctor( struct Derived* this_ptr ) {
__Base_ctor( &this_ptr->__base );
this_ptr->__base.__vtable = __Derived_vtable;
}
The "vtable for the Base class", __Base_vtable in my pseudocode, is needed in case somebody tries new Base(); or Base obj;.
All of the above gets more complicated when multiple inheritance or virtual inheritance is involved....
For the line b_ptr -> function4();, this is a compile-time error, not much related to vtables. When you cast to a Base* pointer, you may only use that pointer in the ways defined by class Base (because the compiler doesn't "know" any more whether it's really a Derived, a Base, or some other class). If Derived has a data member of its own, you can't access it through that pointer. If Derived has a member function of its own, virtual or not, you can't access it through that pointer.

Some final questions about inheritance/casting

I asked a question an hour or two ago that is similar but this is fundamentally different to me. After this I should be good.
class base
{
private:
string tame;
public:
void kaz(){}
virtual ~base() {}
void print() const
{
cout << tame << endl;
}
};
class derived: public base
{
private:
string taok;
public:
std::string name_;
explicit derived( const std::string& n ) : name_( n ) {}
derived(){}
void blah(){taok = "ok";}
void print() const
{
std::cout << "derived: " << name_ << std::endl;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
base b;
derived d;
base * c = &b;
derived * e = (derived *)&b;
e->kaz();
system("pause");
return 0;
}
I know downcasting in in this example is not good practice but I'm just using it as an example. So when I now am pointing to a base object from a derived pointer, I don't get why I am still able to do certain operations only belonging to the base class.
For example, the base class's interface has a Kaz() method but the derived method does not. When I downcast, why does the compiler not yell at me for doing this even though Kaz() is not part of the derived class's interface?
Why is the compiler not complaining for using members of the base class when I am using a derived pointer?
Why does the compiler yell at me only when I access a member from the base class interface from within a method but not directly?
For example:
I can't do this:
e->print() //Program crashes
But I can do this:
e->tame = "Blah";
cout << e->tame << endl;
First you should use dynamic_cast<derived*>(&b) instead of the C-style cast.
Then you should declare the print() method as virtual if you want to override the method in the subclass.
The line e->tame = "Blah"; should cause the compiler to issue an error if not used inside of a class method because it is declared private.
Last the kaz() method can be called on the derived object because the subclass contains all methods from the base class plus the ones defined in the derived class.
The derived class inherits all the members of the base class, so kaz() exists also for derived objects. If you call kaz() on a derived object, simply the method that was inherited from base is called. If you access the inherited members from within a method or directly doesn't matter.
The problem with e is that it is really pointing to a base object, not a derived. With the cast e = (derived *)&b you tell the compiler "I know it doesn't look like it, but this really is a derived *, believe me!". And the compiler believes you, since you are the master. But you lied and &b was actually not a derived*. Therefore horrible things happen when the compiler tries to call derived::print() on it, in this case it leads to a crash of the program.
When you access e->tame directly, also horrible things could happen (the compiler still treats e as a derived* while it only is a base*). In this case, by chance, it happens to print out the expected value anyway.
From the compiler standpoint, e is a "derived" object pointer.
From the runtime standpoint, e is pointing to a "base" object, so e->name_ is pointing to a random address. Eventhough you cast it, it doesn't change the fact that it is a "base" object that was allocated.
If you take a closer look, none of the derived constructors initializes the base name, and neither does the base constructor (which is the only one used).
So when you do e->print() the value you want to print is undefined.
When you manually set it, its no longer undefined. So a call to e->print() will work just fine
You could try this
In base class define a non-default constructor that receives a name
base(std::string name):tame(name){}