Polymorphic pointer change at run time - c++

I am really confused about polymorphic pointers. I have 2 classes derived from an interface, as shown in the code below.
#include <iostream>
using namespace std;
class Base {
public:
virtual ~Base() { }
virtual void addTest() = 0;
};
class B: public Base {
public:
B(){}
~B(){}
void addTest(){
cout << "Add test B\n";
}
};
class C: public Base {
public:
C(){}
~C(){}
void addTest(){
cout << "Add test C\n";
}
private:
void deleteTest(){
}
};
int main()
{
Base *base = new B();
base->addTest();
base = new C();
base->addTest();
return 0;
}
I want to change the pointer dynamically according to a condition at run time to use the same pointer with different kinds of scenarios.
Derived classes are different from each other, so what happens in memory when the polymorphic pointer object changes?
If that usage is not good practice, how can I change the polymorphic pointer object dynamically at the run time?

It's perfectly fine to change what a pointer points to. A Base* is not an instance of Base, it is a pointer that points to an instance of a Base (or something derived from it -- in this case B or C).
Thus in your code, base = new B() sets it to point to a new instance of a B, and then base = new C() sets it to point to a new instance of a C.
Derived classes are different from each other, so what happens in memory when the polymorphic pointer object changes?
Because Base* points to an instance of a Base, all this is doing is changing which instance (or derived instance) Base* points to. In effect, it just changes the address stored in the pointer.
From that Base* pointer, you still have access to anything defined in that Base class -- which still allows polymorphic calls to functions satisfied by derived types if the function is defined as virtual.
The exact mechanism for how this is dispatched to derived types is technically an implementation detail of the compiler, but generally this is done through a process called dynamic dispatch, which uses a "V-Table". This is additional type information stored for any class that contains virtual functions (it's conceptually just a struct of function pointers, where the function pointers are satisfied by the concrete types).
See: Why do we need a virtual table? for more information on vtables.
What is problematic, however, is the use of new here. new allocates memory that must be cleaned up with delete to avoid a memory leak. By doing the following:
Base *base = new B();
base->addTest();
base = new C(); // overwriting base without deleting the old instance
base->addTest();
The B object's destructor is never run, no resources are cleaned up, and the memory for B itself is never reclaimed. This should be:
Base *base = new B();
base->addTest();
delete base;
base = new C(); // safe now: the old instance was deleted on the previous line
base->addTest();
delete base;
Or, better yet, this should be using smart-pointers like std::unique_ptr to do this for you. In which case you don't use new and delete explicitly, you use std::make_unique for allocation, and the destructor automagically does this for you:
std::unique_ptr<Base> base = std::make_unique<B>(); // declared as unique_ptr<Base> so it can later hold a C
base->addTest();
base = std::make_unique<C>(); // the old B is destroyed as part of the assignment
base->addTest();
This is the recommended, modern way to write dynamic allocations.
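To tie this back to the question about choosing the object from a condition at run time, here is a minimal sketch. The helper makeTester and the string-returning addTest are assumptions made for illustration (the original addTest prints instead of returning), and std::make_unique needs C++14.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
    virtual ~Base() = default;  // virtual: safe deletion through a Base pointer
    virtual std::string addTest() const = 0;
};

class B : public Base {
public:
    std::string addTest() const override { return "Add test B"; }
};

class C : public Base {
public:
    std::string addTest() const override { return "Add test C"; }
};

// Hypothetical helper: picks the concrete type from a run-time condition.
std::unique_ptr<Base> makeTester(bool useC) {
    if (useC) {
        return std::make_unique<C>();
    }
    return std::make_unique<B>();
}
```

Assigning the result of another makeTester call to the same std::unique_ptr<Base> releases the previous object automatically, so the calling code never writes delete.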

Related

Why can a base pointer access a derived member variable in a virtual function

class Base {
public:
virtual void test() {};
virtual int get() {return 123;}
private:
int bob = 0;
};
class Derived: public Base{
public:
virtual void test() { alex++; }
virtual int get() { return alex;}
private:
int alex = 0;
};
Base* b = new Derived();
b->test();
When test and get are called, the implicit this pointer is passed in. Is it because a Derived object contains a sub-layout in memory that is identical to what a pure Base object would be, so that the this pointer works both as a base pointer and as a derived pointer?
Another way to put it is, the memory layout for Derived is like
vptr <-- this
bob
alex
That is why it can use alex in b->test(), right?
Inside of Derived's methods, the implicit this pointer is always a Derived* pointer (more generically, the this pointer always matches the class type being called). That is why Derived::test() and Derived::get() can access the Derived::alex member. That has nothing to do with Base.
The memory layout of a Derived object begins with the data members of Base, followed by optional padding, followed by the data members of Derived. That allows you to use a Derived object wherever a Base object is expected. When you pass a Derived* pointer to a Base* pointer, or a Derived& reference to a Base& reference, the compiler will adjust the pointer/reference accordingly at compile-time to point at the Base portion of the Derived object.
When you call b->test() at runtime, where b is a Base* pointer, the compiler knows test() is virtual and will generate code that accesses the appropriate slot in b's vtable and calls the method being pointed at. But the compiler doesn't know what derived object type b will actually point at during runtime (that is the whole magic of polymorphism), so it can't automatically adjust the implicit this pointer to the correct derived pointer type at compile-time.
In the case where b is pointing at a Derived object, b's vtable is pointing at Derived's vtable. The compiler knows the exact offset of the start of Derived from the start of Base. So, the slot for test() in Derived's vtable will point to a private stub generated by the compiler to adjust the implicit Base *this pointer into a Derived *this pointer before then jumping into the actual implementation code for Derived::test().
Behind the scenes, it is roughly (not exactly) implemented like the following pseudo-code:
void Derived_test_stub(Base *this)
{
Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
Derived::test(adjusted_this);
}
int Derived_get_stub(Base *this)
{
Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
return Derived::get(adjusted_this);
}
struct vtable_Base
{
void* funcs[2] = {&Base::test, &Base::get};
};
struct vtable_Derived
{
void* funcs[2] = {&Derived_test_stub, &Derived_get_stub};
};
Base::Base()
{
this->vtable = &vtable_Base;
bob = 0;
}
Derived::Derived() : Base()
{
Base::vtable = &vtable_Derived;
this->vtable = &vtable_Derived;
alex = 0;
}
...
Base *b = new Derived;
//b->test(); // calls Derived::test()...
typedef void (*test_type)(Base*);
static_cast<test_type>(b->vtable[0])(b); // calls Derived_test_stub()...
//int i = b->get(); // calls Derived::get()...
typedef int (*get_type)(Base*);
int i = static_cast<get_type>(b->vtable[1])(b); // calls Derived_get_stub()...
The actual details are a bit more involved, but that should give you a basic idea of how polymorphism is able to dispatch virtual methods at runtime.
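For readers who want something that actually compiles, the pseudo-code above can be approximated with plain structs and function pointers. This is only a sketch of the idea: every name in it (BaseObj, DerivedObj, VTable, callTest, callGet and so on) is hypothetical, and real compilers follow their platform ABI rather than anything this simple.

```cpp
#include <cassert>

// Hand-rolled analogue of what a compiler typically generates.
// All names here are hypothetical; real ABIs differ in detail.
struct BaseObj;

struct VTable {
    void (*test)(BaseObj*);
    int  (*get)(BaseObj*);
};

struct BaseObj {
    const VTable* vptr;  // the hidden pointer a compiler would insert
    int bob;
};

struct DerivedObj {
    BaseObj base;        // the Base sub-object comes first
    int alex;
};

// "Member functions" take the object explicitly, like an implicit this.
void Base_test(BaseObj*) {}
int  Base_get(BaseObj*) { return 123; }

void Derived_test(BaseObj* self) {
    // Recover the derived object from its first member (offset 0).
    reinterpret_cast<DerivedObj*>(self)->alex++;
}
int Derived_get(BaseObj* self) {
    return reinterpret_cast<DerivedObj*>(self)->alex;
}

const VTable base_vtable    = {&Base_test, &Base_get};
const VTable derived_vtable = {&Derived_test, &Derived_get};

// Stand-ins for the constructors, which set the vtable pointer.
BaseObj makeBase() { return BaseObj{&base_vtable, 0}; }
DerivedObj makeDerived() { return DerivedObj{BaseObj{&derived_vtable, 0}, 0}; }

// What b->test() and b->get() boil down to: an indirect call through the table.
void callTest(BaseObj* b) { b->vptr->test(b); }
int  callGet(BaseObj* b)  { return b->vptr->get(b); }
```

With single inheritance the offset from the Base part to the Derived object is zero here, which is why a plain cast is enough; the adjusting stubs only matter once the base sub-object is not at the start.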
What you've shown is reasonably accurate, at least for a typical implementation. It's not guaranteed to be precisely as you've shown it (e.g., the compiler might easily insert some padding between bob and alex), but either way it "knows" that alex is at some predefined offset from this, so it can take a pointer to Base, calculate the correct offset from it, and use what's there.
Not what you asked about, so I won't try to get into detail, but just a fair warning: computing such offsets can/does get a bit more complex when/if multiple inheritance gets involved. Not so much for accessing a member of the most derived class, but if you access a member of a base class, it has to basically compute an offset to the beginning of that base class, then add an offset to get to the correct offset within that base class.
A derived class is not a separate class but an extension. If something is allocated as derived then a pointer (which is just an address in memory) will be able to find everything from the derived class. Classes don't exist in assembly; the compiler keeps track of everything according to how it is laid out in memory and provides appropriate checking accordingly.

What is the underlying mechanism of a base class pointer assigned to derived class

Similar questions I found were more based on what this does; I understand the assignment of a base class pointer to a derived class, e.g Base* obj = new Derived() to be that the right side gets upcasted to a Base* type, but I would like to understand the mechanism for how this happens and how it allows for virtual to access derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
protected: // protected rather than private, so Derived::bar() below can read a and b
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
[Figure: storage layout and vtable for an object of type Base]
Now let's look at the equivalent structure for a derived class, Derived:
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
For an object of type Derived, we have a similar structure:
[Figure: storage layout and vtable for an object of type Derived]
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical values of the two pointers are the same, but they can differ when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
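A small sketch can make the pointer adjustment visible. The classes Left, Right and Both and the helper functions are made up for this illustration; the exact addresses depend on the compiler, so the code only checks that the two base sub-objects sit at different addresses and that converting back recovers the original pointer.

```cpp
#include <cassert>
#include <cstdint>

struct Left  { int l = 1; virtual ~Left() = default; };
struct Right { int r = 2; virtual ~Right() = default; };
struct Both : Left, Right { int b = 3; };

// Numeric value of a pointer, used for comparison only.
std::uintptr_t addressOf(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// True if converting to the two base pointer types yields two different
// addresses, i.e. the compiler adjusted at least one of them.
bool basesHaveDistinctAddresses(Both& obj) {
    Left*  left  = &obj;  // implicit derived-to-base conversion
    Right* right = &obj;  // typically offset past the Left sub-object
    return addressOf(left) != addressOf(right);
}

// Converting back with static_cast undoes the adjustment.
bool roundTrips(Both& obj) {
    Right* right = &obj;
    return static_cast<Both*>(right) == &obj;
}
```

No explicit cast is needed in the derived-to-base direction; the compiler inserts whatever offset its own layout requires.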
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought it did.
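As for why assigning a non-inheriting class fails: the implicit pointer conversion simply does not exist for unrelated types. One way to see that without provoking a compile error is std::is_convertible from <type_traits>. The class names and the two helper functions below are assumptions for illustration.

```cpp
#include <cassert>
#include <type_traits>

struct Base { virtual ~Base() = default; };
struct Derived : public Base {};
struct Unrelated {};

// Derived* converts to Base* implicitly because of public inheritance.
static_assert(std::is_convertible<Derived*, Base*>::value,
              "Derived* should convert to Base*");

// No such conversion exists for a class that does not inherit from Base,
// which is why "Base* obj = new Unrelated;" is rejected at compile time.
static_assert(!std::is_convertible<Unrelated*, Base*>::value,
              "Unrelated* should not convert to Base*");

// The same facts exposed as functions that can be queried at run time.
constexpr bool derivedConverts()   { return std::is_convertible<Derived*, Base*>::value; }
constexpr bool unrelatedConverts() { return std::is_convertible<Unrelated*, Base*>::value; }
```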

Overriding operator new/delete in derived class

I have a stateless, abstract base class from which various concrete classes inherit. Some of these derived classes are stateless as well. Because many of them are created during a run, I'd like to save memory and overhead by having all stateless derived classes emulate a singleton, by overriding operator new()/delete(). A simplified example would look something like this:
#include <memory>
struct Base {
virtual ~Base() {}
protected:
Base() {} // prevent concrete Base objects
};
struct D1 : public Base { // stateful object--default behavior
int dummy;
};
struct D2 : public Base { // stateless object--don't allocate memory
void* operator new(size_t size)
{
static D2 d2;
return &d2;
}
void operator delete(void *p) {}
};
int main() {
Base* p1 = new D1();
Base* p2 = new D1();
Base* s1 = new D2();
Base* s2 = new D2();
delete p1;
delete p2;
delete s1;
delete s2;
return 0;
}
This example doesn't work: delete s2; fails because delete s1; called ~Base(), which deallocated the shared Base in d2. This can be addressed by adding the same trick with new/delete overloading to Base. But I'm not sure this is the cleanest solution, or even a correct one (valgrind doesn't complain, FWIW). I'd appreciate advice or critique.
edit: actually, the situation is worse. The Base class in this example isn't abstract, as I claimed. If it's made abstract, through the addition of a pure virtual method, then I can no longer apply the new/delete overriding trick, because I cannot have a static variable of type Base. So I don't have any solution for this problem!
You just can't do that - that would violate "object identity" requirement that states that each object must have its own address. You have to allocate distinct memory block to each object - this can be done rather fast if you override operator new to use a fast block allocator specifically tailored for objects of fixed size.
I would say the best solution here is to make your stateless derived class an actual singleton. Make the derived constructor private and just provide a static Base* getInstance() method that either creates the required object or returns the static instance. This way the only way to get a D2 object would be via this method, since calling new D2 from outside the class would not compile.
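A minimal sketch of that singleton idea, assuming a class named Stateless in place of D2 and a made-up pure virtual id() so that Base really is abstract. It returns a reference instead of the Base* suggested above, which is a design choice to make it less tempting to call delete on the result.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int id() const = 0;  // hypothetical pure virtual, makes Base abstract
protected:
    Base() = default;
};

class Stateless : public Base {
public:
    // The only way to obtain the object; there is nothing to delete.
    static Base& getInstance() {
        static Stateless instance;  // constructed on first use
        return instance;
    }
    int id() const override { return 2; }

    Stateless(const Stateless&) = delete;
    Stateless& operator=(const Stateless&) = delete;

private:
    Stateless() = default;  // "new Stateless" outside the class will not compile
};
```

Because the static instance has type Stateless rather than Base, this also works when Base is abstract, which was the obstacle mentioned in the question's edit.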

Call a child's version of a function instead of a parent's?

Okay, so I got two classes.
class a{
public:
a(){}
void print(){cout << "hello";}
};
class b : public a{
public:
void print(){cout << "hello world";}
};
And an array of parents with a child
a blah[10];
blah[5] = b();
Then I call print, and want it to say hello world.
blah[5].print();
But it calls the parent. How do I fix this?
This can be fixed by declaring the function virtual, a la:
class a{
public:
virtual void print(){
cout << "hello";
}
};
class b : public a{
public:
virtual void print() {
cout << "hello world";
}
};
This is how one implements polymorphism in C++. More here: http://en.wikipedia.org/wiki/Virtual_function
However, it should be noted that in your example, it will never call the child function, because you are using object values, not pointers/references to objects. To remedy this,
a * blah[10];
blah[5] = new b();
Then:
blah[5]->print();
What you're looking for is run-time polymorphism, which means to have the object take "many forms" (i.e. a or b), and act accordingly, as the program runs. In C++, you do this by making the function virtual in the base class a:
virtual void print() {cout << "hello";}
Then, you need to store the elements by pointer or reference, and - as in general the derived classes can introduce new data members and need more storage - it's normal to create the objects on the heap with new:
a* blah[10];
blah[5] = new b();
Then you can call:
blah[5]->print();
And it will call the b implementation of print().
You should later delete blah[5] (and any others you've pointed at memory returned by new).
In practice, it's a good idea to use a container that can delete the objects it contains when it is itself destructed, whether due to leaving scope or being deleted. std::vector<> is one such container. You can also use smart-pointers to automate the deletion of the a and b objects. This helps make the code correct if exceptions are thrown before your delete statements execute, and you want your program to keep running without leaking memory. The boost library is the easiest/best place to get a smart pointer implementation from. Together:
#include <vector>
#include <boost/shared_ptr.hpp>
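std::vector<boost::shared_ptr<a> > blah(10);
blah[5].reset(new b()); // shared_ptr's constructor from a raw pointer is explicit, so "blah[5] = new b();" would not compile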
(It's more normal to use vectors with push_back(), as it automatically grows the vector to fit in all the elements you've added, with the new total available by calling vector::size().)
It does that because you told the compiler that your instance was of type a. It's in an array of a objects, right? So it's of type a!
Of course, you want the method in b to override the one in a, despite having a reference of the parent type. You can get that behaviour using the virtual keyword on the function declaration in the parent class.
virtual void print(){cout << "hello";}
Why does it work like that?
Because when you cast your object to the parent class, you introduced an ambiguity. When this object's print() is called, how should we treat it? It is of type b, but the reference is of type a, so the surrounding code may expect it to behave like a, not b!
To disambiguate, the virtual keyword is introduced. virtual functions are always overridden, if the object is of a child class containing a method with the same signature.
Cheers!
Declare a::print() as virtual and use a pointer or reference to call print(). When you do blah[5] = b(), the object is sliced: only the a part is copied into the array element. Calling a virtual function directly on an object held by value has no polymorphic effect.
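The difference between the sliced array and an array of pointers can be sketched as follows. Here print() returns a string instead of writing to cout so the result is easy to compare, and std::unique_ptr (C++14 for std::make_unique) is used as an assumption in place of raw new or boost; the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class a {
public:
    virtual ~a() = default;
    virtual std::string print() const { return "hello"; }
};

class b : public a {
public:
    std::string print() const override { return "hello world"; }
};

// Storing by value: assigning a b into an a slot copies only the a part.
std::string printSliced() {
    a blah[10];
    blah[5] = b();  // slicing: the element is still an a
    return blah[5].print();
}

// Storing by pointer: the element refers to a complete b object.
std::string printViaPointer() {
    std::vector<std::unique_ptr<a>> blah(10);
    blah[5] = std::make_unique<b>();
    return blah[5]->print();
}
```

Even with print() declared virtual, printSliced() still reports the parent's text, because the array element really is an a.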

Trouble understanding C++ `virtual`

I'm having trouble understanding what the purpose of the virtual keyword is in C++. I know C and Java very well but I'm new to C++.
From wikipedia
In object-oriented programming, a
virtual function or virtual method is
a function or method whose behavior
can be overridden within an inheriting
class by a function with the same
signature.
However I can override a method as seen below without using the virtual keyword
#include <iostream>
using namespace std;
class A {
public:
int a();
};
int A::a() {
return 1;
}
class B : A {
public:
int a();
};
int B::a() {
return 2;
}
int main() {
B b;
cout << b.a() << endl;
return 0;
}
//output: 2
As you can see below, the function A::a is successfully overridden with B::a without requiring virtual
Compounding my confusion is this statement about virtual destructors, also from wikipedia
as illustrated in the following example,
it is important for a C++ base class
to have a virtual destructor to ensure
that the destructor from the most
derived class will always be called.
So virtual also tells the compiler to call up the parent's destructors? This seems to be very different from my original understanding of virtual as "make the function overridable"
Make the following changes and you will see why:
#include <iostream>
using namespace std;
class A {
public:
int a();
};
int A::a() {
return 1;
}
class B : public A { // Notice public added here
public:
int a();
};
int B::a() {
return 2;
}
int main() {
A* b = new B(); // Notice we are using a base class pointer here
cout << b->a() << endl; // This will print 1 instead of 2
delete b; // Added delete to free b
return 0;
}
Now, to make it work like you intended:
#include <iostream>
using namespace std;
class A {
public:
virtual int a(); // Notice virtual added here
};
int A::a() {
return 1;
}
class B : public A { // Notice public added here
public:
virtual int a(); // Notice virtual added here, but not necessary in C++
};
int B::a() {
return 2;
}
int main() {
A* b = new B(); // Notice we are using a base class pointer here
cout << b->a() << endl; // This will print 2 as intended
delete b; // Added delete to free b
return 0;
}
The note that you've included about virtual destructors is exactly right. In your sample there is nothing that needs to be cleaned-up, but say that both A and B had destructors. If they aren't marked virtual, which one is going to get called with the base class pointer? Hint: It will work exactly the same as the a() method did when it was not marked virtual.
You could think of it as follows.
All functions in Java are virtual. If you have a class with a function, and you override that function in a derived class, it will be called, no matter the declared type of the variable you use to call it.
In C++, on the other hand, it won't necessarily be called.
If you have a base class Base and a derived class Derived, and they both have a non-virtual function in them named 'foo', then
Derived *derived = new Derived;
Base *base = derived; // both pointers refer to the same Derived object
base->foo(); // calls Base::foo
derived->foo(); // calls Derived::foo
If foo is virtual, then both call Derived::foo.
virtual means that the actual method is determined at run time, based on what class was instantiated, not on the type you used to declare your variable.
In your case this is a static override: the call goes to the method defined for class B, the declared type of the variable, no matter what the actual type of the created object is.
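The two kinds of binding can be put side by side in a short sketch. The names Base, Derived, plain, dynamic, callPlain and callDynamic are invented for this example; the override keyword (C++11) is optional but makes the compiler check that a virtual function is really being overridden.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    std::string plain() const { return "Base::plain"; }              // non-virtual
    virtual std::string dynamic() const { return "Base::dynamic"; }  // virtual
};

struct Derived : Base {
    std::string plain() const { return "Derived::plain"; }               // hides Base::plain
    std::string dynamic() const override { return "Derived::dynamic"; }  // overrides
};

// Both helpers see the object only through a reference of static type Base.
std::string callPlain(const Base& obj)   { return obj.plain(); }    // chosen at compile time
std::string callDynamic(const Base& obj) { return obj.dynamic(); }  // chosen at run time
```

Passing a Derived to both helpers gives the Base version for the non-virtual call and the Derived version for the virtual one, while calling plain() directly on a Derived variable still reaches Derived::plain.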
So virtual also tells the compiler to call up the parent's destructors? This seems to be very different from my original understanding of virtual as "make the function overridable"
Your original and your new understanding are both wrong.
Methods (you call them functions) are always overridable. No matter if virtual, pure, nonvirtual or something.
Parent destructors are always called. As are the constructors.
"Virtual" only makes a difference if you call a method through a pointer (or reference) of type pointer-to-baseclass. Since in your example you don't use pointers at all, virtual doesn't make a difference at all.
If you use a variable a of type pointer-to-A, that is A* a;, you can not only assign other variables of type pointer-to-A to it, but also variables of type pointer-to-B, because B is derived from A.
A* a;
B* b;
b = new B(); // create an object of type B.
a = b; // this is valid code. a has still the type pointer-to-A,
// but the value it holds is b, a pointer to a B object.
a->a(); // now here is the difference. If a() is non-virtual, A::a()
// will be called, because a is of type pointer-to-A.
// Whether the object it points to is of type A, B or
// something entirely different doesn't matter, what gets called
// is determined during compile time from the type of a.
a->a(); // now if a() is virtual, B::a() will be called, the compiler
// looks during runtime at the value of a, sees that it points
// to a B object and uses B::a(). What gets called is determined
// from the type of the __value__ of a.
As you can see below, the function A::a is successfully overridden with B::a without requiring virtual
It may or may not work. In your example it works, but that's because you create and use a B object directly, and not through a pointer to A. See C++ FAQ Lite, 20.3.
So virtual also tells the compiler to call up the parent's destructors?
A virtual destructor is needed if you delete a pointer of base class pointing to an object of derived class, and expect both base and derived destructors to run. See C++ FAQ Lite, 20.7.
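A small sketch of what the virtual destructor buys you. The destructionLog helper and deleteThroughBase are hypothetical names introduced only to record which destructors run; the non-virtual case is deliberately not shown, since deleting a derived object through a base pointer without a virtual destructor is undefined behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical log so the destruction order can be inspected.
std::string& destructionLog() {
    static std::string log;
    return log;
}

struct A {
    virtual ~A() { destructionLog() += "~A "; }  // virtual: delete via A* is well defined
};

struct B : A {
    ~B() override { destructionLog() += "~B "; }
};

// Creates a B, deletes it through an A*, and returns what ran.
std::string deleteThroughBase() {
    destructionLog().clear();
    A* p = new B();
    delete p;  // runs ~B first, then ~A
    return destructionLog();
}
```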
You need virtual if you use a base class pointer, as consultutah (and others while I'm typing ;) ) says.
Non-virtual functions save the run-time lookup needed to decide which method to call (the base class's or a derived class's). However, at this point don't worry about performance, just about correct behaviour.
The virtual destructor is particularly important because derived classes might allocate other variables on the heap (i.e. using the keyword 'new') and you need to be able to delete them.
However, you might notice, that in C++, you tend to use less deriving than in java for example (you often use templates for a similar use), and maybe you don't even need to bother about that. Also, if you never declare your objects on the heap ("A a;" instead of "A * a = new A();") then you don't need to worry about it either. Of course, this will heavily depend on what/how you develop and if you plan that someone else will derive your class or not.
Try ((A*)&b)->a() and see what gets called then.
The virtual keyword lets you treat an object in an abstract way (I.E. through a base class pointer) and yet still call descendant code...
Put another way, the virtual keyword "lets old code call new code". You may have written code to operate on A's, but through virtual functions, that code can call B's newer a().
Say you instantiated B but held it as an instance of an A:
A *a = new B();
and called function a(), whose implementation of a() will be called?
If a() isn't virtual A's will be called. If a() was virtual the instantiated sub class version of a() would be called regardless of how you're holding it.
If B's constructor allocated tons of memory for arrays or opened files, then, provided A's destructor is virtual, calling
delete a;
would ensure B's destructor was called regardless as to how it was being held, be it by a base class or interface or whatever.
Good question by the way.
I always think about it like chess pieces (my first experiment with OO).
A chessboard holds pointers to all the pieces. Empty squares are NULL pointers. But all it knows is that each pointer points to a chess piece. The board does not need to know more information. But when a piece is moved, the board does not know whether it is a valid move, as each piece has different characteristics about how it moves. So the board needs to check with the piece if the move is valid.
Piece* board[8][8];
void CheckMove(Point const& from,Point const& too)
{
Piece* piece = board[from.x][from.y];
if (piece != NULL)
{
if (!piece->checkValidMove(from,too))
{ throw std::runtime_error("Bad Move"); // needs <stdexcept>; std::exception has no standard string constructor
}
// Other checks.
}
}
class Piece
{
public: // public so the board can call it
virtual bool checkValidMove(Point const& from,Point const& too) = 0;
};
class Queen: public Piece
{
public:
virtual bool checkValidMove(Point const& from,Point const& too)
{
if (CheckHorizontalMove(from,too) || CheckVerticalMove(from,too) || CheckDiagonalMove(from,too))
{
.....
}
}
};
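A compilable variation on the same idea, as a rough sketch only. Point, Rook, Bishop and Board are assumptions made up for this example (the original leaves Point and the move helpers undefined), the move rules ignore blocking pieces, and std::unique_ptr replaces the raw pointers.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <utility>

struct Point { int x; int y; };

class Piece {
public:
    virtual ~Piece() = default;
    virtual bool checkValidMove(Point const& from, Point const& to) const = 0;
};

class Rook : public Piece {
public:
    bool checkValidMove(Point const& from, Point const& to) const override {
        bool moved = from.x != to.x || from.y != to.y;
        return moved && (from.x == to.x || from.y == to.y);
    }
};

class Bishop : public Piece {
public:
    bool checkValidMove(Point const& from, Point const& to) const override {
        int dx = std::abs(from.x - to.x);
        int dy = std::abs(from.y - to.y);
        return dx == dy && dx != 0;
    }
};

class Board {
public:
    void place(Point at, std::unique_ptr<Piece> piece) {
        squares[at.x][at.y] = std::move(piece);
    }
    // The board only asks the piece; it never inspects the concrete type.
    bool isValidMove(Point const& from, Point const& to) const {
        const Piece* piece = squares[from.x][from.y].get();
        return piece != nullptr && piece->checkValidMove(from, to);
    }
private:
    std::unique_ptr<Piece> squares[8][8];  // empty squares hold nullptr
};
```

Adding a new kind of piece means writing one more class that overrides checkValidMove; the Board code does not change.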