CRTP - Dangerous Memory Access? - c++

I created the following test code to experiment with the Curiously Recurring Template Pattern, in which I have a Base class with an Interface() and a Derived class with an Implementation. It's modeled directly after the simplest example from the linked Wikipedia page.
#include <iostream>
template <class T>
class Base {
public:
void Interface() {
std::cout << "Base Interface()" << std::endl;
static_cast<T*>(this)->Implementation();
}
};
class Derived : public Base<Derived> {
public:
Derived(int data = 0) : data_(data) {}
void Implementation() {
std::cout << "Derived Implementation(), data_ = " << data_ << std::endl;
}
private:
int data_;
};
int main() {
Base<Derived> b;
b.Interface();
std::cout << std::endl;
Derived d;
d.Interface();
std::cout << std::endl;
return 0;
}
Once compiled, the program runs smoothly and produces the following output:
Base Interface()
Derived Implementation(), data_ = -1450622976
Base Interface()
Derived Implementation(), data_ = 0
The interesting part is the first test, in which a Base pointer is being cast to a Derived pointer, from which the function Implementation() is being called. Inside this function, a "member variable" data_ is being accessed.
To us this is nonsense, but to a compiler it is simply the value at some offset from this object's memory location. However, in this case the Derived class takes more space than the base class, and so if the compiler thinks it's accessing a data member from a Derived object, but actually the object is a Base object, then the memory we are accessing may not belong to this object, or even to this program.
It seems that this programming practice allows us (the programmer) to very easily do very dangerous things, such as making a seemingly reasonable function call that ends up reading from an uncontrolled memory location. Have I interpreted the mechanics of this example correctly? If so, are there techniques within the CRTP paradigm that I've missed to ensure this problem will not show up?

In general static_cast works properly if the pointer you pass there points to an object which contains the target class somewhere in its inheritance hierarchy.
For
Base b;
b.Interface();
a pointer to a real Base object is passed to static_cast, and it is not related to Derived class at all. So after casting you have a pointer that looks like a pointer to Derived, but it still points to a Base object in memory. When you access the data_ member via this pointer, you get a content of some uninitialized area in memory.

I only recently came to the same realization as you did.
After a bit of searching, my answer is yes, you did interpreted the CRTP correctly, and not, you did not miss techniques to avoid the issue.
Which we both find surprising.
Even more surprising for me, was the fact that the code compiles. It is a static downcasting of a base-class pointer, where it is known at compile time that the actual instance is of the base class.
But apparently these casts are allowed, they just lead to undefined behaviors in cases like this one (e.g. see here )
As suggested in the comments by #NathanOliver, a solution to avoid the risk is to make the constructor of the base class protected.

Related

What is the underlying mechanism of a base class pointer assigned to derived class

Similar questions I found were more based on what this does; I understand the assignment of a base class pointer to a derived class, e.g Base* obj = new Derived() to be that the right side gets upcasted to a Base* type, but I would like to understand the mechanism for how this happens and how it allows for virtual to access derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
This figure gives an object of type Base
Now lets look as the equivalent structure for a derived class, Derived:
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
For an object of type Derived, we have a similar structure:
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical value of the two pointers are same but they are not same when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought they did.

Using members of base class after destruction of derived class

Suppose that we have a simple struct:
struct RefCounters {
size_t strong_cnt;
size_t weak_cnt;
RefCounters() : strong_cnt(0), weak_cnt(0) {}
};
From implementation point, the destructor RefCounters::~RefCounters should do nothing, since all its members have primitive type. This means that if an object of this type is destroyed with explicit call of destructor (but its memory is not deallocated), then we would be able to work with its members normally after the object is dead.
Now suppose that we have some more classes derived from RefCounters. Suppose that RefCounters is present exactly once among base classes of Derived class. Suppose that destructor is called explicitly for an object of class Derived, but its memory is not deallocated. Is it OK to access members strong_cnt and weak_cnt after that?
From implementation point, it should be OK, at least when there is no virtual inheritance involved. Because Derived* can be statically cast to RefCounters* (adding compile-time constant offset to address), and the memory of RefCounters should not be touched by destructor of Derived class.
Here is a code sample:
struct RefCounted : public RefCounters {
virtual ~RefCounted() {}
};
struct Base : public RefCounted {
int val1;
virtual void print();
};
struct Derived : public Base {
std::string val2;
virtual void print();
};
Derived *pDer = new Derived();
pDer->~Derived(); //destroy object
pDer->strong_cnt++; //modify its member
std::cout << pDer->strong_cnt << pDer->weak_cnt << "\n";
Is such code considered undefined behavior by C++ standard? Is there any practical reason why it can fail to work? Can it be made legal by minor changes or adding some constraints?
P.S. Supposedly, such code sample allows to make intrusive_ptr + weak_ptr combo, such that weak_ptr can be always obtained from an object pointer if at least one weak_ptr is still pointing at it. More details in this question.
I believe that your approach is bad. There is a nice link in comments that shows debate about the details of the standard. Once there is a debate there is good chance that different compilers will implement this detail differently. Even more. The same compiler may change its implementation from one version to another.
The more you use various dark corners, the bigger is the chance that you will meet with problems.
Bottom line. What are willing to achieve? Why can't you do this using ordinary C++ language features?

Is it possible to call derived object's virtual method when down-casted from base class?

Given the following class structure:
class Base
{
virtual void outputMessage() { cout << "Base message!"; }
};
class Derived : public Base
{
virtual void outputMessage() { cout << "Derived message!"; }
}
.. and this code snippet:
Base baseObj;
Derived* convertedObj = (Derived*) &baseObj;
convertedObj->outputMessage();
.. the output will be "Base message!".
Is there any way to cast or manipulate the object to make Derived's version of the outputMessage method to be called polymorphically?
Edit: I will attempt to show the reason why I'm after this:
I am writing migration tools that hook into our main system. For this reason, I need to get access to protected member methods, or customise existing virtual methods. The former I can do by defining a derived class and casting objects to it, to call methods statically. What I can't do is change the behaviour for methods which I do not call statically (ie methods that are called elsewhere in the codebase).
I have also tried creating objects of the derived class directly, but this causes issues in other parts of the system due to the manipulation of the objects passed through the constructor.
No, virtual functions operate on the actual types of the object being pointed to, which in your case is just a simple Base.
Actually, with the down-casting, you're entering undefined-behaviour land here. This can blow off like a bomb with multiple inheritance, where the vtable in the derived class isn't at the same offset as the vtable in the base class.
No Standard-compliant solution
What you're trying to do isn't possible using behaviours guaranteed by the C++ Standard.
If you really MUST do this as a short-term measure to assist your migration, don't depend on it in production, and can adequately verify the behaviour, you could experiment as illustrated below.
Discussion of your attempt
What I'm showing is that you're taking the wrong approach: simply casting a pointer-to-base to a pointer-to-derived doesn't alter the object's vtable pointer.
Deriving a plausible hack
Addressing that, the naive approach is to reconstruct the object in place as a derived object ("placement" new), but this doesn't work either - it will reinitialise the base class members.
What you can possibly do is create a non-derived object that has no data members but the same virtual dispatch table entries (i.e. same virtual functions, same accessibility private/protected/public, same order).
More warnings and caveats
It may work (as it does on my Linux box), but use it at your own risk (I suggest not on production systems).
Further warning: this can only intercept virtual dispatch, and virtual functions can sometimes be dispatched statically when the compiler knows the types at compile time.
~/dev cat hack_vtable.cc
// change vtable of existing object to intercept virtual dispatch...
#include <iostream>
struct B
{
virtual void f() { std::cout << "B::f()\n"; }
std::string s_;
};
struct D : B
{
virtual void f() { std::cout << "D::f()\n"; }
};
struct E
{
virtual void f() { std::cout << "E::f()\n"; }
};
int main()
{
B* p = new B();
p->s_ = "hello";
new (p) D(); // WARNING: reconstructs B members
p->f();
std::cout << '\'' << p->s_ << "'\n"; // no longer "hello"
p->s_ = "world";
new (p) E();
p->f(); // correctly calls E::f()
std::cout << '\'' << p->s_ << "'\n"; // still "world"
}
~/dev try hack_vtable
make: `hack_vtable' is up to date.
D::f()
''
E::f()
'world'
Well, even if you're casting your Base object as a Derived one, internally, it's still a Base object: the vftable of your object (the actual map of functions to RAM pointers) is not updated.
I don't think there is any way to do what you want to do and I don't understand why you'd like to do it.
In this question downcast problem in c++ Robs answer should also be the answer to your problem.
Not at least in legal way. To call Derived class function, you need to have Derived object being referred.

how C++ ensures the concept of base class and derived class

I would like to know how c++ ensures the concept layout in memory of these classes to support inheritance.
for example:
class Base1
{
public:
void function1(){cout<<"Base1"};
};
class Base2
{
public:
void function2(){cout<<"Base2"};
};
class MDerived: Base1,Base2
{
public:
void function1(){cout<<"MDerived"};
};
void function(Base1 *b1)
{
b1->function1();
}
So when I pass function an object of derived type the function should offset into the base1 class function and call it. How does C++ ensure such a layout.
When a MDerived* needs to be converted to a Base1*, the compiler adjusts the pointer to point to the correct memory address, where the members of this base class are located. This means that a MDerived* that is cast to a Base1* might point to a different memory address than the original MDerived* (depending on the memory layout of the derived class).
The compiler can do this because it knows the memory layout of all the classes, and when a cast occurs it can add code that adjusts the address of the pointer.
For example this might print different addresses:
int main() {
MDerived *d = new MDerived;
std::cout << "derived: " << d << std::endl;
std::cout << "base1: " << (base1*)d << std::endl;
std::cout << "base2: " << (base2*)d << std::endl;
}
In your example such adjustments might not be necessary since the classes don't contain any member variables that would use any memory in the sub-objects representing the base classes. If you have a pointer pointing to "nothing" (no member variables), it doesn't really matter if that nothing is called Base1 or Base2 or MDerived.
The non-virtual methods of the classes are not stored with each object, they are stored only once. The compiler then statically, at compile time, uses those global addresses when a member function is called, according to the type of the variable used.
The layout of a class in memory includes its members and base class subobjects (ยง10/2). Members are also subobjects. A pointer to a base subobject is a pointer to an object (but not a most-derived object).
When you convert an MDerived * to a Base2 *, the compiler looks up the offset of the Base2 object inside the MDerived object and uses it to generate this for the inherited method.
I think you're asking why, when you call b1->function(), does Base1::function1() fire?
If so, then the reason is because b1 is a Base1 pointer, not a MDerived pointer. The object it points to may in fact "be" an MDerived object, but function(Base1*) has no way of knowing this, so it calls the only thing it does know -- Base1::function1().
Now, if you had marked the base class function as virtual, things change:
#include <iostream>
#include <string>
using namespace std;
class Base1
{
public: virtual void function1() { cout<<"Base1"; }
};
class Base2
{
public: void function2(){cout<<"Base2";}
};
class MDerived: public Base1, public Base2
{
public: void function1(){cout<<"MDerived";}
};
void function(Base1 *b1)
{
b1->function1();
}
int main()
{
MDerived d;
function(&d);
}
The output of the program is:
"MDerived"
void function(Base1 *b1) still doesn't know that the object being pointed to is actually an MDerived, but now it doesn't have to. WHen you call virtual functions through a base class pointer, you get polymorphic behavior. Which in this case means MDerived::function1() is called because that is the most-derived type available.
Few things in your code
The multiple inheritance should be public, else the compiler will complain that there is no access to the base classes
No ; at the end of the cout statements
Did not include the <iostream> (ok may be I am being too pedantic)
To answer your question - the compiler knows that it is the base class because the type of the argument that the function is taking is Base1. The compiler converts the type of the passed data (assuming you passed a derived object, does object slicing) and then calls the function1() on it (which is a simple offset from the base pointer calculation).

Would using a virtual destructor make non-virtual functions do v-table lookups?

Just what the topic asks. Also want to know why non of the usual examples of CRTP do not mention a virtual dtor.
EDIT:
Guys, Please post about the CRTP prob as well, thanks.
Only virtual functions require dynamic dispatch (and hence vtable lookups) and not even in all cases. If the compiler is able to determine at compile time what is the final overrider for a method call, it can elide performing the dispatch at runtime. User code can also disable the dynamic dispatch if it so desires:
struct base {
virtual void foo() const { std::cout << "base" << std::endl; }
void bar() const { std::cout << "bar" << std::endl; }
};
struct derived : base {
virtual void foo() const { std::cout << "derived" << std::endl; }
};
void test( base const & b ) {
b.foo(); // requires runtime dispatch, the type of the referred
// object is unknown at compile time.
b.base::foo();// runtime dispatch manually disabled: output will be "base"
b.bar(); // non-virtual, no runtime dispatch
}
int main() {
derived d;
d.foo(); // the type of the object is known, the compiler can substitute
// the call with d.derived::foo()
test( d );
}
On whether you should provide virtual destructors in all cases of inheritance, the answer is no, not necessarily. The virtual destructor is required only if code deletes objects of the derived type held through pointers to the base type. The common rule is that you should
provide a public virtual destructor or a protected non-virtual destructor
The second part of the rule ensures that user code cannot delete your object through a pointer to the base, and this implies that the destructor need not be virtual. The advantage is that if your class does not contain any virtual method, this will not change any of the properties of your class --the memory layout of the class changes when the first virtual method is added-- and you will save the vtable pointer in each instance. From the two reasons, the first being the important one.
struct base1 {};
struct base2 {
virtual ~base2() {}
};
struct base3 {
protected:
~base3() {}
};
typedef base1 base;
struct derived : base { int x; };
struct other { int y; };
int main() {
std::auto_ptr<derived> d( new derived() ); // ok: deleting at the right level
std::auto_ptr<base> b( new derived() ); // error: deleting through a base
// pointer with non-virtual destructor
}
The problem in the last line of main can be resolved in two different ways. If the typedef is changed to base1 then the destructor will correctly be dispatched to the derived object and the code will not cause undefined behavior. The cost is that derived now requires a virtual table and each instance requires a pointer. More importantly, derived is no longer layout compatible with other. The other solution is changing the typedef to base3, in which case the problem is solved by having the compiler yell at that line. The shortcoming is that you cannot delete through pointers to base, the advantage is that the compiler can statically ensure that there will be no undefined behavior.
In the particular case of the CRTP pattern (excuse the redundant pattern), most authors do not even care to make the destructor protected, as the intention is not to hold objects of the derived type by references to the base (templated) type. To be in the safe side, they should mark the destructor as protected, but that is rarely an issue.
Very unlikely indeed. There's nothing in the standard to stop compilers doing whole classes of stupidly inefficient things, but a non-virtual call is still a non-virtual call, regardless of whether the class has virtual functions too. It has to call the version of the function corresponding to the static type, not the dynamic type:
struct Foo {
void foo() { std::cout << "Foo\n"; }
virtual void virtfoo() { std::cout << "Foo\n"; }
};
struct Bar : public Foo {
void foo() { std::cout << "Bar\n"; }
void virtfoo() { std::cout << "Bar\n"; }
};
int main() {
Bar b;
Foo *pf = &b; // static type of *pf is Foo, dynamic type is Bar
pf->foo(); // MUST print "Foo"
pf->virtfoo(); // MUST print "Bar"
}
So there's absolutely no need for the implementation to put non-virtual functions in the vtable, and indeed in the vtable for Bar you'd need two different slots in this example for Foo::foo() and Bar::foo(). That means it would be a special-case use of the vtable even if the implementation wanted to do it. In practice it doesn't want to do it, it wouldn't make sense to do it, don't worry about it.
CRTP base classes really ought to have destructors that are non-virtual and protected.
A virtual destructor is required if the user of the class might take a pointer to the object, cast it to the base class pointer type, then delete it. A virtual destructor means this will work. A protected destructor in the base class stops them trying it (the delete won't compile since there's no accessible destructor). So either one of virtual or protected solves the problem of the user accidentally provoking undefined behavior.
See guideline #4 here, and note that "recently" in this article means nearly 10 years ago:
http://www.gotw.ca/publications/mill18.htm
No user will create a Base<Derived> object of their own, that isn't a Derived object, since that's not what the CRTP base class is for. They just don't need to be able to access the destructor - so you can leave it out of the public interface, or to save a line of code you can leave it public and rely on the user not doing something silly.
The reason it's undesirable for it to be virtual, given that it doesn't need to be, is just that there's no point giving a class virtual functions if it doesn't need them. Some day it might cost something, in terms of object size, code complexity or even (unlikely) speed, so it's a premature pessimization to make things virtual always. The preferred approach among the kind of C++ programmer who uses CRTP, is to be absolutely clear what classes are for, whether they are designed to be base classes at all, and if so whether they are designed to be used as polymorphic bases. CRTP base classes aren't.
The reason that the user has no business casting to the CRTP base class, even if it's public, is that it doesn't really provide a "better" interface. The CRTP base class depends on the derived class, so it's not as if you're switching to a more general interface if you cast Derived* to Base<Derived>*. No other class will ever have Base<Derived> as a base class, unless it also has Derived as a base class. It's just not useful as a polymorphic base, so don't make it one.
The answer to your first question: No. Only calls to virtual functions will cause an indirection via the virtual table at runtime.
The answer to your second question: The Curiously recurring template pattern is commonly implemented using private inheritance. You don't model an 'IS-A' relationship and hence you don't pass around pointers to the base class.
For instance, in
template <class Derived> class Base
{
};
class Derived : Base<Derived>
{
};
You don't have code which takes a Base<Derived>* and then goes on to call delete on it. So you never attempt to delete an object of a derived class through a pointer to the base class. Hence, the destructor doesn't need to be virtual.
Firstly, I think the answer to the OP's question has been answered quite well - that's a solid NO.
But, is it just me going insane or is something going seriously wrong in the community? I felt a bit scared to see so many people suggesting that it's useless/rare to hold a pointer/reference to Base. Some of the popular answers above suggest that we don't model IS-A relationship with CRTP, and I completely disagree with those opinions.
It's widely known that there's no such thing as interface in C++. So to write testable/mockable code, a lot of people use ABC as an "interface". For example, you have a function void MyFunc(Base* ptr) and you can use it this way: MyFunc(ptr_derived). This is the conventional way to model IS-A relationship which requires vtable lookups when you call any virtual functions in MyFunc. So this is pattern one to model IS-A relationship.
In some domain where performance is critical, there exists another way(pattern two) to model IS-A relationship in a testable/mockable manner - via CRTP. And really, performance boost can be impressive(600% in the article) in some cases, see this link. So MyFunc will look like this template<typename Derived> void MyFunc(Base<Derived> *ptr). When you use MyFunc, you do MyFunc(ptr_derived); The compiler is going to generate a copy of code for MyFunc() that matches best with the parameter type ptr_derived - MyFunc(Base<Derived> *ptr). Inside MyFunc, we may well assume some function defined by the interface is called, and pointers are statically cast-ed at compile time(check out the impl() function in the link), there's no overheads for vtable lookups.
Now, can someone please tell me either I am talking insane nonsense or the answers above simply did not consider the second pattern to model IS-A relationship with CRTP?