How is the deletion of a pointer detected using dynamic cast - c++

As shown here, one can use dynamic_cast to detect a deleted pointer:
#include <iostream>
using namespace std;
class A
{
public:
A() {}
virtual ~A() {}
};
class B : public A
{
public:
B() {}
};
int main()
{
B* pB = new B;
cout << "dynamic_cast<B*>( pB) ";
cout << ( dynamic_cast<B*>(pB) ? "worked" : "failed") << endl;
cout << "dynamic_cast<B*>( (A*)pB) ";
cout << ( dynamic_cast<B*>( (A*)pB) ? "worked" : "failed") << endl;
delete pB;
cout << "dynamic_cast<B*>( pB) ";
cout << ( dynamic_cast<B*>(pB) ? "worked" : "failed") << endl;
cout << "dynamic_cast<B*>( (A*)pB) ";
cout << ( dynamic_cast<B*>( (A*)pB) ? "worked" : "failed") << endl;
}
the output:
dynamic_cast<B*>( pB) worked
dynamic_cast<B*>( (A*)pB) worked
dynamic_cast<B*>( pB) worked
dynamic_cast<B*>( (A*)pB) failed
It explains that the deletion of the vtable is detected.
But I am wondering how that is possible, since we do not overwrite the freed memory.
And is that solution fully portable?
Thanks

First off, using a deleted object in any way results in undefined behavior: whatever result you see is just one of the things that could happen!
The reason for the observed behavior is simply that an object changes type during destruction: starting as an object of the concrete type, it changes through all of the types in the hierarchy. At each point the virtual functions change and the vtable (or similar) gets replaced. The dynamic_cast<...>() simply detects this change in the bytes stored at the location of the object.
If you want to show that this technique doesn't work reliably, you can set the content of the deleted memory to a random bit pattern or to the bit pattern of an object of the most derived type: a random bit pattern probably yields a crash, and after using memcpy() to copy in the bytes of a live object the dynamic_cast probably claims that the object is still alive. Of course, since it is undefined behavior, anything can happen.
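The type change during destruction can be observed without any undefined behavior by making a virtual call from each destructor. The following is only a sketch to illustrate the point; the trace vector, the name() function and destroy_one_b() are hypothetical names that are not part of the question's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of which override each destructor sees.
std::vector<std::string> trace;

struct A {
    virtual ~A() { trace.push_back(name()); }   // dynamic type is already A here
    virtual std::string name() const { return "A"; }
};

struct B : A {
    ~B() override { trace.push_back(name()); }  // dynamic type is still B here
    std::string name() const override { return "B"; }
};

// Creates and destroys one B, returning what the destructors recorded.
std::vector<std::string> destroy_one_b() {
    trace.clear();
    { B b; }                    // ~B runs first, then ~A
    assert(trace.size() == 2);  // one entry per destructor
    return trace;
}
```

destroy_one_b() returns "B" followed by "A": by the time ~A runs, the virtual call no longer reaches B::name(). This is the well-defined counterpart of what the dynamic_cast on the deleted pointer happened to pick up.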
One relevant section on this is 3.8 [basic.life] paragraph 5:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a pointer refers to allocated storage (3.7.4.2), and using the pointer as if the pointer were of type void*, is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below. The program has undefined behavior if:
...
the pointer is used as the operand of a dynamic_cast (5.2.7). ...
Oddly, the example on the last bullet on dynamic_cast doesn't use dynamic_cast.
Of course, the object is also probably released in which case the above guarantees don't even apply.
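If the actual goal is to find out whether an object has already been destroyed, one portable option is to let std::shared_ptr own the object and observe it through a std::weak_ptr. This is a different technique from the one in the question, shown only as a sketch; is_alive(), try_get_b() and demo() are hypothetical helper names.

```cpp
#include <cassert>
#include <memory>

struct A { virtual ~A() {} };
struct B : A {};

// True while some shared_ptr still owns the object.
bool is_alive(const std::weak_ptr<A>& observer) {
    return !observer.expired();
}

// Returns a non-null pointer only if the object still exists and really is a B.
std::shared_ptr<B> try_get_b(const std::weak_ptr<A>& observer) {
    return std::dynamic_pointer_cast<B>(observer.lock());
}

void demo() {
    std::shared_ptr<A> owner = std::make_shared<B>();
    std::weak_ptr<A> observer = owner;
    assert(is_alive(observer));

    owner.reset();  // the last owner goes away, so the B is destroyed
    assert(!is_alive(observer));
    assert(try_get_b(observer) == nullptr);
}
```

Here the dynamic_pointer_cast is only ever applied to a pointer obtained from lock(), so it never touches a destroyed object.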

Related

Why does it work when it breaks the rule of order of initialization list

Why does this code work? I expected this to fail because of breaking of one of the basic C++ rules:
#include <iostream>
using namespace std;
struct A {
A() { cout << "ctor A" << endl; }
void doSth() { cout << "a doing sth" << endl; }
};
struct B {
B(A& a) : a(a) { cout << "ctor B" << endl; }
void doSth() { a.doSth(); }
A& a;
};
struct C {
C() : b(a) { cout << "ctor C" << endl; }
void doSth() { b.doSth(); }
B b;
A a;
};
int main()
{
C c;
c.doSth();
}
https://wandbox.org/permlink/aoJsYkbhDO6pNrg0
I expected this to fail since in C's constructor, B is given a reference to object of A when this A object has not yet been created.
Am I missing something? Does the rule of order of initialization being the same as the order of fields not apply for references?
EDIT:
What surprises me even more is that I can add a call to "a.doSth();" inside B constructor and this will also work. Why? At this moment the A object should not exist!
Your code is fine so long as the constructor of B doesn't use that reference it gets for anything other than binding its member. The storage for a has already been allocated when the c'tor of C starts, and like Sneftel says, it's in scope. As such, you may take its reference, as [basic.life]/7 explicitly allows:
Similarly, before the lifetime of an object has started but after the
storage which the object will occupy has been allocated or, after the
lifetime of an object has ended and before the storage which the
object occupied is reused or released, any glvalue that refers to the
original object may be used but only in limited ways. For an object
under construction or destruction, see [class.cdtor]. Otherwise, such
a glvalue refers to allocated storage
([basic.stc.dynamic.deallocation]), and using the properties of the
glvalue that do not depend on its value is well-defined. The program
has undefined behavior if:
the glvalue is used to access the object, or
the glvalue is used to call a non-static member function of the object, or
the glvalue is bound to a reference to a virtual base class ([dcl.init.ref]), or
the glvalue is used as the operand of a dynamic_cast or as the operand of typeid.
Regarding your edit:
What surprises me even more is that I can add a call to "a.doSth();" inside B constructor and this will also work. Why? At this moment the A object should not exist!
Undefined behavior is undefined. The second bullet in the paragraph I linked to pretty much says it. A compiler may be clever enough to catch it, but it doesn't have to be.
In your code snippet, when C is being constructed, a has not been initialized but it is already in scope, so the compiler is not required to issue a diagnostic. Its value is undefined.
The code is fine in the sense that B::a is properly an alias of C::a. The lifetime of the storage backing C::a has already begun by the time B::B() runs.
With respect to your edit: Although C::a's storage duration has already begun, a.doSth() from B::B() would absolutely result in undefined behavior (google to see why something can be UB and still "work").
This works because you are not accessing the uninitialized member C::a during the initialization of C::b. By writing C() : b(a) you are only binding a reference to a, which is passed to the B(A& a) constructor. If you change your code to actually use the uninitialized value somehow, then it will be undefined behavior:
struct B {
B(A& a)
: m_a(a) // now this calls copy constructor attempting to access uninitialized value of `a`
{ cout << "ctor B" << endl; }
void doSth() { m_a.doSth(); }
A m_a;
};
Undefined behavior means anything is possible, including appearing to work fine. Doesn't mean it will work fine next week or even the next time you run it - you might get demons flying from your nose.
What's probably going on when you call a.doSth() is that the compiler resolves the call statically to A::doSth(); since it's not a virtual function, it doesn't need to access the object to make the call. The function itself doesn't use any member variables or functions, so no invalid accesses are generated. It works even though it's not guaranteed to work.
It doesn't "work" in the sense that the a object used for initialization hasn't had its constructor called yet (which your logs reveal) - this means that the init of b might or might not fail depending on what a is doing.
The compiler doesn't prevent that, but I guess it should. Anyway, I don't think this is UB unless you actually try to use the uninitialized object; just storing the reference should be fine.
It works because B is initialized with a reference, and that reference already exists so it can be used to initialize something with it.
If you try with a being passed by value in ctor of B then the compiler would complain:
warning: field 'a' is uninitialized when used here [-Wuninitialized]
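Since members are initialized in declaration order, one way to make it safe for B's constructor to use the A object is to declare a before b in C. The sketch below assumes that reordering the members is acceptable; the order vector and the value() functions are hypothetical additions used only to make the construction order visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of the order in which constructors run.
std::vector<std::string> order;

struct A {
    A() { order.push_back("A"); }
    int value() const { return 42; }
};

struct B {
    explicit B(A& a) : a(a) {
        order.push_back("B");
        assert(a.value() == 42);  // fine: the A object has already been constructed
    }
    int value() const { return a.value(); }
    A& a;
};

struct C {
    C() : b(a) { order.push_back("C"); }
    A a;  // declared first, so constructed first
    B b;  // constructed second, may use a in its constructor
};
```

Constructing a C now records "A", "B", "C" in that order, so the call inside B's constructor is made on a fully constructed object.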

Default Destructor V.S. A Simply Defined Destructor

I wrote this to analyse the behavior of the destructor function of a class and its effect on memory deallocation, but the result seems to be a bit surprising to me:
#include <iostream>
using namespace std;
class test {
public:
test() {}
~test() {} //<-----Run the program once with and once without this line.
};
int main()
{
test x;
test *a(&x);
cout << "a before memory allocation: " << a << endl;
a = new test;
cout << "a after memory allocation and before delete: " << a << endl;
delete a;
cout << "a after delete: " << a << endl;
return 0;
}
With default destructor the result is:
But with my own destructor it's:
Isn't the second result erroneous? Because somewhere I read that:
the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage.
Maybe I'm not following it correctly(especially due to the difficult English words used!). Would you please explain to me why is this happening?
What's exactly the difference between my simply defined destructor and the C++ default destructor?
Thanks for your help in advance.
If a is a (non-null) pointer to an object, then delete a invokes the destructor of the object that a points to (the default destructor or a user-defined one) and then frees the memory that had been allocated for this object. The memory that a pointed to no longer holds a valid object, and a must not be dereferenced any more. However, delete a does not reset the pointer a to any specific value. Actually, I'm surprised that delete a changed the value of a in your case; I cannot reproduce this behaviour.
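If you want the pointer to have a known value after the object is gone, you have to assign it yourself. A minimal sketch, assuming a hypothetical Tracked class that counts live instances and a hypothetical helper destroy_and_reset():

```cpp
#include <cassert>

struct Tracked {
    static int live;  // number of Tracked objects currently alive
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Deletes the object and nulls the caller's pointer,
// which a plain delete expression is not required to do.
void destroy_and_reset(Tracked*& p) {
    delete p;
    p = nullptr;
}

void demo() {
    Tracked* a = new Tracked;
    assert(Tracked::live == 1);

    destroy_and_reset(a);
    assert(a == nullptr);        // the pointer now has a well-defined value
    assert(Tracked::live == 0);  // the destructor has run
}
```

Comparing or printing a after destroy_and_reset() is well-defined, whereas the value of a pointer left dangling by a plain delete is not something you can rely on.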

Difference calling virtual through named member versus address or reference

Updated below: In clang, using an lvalue of a polymorphic object through its name does not activate virtual dispatch, but it does through its address.
For the following base class B and derived D, virtual function something, union Space
#include <iostream>
using namespace std;
struct B {
void *address() { return this; }
virtual ~B() { cout << "~B at " << address() << endl; }
virtual void something() { cout << "B::something"; }
};
struct D: B {
~D() { cout << "~D at " << address() << endl; }
void something() override { cout << "D::something"; }
};
union Space {
B b;
Space(): b() {}
~Space() { b.~B(); }
};
If you have a value s of Space, in Clang++: (update: incorrectly claimed g++ had the same behavior)
If you call s.b.something(), B::something() will be called, without dynamic binding on s.b. However, calling (&s.b)->something() does bind dynamically to what b really contains (either a B or a D).
The complete code is this:
union SpaceV2 {
B b;
SpaceV2(): b() {}
~SpaceV2() { (&b)->~B(); }
};
static_assert(sizeof(D) == sizeof(B), "");
static_assert(alignof(D) == alignof(B), "");
#include <new>
int main(int argc, const char *argv[]) {
{
Space s;
cout << "Destroying the old B: ";
s.b.~B();
new(&s.b) D;
cout << "\"D::something\" expected, but \"";
s.b.something();
cout << "\" happened\n";
auto &br = s.b;
cout << "\"D::something\" expected, and \"";
br.something();
cout << "\" happened\n";
cout << "Destruction of D expected:\n";
}
cout << "But did not happen!\n";
SpaceV2 sv2;
new(&sv2.b) D;
cout << "Destruction of D expected again:\n";
return 0;
}
When I compile with -O2 optimization and run the program, this is the output:
$./a.out
Destroying the old B: ~B at 0x7fff4f890628
"D::something" expected, but "B::something" happened
"D::something" expected, and "D::something" happened
Destruction of D expected:
~B at 0x7fff4f890628
But did not happen!
Destruction of D expected again:
~D at 0x7fff4f890608
~B at 0x7fff4f890608
What surprises me is that setting the dynamic type of s.b using placement new leads to a difference calling something on the very same l-value through its name or through its address. The first question is essential, but I have not been able to find an answer:
Is doing placement new to a derived class, like new(&s.b) D undefined behavior according to the C++ standard?
If it is not undefined behavior, is this choice of not activating virtual dispatch through the l-value of the named member something specified in the standard or a choice in G++, Clang?
Thanks, my first question in S.O. ever.
UPDATE
The answer and the comment that refers to the standard are accurate: According to the standard, s.b will forever refer to an object of exact type B, the memory is allowed to change type, but then any use of that memory through s.b is "undefined behavior", that is, prohibited, or that the compiler can translate however it pleases. If Space was just a buffer of chars, it would be valid to in-place construct, destruct, change the type. Did exactly that in the code that led to this question and it works with standards-compliance AFAIK.
Thanks.
The expression new(&s.b) D; re-uses the storage named s.b, formerly occupied by a B, for storage of a new D.
However you then write s.b.something(); . This causes undefined behaviour because s.b denotes a B but the actual object stored in that location is a D. See C++14 [basic.life]/7:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
[...]
The last bullet point quoted above is not satisfied because the new type differs.
(There are other potential issues later in the code too but since undefined behaviour is caused here, they're moot; you'd need to have a major design change to avoid this problem).
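The raw-buffer approach mentioned in the question's update can be sketched as follows. This is an illustration under assumptions, not the asker's actual code: replace_and_call() and demo() are hypothetical names, and something() returns a string here instead of printing. The key point is that every access goes through the pointer returned by placement new, never through a name that denotes the old object.

```cpp
#include <cassert>
#include <new>
#include <string>

struct B {
    virtual ~B() {}
    virtual std::string something() const { return "B::something"; }
};

struct D : B {
    std::string something() const override { return "D::something"; }
};

// Replaces a B with a D in raw storage and calls through the new pointer.
std::string replace_and_call() {
    alignas(D) unsigned char storage[sizeof(D)];  // raw bytes, large enough for B or D

    B* p = new (storage) B;  // construct a B in the buffer
    p->~B();                 // end its lifetime
    p = new (storage) D;     // construct a D; keep the pointer placement new returns

    std::string result = p->something();  // virtual dispatch on the new object
    p->~B();                 // virtual destructor, so ~D runs as well
    return result;
}

void demo() {
    assert(replace_and_call() == "D::something");
}
```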

const reference public member to private class member - why does it work?

Recently, I found an interesting discussion on how to allow read-only access to private members without obfuscating the design with multiple getters, and one of the suggestions was to do it this way:
#include <iostream>
class A {
public:
A() : _ro_val(_val) {}
void doSomething(int some_val) {
_val = 10*some_val;
}
const int& _ro_val;
private:
int _val;
};
int main() {
A a_instance;
std::cout << a_instance._ro_val << std::endl;
a_instance.doSomething(13);
std::cout << a_instance._ro_val << std::endl;
}
Output:
$ ./a.out
0
130
GotW#66 clearly states that object's lifetime starts
when its constructor completes successfully and returns normally. That is, control reaches the end of the constructor body or an earlier return statement.
If so, we have no guarantee that the _val member will have been properly created by the time we execute _ro_val(_val). So how come the above code works? Is it undefined behaviour? Or are primitive types granted some exception to the object's lifetime?
Can anyone point me to some reference which would explain those things?
Before the constructor is called, an appropriate amount of memory is reserved for the object, on the free store (if you use new) or on the stack (if you create the object in local storage). This implies that the memory for _val is already allocated by the time you refer to it in the member initializer list; it is just not initialized yet.
_ro_val(_val)
Makes the reference member _ro_val refer to the memory allocated for _val, which might actually contain anything at this point of time.
There is still undefined behavior in your program, because _val is never initialized before it is read: you should explicitly initialize _val to 0 (or some other value you choose) in the member initializer list or the constructor body. The output 0 in this case is just luck; you might get some other value, since _val is left uninitialized. See the behavior here on gcc 4.3.4, which demonstrates the UB.
But as for the question itself: yes, binding the reference is well-defined.
The object's address does not change.
I.e. it's well-defined.
However, the technique shown is just premature optimization. You don't save programmers' time. And with a modern compiler you don't save execution time or machine code size. But you do make the objects un-assignable.
Cheers & hth.,
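The point about assignability can be checked at compile time. The sketch below compares the reference-member design with a plain getter; WithRef, WithGetter and demo() are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <type_traits>

// The design from the question: a public const reference to a private member.
class WithRef {
public:
    WithRef() : ro_val(val_), val_(0) {}
    const int& ro_val;
private:
    int val_;
};

// The conventional alternative: a getter.
class WithGetter {
public:
    int val() const { return val_; }
    void doSomething(int some_val) { val_ = 10 * some_val; }
private:
    int val_ = 0;
};

// A reference member makes the implicit copy assignment operator deleted.
static_assert(!std::is_copy_assignable<WithRef>::value,
              "reference member disables copy assignment");
static_assert(std::is_copy_assignable<WithGetter>::value,
              "getter version stays assignable");

void demo() {
    WithGetter a, b;
    a.doSomething(13);
    b = a;  // compiles; the same statement with WithRef would not
    assert(b.val() == 130);
}
```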
In my opinion, it is legal (well-defined) to initialize a reference with an uninitialized object. That is legal, but the standard (well, the latest C++11 draft, paragraph 8.5.3.3) recommends using a valid (fully constructed) object as an initializer:
A reference shall be initialized to refer to a valid object or function.
The next sentence from the same paragraph throws a bit more light at the reference creation:
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.]
I understand that reference creation means binding reference to an object obtained by dereferencing its pointer and that probably explains that the minimal prerequisite for initialization of reference of type T& is having an address of the portion of the memory reserved for the object of type T (reserved, but not yet initialized).
Accessing uninitialized object through its reference can be dangerous.
I wrote a simple test application that demonstrates reference initialization with uninitialized object and consequences of accessing that object through it:
#include <iostream>
class C
{
public:
int _n;
C() : _n(123)
{
std::cout << "C::C(): _n = " << _n << " ...and blowing up now!" << std::endl;
throw 1;
}
};
class B
{
public:
// pC1- address of the reference is the address of the object it refers
// pC2- address of the object
B(const C* pC1, const C* pC2)
{
std::cout << "B::B(): &_ro_c = " << pC1 << "\n\t&_c = " << pC2 << "\n\t&_ro_c->_n = " << pC1->_n << "\n\t&_c->_n = " << pC2->_n << std::endl;
}
};
class A
{
const C& _ro_c;
B _b;
C _c;
public:
// Initializer list: members are initialized in the order how they are
// declared in class
//
// Initializes reference to _c
//
// Fully constructs object _b; its c-tor accesses uninitialized object
// _c through its reference and its pointer (valid but dangerous!)
//
// construction of _c fails!
A() : _ro_c(_c), _b(&_ro_c, &_c), _c()
{
// never executed
std::cout << "A::A()" << std::endl;
}
};
int main()
{
try
{
A a;
}
catch(...)
{
std::cout << "Failed to create object of type A" << std::endl;
}
return 0;
}
Output:
B::B(): &_ro_c = 001EFD70
&_c = 001EFD70
&_ro_c->_n = -858993460
&_c->_n = -858993460
C::C(): _n = 123 ...and blowing up now!
Failed to create object of type A

Added benefit of a pointer, when to use one and why

I'm learning C++ at the moment and though I grasp the concept of pointers and references for the better part, some things are unclear.
Say I have the following code (assume Rectangle is valid, the actual code is not important):
#include <iostream>
#include "Rectangle.h"
void changestuff(Rectangle& rec);
int main()
{
Rectangle rect;
rect.set_x(50);
rect.set_y(75);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << sizeof(rect) << std::endl;
changestuff(rect);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << std::endl;
Rectangle* rectTwo = new Rectangle();
rectTwo->set_x(15);
rectTwo->set_y(30);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << std::endl;
changestuff(*rectTwo);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << std::endl;
std::cout << rectTwo << std::endl;
}
void changestuff(Rectangle& rec)
{
rec.set_x(10);
rec.set_y(11);
}
Now, the actual Rectangle object isn't passed, merely a reference to it; its address.
Why should I use the 2nd method over the first one? Why can't I pass rectTwo to changestuff, but *rectTwo? In what way does rectTwo differ from rect?
There really isn't any reason you can't. In C, you only had pointers. C++ introduced references, and passing by reference is usually the preferred way in C++. It produces cleaner code that is syntactically simpler.
Let's take your code and add a new function to it:
#include <iostream>
#include "Rectangle.h"
void changestuff(Rectangle& rec);
void changestuffbyPtr(Rectangle* rec);
int main()
{
Rectangle rect;
rect.set_x(50);
rect.set_y(75);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << sizeof(rect) << std::endl;
changestuff(rect);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << std::endl;
changestuffbyPtr(&rect);
std::cout << "x,y: " << rect.get_x() << rect.get_y() << std::endl;
Rectangle* rectTwo = new Rectangle();
rectTwo->set_x(15);
rectTwo->set_y(30);
std::cout << "x,y: " << rectTwo->get_x() << rectTwo->get_y() << std::endl;
changestuff(*rectTwo);
std::cout << "x,y: " << rectTwo->get_x() << rectTwo->get_y() << std::endl;
changestuffbyPtr(rectTwo);
std::cout << "x,y: " << rectTwo->get_x() << rectTwo->get_y() << std::endl;
std::cout << rectTwo << std::endl;
}
void changestuff(Rectangle& rec)
{
rec.set_x(10);
rec.set_y(11);
}
void changestuffbyPtr(Rectangle* rec)
{
rec->set_x(10);
rec->set_y(11);
}
Difference between using the stack and heap:
#include <iostream>
#include "Rectangle.h"
Rectangle* createARect1();
Rectangle* createARect2();
int main()
{
// this is being created on the stack which because it is being created in main,
// belongs to the stack for main. This object will be automatically destroyed
// when main exits, because the stack that main uses will be destroyed.
Rectangle rect;
// rectTwo is being created on the heap. The memory here will *not* be released
// after main exits (well technically it will be by the operating system)
Rectangle* rectTwo = new Rectangle();
// this is going to create a memory leak unless we explicitly call delete on r1.
Rectangle* r1 = createARect1();
// this should cause a compiler warning:
Rectangle* r2 = createARect2();
}
Rectangle* createARect1()
{
// this will be creating a memory leak unless we remember to explicitly delete it:
Rectangle* r = new Rectangle;
return r;
}
Rectangle* createARect2()
{
// this is not allowed, since when the function returns the rect will no longer
// exist since its stack was destroyed after the function returns:
Rectangle r;
return &r;
}
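The leak that the comments above warn about can be avoided by returning an owning smart pointer instead of a raw one. This is a sketch only: the Rectangle struct below is a hypothetical stand-in for the class in Rectangle.h, and createARect() and demo() are illustrative names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the Rectangle class from Rectangle.h.
struct Rectangle {
    int x = 0;
    int y = 0;
};

// Ownership is handed to the caller; the object is deleted automatically
// when the returned unique_ptr goes out of scope.
std::unique_ptr<Rectangle> createARect() {
    std::unique_ptr<Rectangle> r(new Rectangle);
    r->x = 15;
    r->y = 30;
    return r;
}

void demo() {
    std::unique_ptr<Rectangle> r = createARect();
    assert(r->x == 15 && r->y == 30);
}  // r is destroyed here, which deletes the Rectangle
```

The object still lives on the heap and outlives the function that created it, but nobody has to remember to call delete.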
It is also worth mentioning that a huge difference between pointers and references is that you cannot create a reference that is uninitialized. So this is perfectly legal:
int *b;
while this is not:
int& b;
A reference has to refer to something. This makes references unsuitable for polymorphic situations in which you do not yet know, at the point of declaration, which object you will end up with. For instance:
// let's assume A is some interface:
class A
{
public:
virtual ~A() {}
virtual void doSomething() = 0; // pure virtual: A cannot be instantiated
};
class B : public A
{
public:
void doSomething() {}
};
class C : public A
{
public:
void doSomething() {}
};
int main()
{
// since A contains a pure virtual function, we can't instantiate it. But we can
// instantiate B and C
B* b = new B;
C* c = new C;
// or
A* ab = new B;
A* ac = new C;
// but what if we didn't know at compile time which one to create? B or C?
// we have to use pointers here, since a reference can't point to null or
// be uninitialized
A* a1 = 0;
if (decideWhatToCreate() == CREATE_B)
a1 = new B;
else
a1 = new C;
}
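The run-time choice above can be wrapped in a small factory, and the result can still be handed to functions that take a reference. A minimal sketch under assumptions: doSomething() returns a string here so the result can be inspected, and Choice, create() and callThroughReference() are hypothetical names that do not come from the code above.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct A {
    virtual ~A() {}
    virtual std::string doSomething() const = 0;
};
struct B : A {
    std::string doSomething() const override { return "B"; }
};
struct C : A {
    std::string doSomething() const override { return "C"; }
};

enum Choice { CREATE_B, CREATE_C };  // hypothetical selector

// The decision is made at run time, so a pointer (here an owning one) is returned.
std::unique_ptr<A> create(Choice choice) {
    if (choice == CREATE_B)
        return std::unique_ptr<A>(new B);
    return std::unique_ptr<A>(new C);
}

// Once the object exists, a reference dispatches virtually just like a pointer.
std::string callThroughReference(const A& a) {
    return a.doSomething();
}

void demo() {
    std::unique_ptr<A> a1 = create(CREATE_B);
    assert(callThroughReference(*a1) == "B");
}
```

So the pointer is needed for the deferred choice, not for the virtual dispatch itself.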
In C++, objects can be allocated on the heap or on the stack. The stack is valid only locally; that is, when you leave the current function, its stack frame and all of its contents are destroyed.
By contrast, heap objects (which must be specifically allocated using new) live as long as you don't delete them.
Now the idea is that you, as a caller, should not need to know what a method does internally (encapsulation). Since the method might actually store and keep the reference you have passed to it, this can be dangerous: if the calling method returns, stack objects will be destroyed, but the references are kept.
In your simple example, it all doesn't matter too much because the program will end when main() exits anyhow. However, for every program that is just a little more complex, this can lead to serious trouble.
You need to understand that references are NOT pointers. They may be implemented using them (or they may not), but a reference in C++ is a completely different beast to a pointer.
That being said, any function that takes a reference can be used with pointers simply by dereferencing them (and vice versa). Given:
class A {};
void f1( A & a ) {} // parameter is reference
void f2( A * a ) {} // parameter is pointer
you can say:
A a;
f1( a );
f2 ( &a );
and:
A * p = new A;
f1( *p );
f2 ( p );
Which should you use when? Well that comes down to experience, but general good practice is:
prefer to allocate objects automatically on the stack rather than using new whenever possible
pass objects using references (preferably const references) whenever possible
rectTwo differs from rect in that rect is an instance of a Rectangle on the stack and rectTwo is the address of a Rectangle on the heap. If you pass a Rectangle by value, a copy of it is made, and any changes made inside changestuff() will not be visible outside it.
Passing it by reference means that changestuff will have the memory address of the Rectangle instance itself, and changes are not limited to the scope of changestuff (because neither is the Rectangle).
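The difference between the three ways of passing can be sketched like this. The Rectangle struct is a hypothetical stand-in with public members, and the three function names are illustrative only.

```cpp
#include <cassert>

// Hypothetical stand-in for the Rectangle class.
struct Rectangle {
    int x = 0;
    int y = 0;
};

// Works on a copy; the caller's object is untouched.
void changeByValue(Rectangle rec) {
    rec.x = 10;
    rec.y = 11;
}

// Works on the caller's object directly.
void changeByReference(Rectangle& rec) {
    rec.x = 10;
    rec.y = 11;
}

// Also works on the caller's object, but the pointer may be null.
void changeByPointer(Rectangle* rec) {
    assert(rec != nullptr);
    rec->x = 10;
    rec->y = 11;
}
```

With a Rectangle rect whose x is 50, changeByValue(rect) leaves rect.x at 50, while changeByReference(rect) and changeByPointer(&rect) both set it to 10.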
Edit: your comment made the question more clear. Generally, a reference is safer than a pointer.
From Wikipedia:
It is not possible to refer directly to a reference object after it is defined; any occurrence of its name refers directly to the object it references.
Once a reference is created, it cannot be later made to reference another object; it cannot be reseated. This is often done with pointers.
References cannot be null, whereas pointers can; every reference refers to some object, although it may or may not be valid.
References cannot be uninitialized. Because it is impossible to reinitialize a reference, they must be initialized as soon as they are created. In particular, local and global variables must be initialized where they are defined, and references which are data members of class instances must be initialized in the initializer list of the class's constructor.
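The "cannot be reseated" property is easy to misread, because assigning to a reference compiles fine; it just assigns to the referenced object. A small sketch, with Result and reseat_demo() as hypothetical names:

```cpp
#include <cassert>

struct Result {
    int throughRef;       // value read through the reference at the end
    int throughPtr;       // value read through the pointer at the end
    bool refStillNamesA;  // whether the reference still refers to a
    bool ptrPointsToB;    // whether the pointer was redirected to b
};

Result reseat_demo() {
    int a = 1;
    int b = 2;

    int& ref = a;
    ref = b;                // copies b's value into a; ref still names a
    assert(&ref == &a);

    int* ptr = &a;
    ptr = &b;               // the pointer itself now points at b

    b = 5;                  // visible through ptr, not through ref
    Result r = { ref, *ptr, &ref == &a, ptr == &b };
    return r;
}
```

After the final assignment to b, reading through ref gives 2 (the value copied into a) and reading through ptr gives 5.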
Additionally, objects allocated on the heap can lead to memory leaks, whereas objects allocated on the stack will not.
So, use pointers when they are necessary, and references otherwise.
Quite a few application domains require the use of pointers. Pointers are needed when you have intimate knowledge of how your memory is laid out. This knowledge could be because you intended the memory to be laid out in a certain way, or because the layout is out of your control. When this is the case, you need pointers.
Why would you manually structure the memory for a certain problem domain? Because an optimal memory layout for certain problems can be orders of magnitude faster than traditional techniques.
Example domains:
Enterprise Databases.
Kernel design.
Drivers.
General purpose Linear Algebra.
Binary Data serialization.
Slab Memory allocators for transaction processing (web-servers).
Video game engines.
Embedded real-time programming.
Image processing
Unicode Utility functions.
You are right to say that the actual Rectangle object isn't passed, merely a reference to it. In fact you can never 'pass' any object or anything else really. You can only 'pass' a copy of something as a parameter to a function.
The something that you can pass could be a copy of a value, like an int, or a copy of an object, or a copy of a pointer or reference. So, in my mind, passing a copy of either a pointer or a reference is logically the same thing; syntactically it's different, hence the parameter being either rect or *rectTwo.
References in C++ are a distinct advantage over C, since they allow the programmer to declare and define operators that look syntactically identical to those that are available for integers.
eg. the form: a=b+c can be used for ints or Rectangles.
This is why you can have changestuff(rect); because the parameter is a reference and a reference to (pointer to) rect is taken automatically. When you have the pointer Rectangle* rectTwo; it is an 'object' in its own right and you can operate on it, eg reassign it or increment it. C++ has chosen to not convert this to a reference to an object, you have to do this manually by 'dereferencing' the pointer to get to the object, which is then automatically converted to a reference. This is what *rectTwo means: dereferencing a pointer.
So, rectTwo is a pointer to a Rectangle, but rect is a rectangle, or a reference to a Rectangle.