C++ pointer covariance

It never occurred to me that C++ has pointer covariance and therefore lets you shoot yourself in the foot like this:
#include <iostream>
struct Base
{
Base() : a(5) {}
int a;
};
struct Child1 : public Base
{
Child1() : b(7) {}
int b;
int bar() { return b;}
};
struct Child2 : public Base
{
Child2(): c(8) {}
int c;
};
int main()
{
Child1 children1[2];
Base * b = children1;
Child2 child2;
b[1] = child2; // <------- now the first element of Child1 array was assigned a value of type Child2
std::cout << children1[0].bar() << children1[1].bar(); // prints 57
}
Is this undefined behaviour? Is there any way to prevent it, or at least to get a warning from the compiler?

Yes, this is undefined behavior.
And no, a typical C++ compiler today is unlikely to recognize that this code merits a diagnostic. Compilers get smarter with each passing year, though, so that may change.
However, a minor quibble:
b[1] = child2; // <------- now the first element of Child1 array was assigned...
No. That's not the first element; it's the second element. b[0] would be the first element. Furthermore, b is not an array, it's a pointer. And it's a pointer to a single element. It's not a pointer to a two-element array.
And that is where the undefined behavior comes from.
The reason it's not an array is because:
Base * b = children1;
children1 decays to a Child1 *. If that were where the affair ended, you could say that b points to the first element of a two-element array.
But that's not where things ended. The decayed pointer was then converted to a Base *. You can implicitly convert a pointer to a subclass to a pointer to a superclass. But (loosely speaking now) you cannot convert a pointer to an array of subclasses to a pointer to an array of superclasses. Hence b is, strictly, a pointer to a single element, and b[1] is undefined behavior.
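To make the stride problem concrete, here is a minimal sketch (C++11 or later). The helper names element_as_base and read_second_a are hypothetical, and the structs are trimmed versions of the ones in the question. The idea is to index the array through its real element type first, and only then convert the address of that single element to Base *, which is the conversion the language actually supports:

```cpp
#include <cstddef>

struct Base { int a = 5; };
struct Child1 : Base { int b = 7; };

// Pointer arithmetic on a Base* steps by sizeof(Base), not sizeof(Child1),
// so b[1] in the question does not land on children1[1].
static_assert(sizeof(Child1) > sizeof(Base), "the two element sizes differ");

// Hypothetical helper: index with the array's real element type,
// then convert the address of that one element to Base*.
Base* element_as_base(Child1* array, std::size_t i) {
    return &array[i];  // Child1* -> Base* for a single object is well-defined
}

// Hypothetical demo: write through the real type, read through Base*.
int read_second_a() {
    Child1 children[2];
    children[1].a = 42;
    return element_as_base(children, 1)->a;
}
```

Here read_second_a() reads the a member of the second Child1, because the indexing used sizeof(Child1) as the stride.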

Related

to get the real type information of a pointer in c++

#include <iostream>
#include <typeinfo>
using namespace std;
class myclass1 {
public:
virtual ~myclass1() {
}
};
class myclass2 : public myclass1 {
};
int main() {
myclass1 obj1;
myclass2 obj2;
myclass1 *p1 = &obj2;
myclass2 *p2 = static_cast<myclass2 *>(&obj1);
if( p1 && p2){
cout << typeid(p1).name() << endl;
cout << typeid(p2).name() << endl;
}
}
The output is as below:
P8myclass1
P8myclass2
Process finished with exit code 0
The code has two classes, and I tried to make a pointer of each type point to an object of the other type. Pointing a base-class pointer at a child object is totally OK, while the other way around should not work ("myclass2 *p2 = static_cast<myclass2 *>(&obj1);"). If I use "dynamic_cast", the resulting pointer is null. But if I use "static_cast", the cast seems successful and the type is "myclass2" when I use typeid.
When I am in debug mode in CLion, the debugger seems to know the real type behind each pointer, as shown below. It knows that p1 really points to a "myclass2" and p2 to a "myclass1". What is the magic behind it?
obj1 = {myclass1}
obj2 = {myclass2}
p1 = {myclass2 * | 0x7ffeec114a08} 0x00007ffeec114a08
p2 = {myclass1 * | 0x7ffeec114a10} 0x00007ffeec114a10

typeid(p1) will give you the type of the pointer p1, which will always be myclass1 * (the string P8myclass1 is the mangled name for the type myclass1 *). If you want the type of the pointed-at object, you want typeid(*p1), which should be myclass2 in this case.
With p2, typeid(p2) will be myclass2 *, while typeid(*p2) gives undefined behavior -- you can't safely dereference a pointer after a static cast to the wrong type. It is likely that you'll get myclass1, but not certain -- you might get a crash.
The debugger is essentially doing the same thing (inspecting the pointed-at object rather than the pointer's declared type), with extra knowledge and safeguards so that the undefined behavior doesn't make it misbehave badly.
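A small sketch of the difference between typeid on the pointer and typeid on the pointee. The function names are hypothetical, and the classes mirror the ones in the question. Since the strings returned by name() are implementation-defined, the sketch compares type_info objects instead of printing them:

```cpp
#include <typeinfo>

struct myclass1 { virtual ~myclass1() {} };
struct myclass2 : myclass1 {};

// typeid applied to the pointer itself reports the declared (static) type.
bool pointer_type_is_static(myclass1* p) {
    return typeid(p) == typeid(myclass1*);
}

// typeid applied to the dereferenced pointer looks up the dynamic type,
// because myclass1 is polymorphic (it has a virtual destructor).
bool pointee_is_myclass2(myclass1* p) {
    return typeid(*p) == typeid(myclass2);
}

// dynamic_cast is the checked way to downcast: it yields a null pointer
// when the object is not actually a myclass2.
bool checked_downcast_succeeds(myclass1* p) {
    return dynamic_cast<myclass2*>(p) != nullptr;
}
```

Note that all three helpers only ever take a valid myclass1 *, so nothing here depends on the wrong static_cast from the question.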

The "magic" is likely undefined behaviour. You can't static_cast a pointer to point to a different type except in specific situations (like an upcast). In your case, &obj1 points to a myclass1 object, not a myclass2 object, so it is completely meaningless to perform the downcast.

Vector of pointers: Why does changing the pointer externally not change the vector element?

Please look at the following example:
#include <iostream>
#include <string>
#include <vector>
class Base {
public:
virtual void price() const = 0;
};
class D1 : public Base {
public:
virtual void price() const {
std::cout << "Price of D1\n";
}
};
class D2 : public Base {
public:
virtual void price() const {
std::cout << "Price of D2\n";
}
};
int main()
{
std::vector<Base*> v;
Base* b = new D1();
v.push_back(b);
v[0]->price(); // "Price of D1\n" - OK!
b = new D2();
v[0]->price(); // "Price of D1\n"?! Why not "Price of D2\n"
}
A simple base class has two derived classes. In my main() I declare a vector containing pointers to the base class and fill it with a pointer to an object of D1.
When I change b to a pointer to D2, b = new D2();, why doesn't that change the element v[0] accordingly? Aren't they supposed to point to the same thing?

I'll try to explain it on a lower level.
First of all, understand that a pointer is essentially simply a variable that holds a value, which happens to be an address in memory.
Let's simplify things and assume that every type has the same size and that there are only 5 addresses of memory (on the heap). (I will also ignore that the vector allocates its storage on the heap too.)
We now execute
Base* b = new D1();
Let's say that the allocated memory is at the address 3. b is now a variable that simply holds the value 3. Our (heap) memory looks like this:
0: ?
1: ?
2: ?
3: variable of type D1
4: ?
Then we continue with
v.push_back(b);
We now have that the vector v holds one entry of value 3.
b = new D2();
We now allocated a new part of memory, let's say at address 1:
0: ?
1: variable of type D2
2: ?
3: variable of type D1
4: ?
b now stores this address, that is, the value of b is 1. If we look at v, we haven't changed it. It contains one entry with value 3.
v[0]->price();
This gives us a pointer with value 3, which we dereference and print. Given our memory map above, what is stored there is a variable of type D1.
Did this clarify things for you?
I also extended your code a little bit to demonstrate this with realistic addresses:
http://www.cpp.sh/3s4b4
(If you run that code, note that addresses allocated right after each other tend to be very similar, often differing by only a single digit, so look closely. For example, I got 0x1711260 and 0x1711220.)
If you really need to do what you expected, you could store a vector over Base** and store the address of b in the vector:
std::vector<Base**> v;
Base* b = new D1();
v.push_back(&b);
(*v[0])->price();
b = new D2();
(*v[0])->price();
Implemented on http://www.cpp.sh/8c3i3. But I wouldn't recommend that unless you absolutely need it: it lacks readability and is unnecessarily confusing. (Note that the first object is never deleted in this example and becomes unreachable after changing b, so we'd leak memory.)

When I change b to a pointer to D2, b = new D2();, why doesn't that change the element v[0] accordingly? Aren't they supposed to point to the same thing?
Because v.push_back(b); stores a copy of the pointer b in your vector. So if you change b to point to something else afterwards, it will have no effect whatsoever on v[0].
You can simplify it and see it like this:
int *ptr = new int;
int *ptr_copy = ptr;
*ptr = 2; //both *ptr and *ptr_copy have value: "2"
ptr = new int; //ptr now points to some other memory
*ptr = 5; //*ptr = 5, but *ptr_copy will still be "2"
Now if you really want the changes to be reflected even when you change the pointer, you need another level of indirection i.e., "pointer to pointer":
int main()
{
std::vector<Base**> v; //vector of Base**
Base* b = new D1();
v.push_back(&b);
(*v[0])->price(); // "Price of D1\n" - OK!
b = new D2();
(*v[0])->price(); // "Price of D2\n"
}

Try this variation
vector<int> v;
int b = 123;
v.push_back(b);
cout << v[0]; // prints 123
b = 456;
cout << v[0]; // still prints 123
That changing b doesn't change v[0] is hopefully obvious, because the value of v[0] is a copy of the value of b. But then compare with your code: what's the difference? There is none. It's exactly the same situation: v[0] is a copy of b, and changing b doesn't change v[0]. The fact that your case involves pointers makes no difference at all (in this regard), because there's nothing special about pointers.

When you insert a pointer into a std::vector of pointers, you copy that pointer into the container. After
std::vector<Base*> v;
Base* b = new D1();
v.push_back(b);
you have two pointers that refer to the D1 instance: b and v[0]. Now if you go on with
b = new D2();
you only overwrite one of these two, namely b, but not v[0]. Hence, accessing v[0] gives you the pointer that was stored there in the first place.
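If the goal is simply for the vector element to change, one option is to assign through the element itself instead of through b. The following is only a sketch under a few assumptions that are not in the question: it uses std::unique_ptr (C++11) so the replaced D1 is deleted rather than leaked, it gives Base a virtual destructor so that deletion is well-defined, and price() returns a string instead of printing. The function name replace_first_element is hypothetical.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;            // added: needed to delete via Base*
    virtual std::string price() const = 0;
};
struct D1 : Base {
    std::string price() const override { return "Price of D1"; }
};
struct D2 : Base {
    std::string price() const override { return "Price of D2"; }
};

// Hypothetical demo: change what the vector element points to.
std::string replace_first_element() {
    std::vector<std::unique_ptr<Base>> v;
    v.push_back(std::unique_ptr<Base>(new D1()));
    v[0].reset(new D2());  // the old D1 is deleted; v[0] now owns a D2
    return v[0]->price();
}
```

Because the change goes through v[0], there is no second copy of the pointer left behind holding the old value.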

Interpreting object addresses with reinterpret_cast

The following code gives the output as 136. But I could not understand how the first two address comparisons are equal. Appreciate any help to understand this. Thank you.
#include <iostream>
class A
{
public:
A() : m_i(0){ }
protected:
int m_i;
};
class B
{
public:
B() : m_d(0.0) { }
protected:
double m_d;
};
class C : public A, public B
{
public:
C() : m_c('a') { }
private:
char m_c;
};
int main( )
{
C d;
A *b1 = &d;
B *b2 = &d;
const int a = (reinterpret_cast<char *>(b1) == reinterpret_cast<char *>(&d)) ? 1 : 2;
const int b = (b2 == &d) ? 3 : 4;
const int c = (reinterpret_cast<char *>(b1) == reinterpret_cast<char *>(b2)) ? 5 : 6;
std::cout << a << b << c << std::endl;
return 0;
}

When you use multiple inheritance like in your example, the first base class and the derived class typically share the same base address. Additional base classes are arranged in order, each at an offset determined by the size (and alignment padding) of everything that precedes it. The first comparison is true because the base address of d and b1 are the same.
In your case, if the size of A is 4 bytes then B starts at the base address of A plus 4 bytes, or a little further if padding is needed to align B's double. When you do B *b2 = &d; the compiler calculates the offset and adjusts the pointer value accordingly.
When you do b2 == &d, an implicit conversion from C * to B * is applied to &d before the comparison is done. This conversion adjusts the pointer value by the same offset, just as it would in an assignment.

It’s pretty typical for a derived class (like C here) to be laid out in memory so that it starts with its two base classes (A and B), so the address of an instance of type C would be identical to the address of its first base class subobject (i.e. A).

In this kind of inheritance (when virtual is not involved), each instance of C will typically have the following layout:
First, there will be all members of A (which is just m_i, typically a 4-byte integer)
Second will be all members of B (which is just m_d, typically an 8-byte double, preceded by any padding its alignment requires)
Last will be all members of C itself, which is just a character (1 byte, m_c)
When you cast a pointer to an instance of C to A, because A is the first parent, no address adjustment takes place, and the numerical value of the two pointers will be the same. This is why the first comparison evaluates to true. (Note that doing a reinterpret_cast<char *>() on a pointer never causes adjustment, so it always gives the numerical value of the pointer. Casting to void * would have the same effect and is probably safer for comparison.)
Casting a pointer to an instance of C to B will cause a pointer adjustment (by the offset of the B subobject: at least 4 bytes here, and commonly 8 once alignment padding is counted), which means that the numerical value of b2 will not be equal to &d. However, when you directly compare b2 and &d, the compiler automatically generates a cast for &d to B *, which adjusts the numerical value by the same offset. This is the reason that the second comparison also evaluates to true.
The third comparison returns false because, as said before, casting a pointer to an instance of C to A or to B will have different results (casting to A * doesn't do adjustment, while casting to B * does).
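The adjustment described above can be measured rather than guessed. This is a rough sketch with hypothetical function names; the classes are simplified to structs with public members, and the exact offset it reports depends on the platform's layout and alignment rules:

```cpp
#include <cstddef>

struct A { int m_i = 0; };
struct B { double m_d = 0.0; };
struct C : A, B { char m_c = 'a'; };

// Byte distance from the start of the C object to its B subobject.
// The implicit C* -> B* conversion is what applies the adjustment.
std::ptrdiff_t offset_of_B(C& obj) {
    B* as_b = &obj;
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&obj);
}

// A typed comparison converts &obj to B* first, so the adjusted
// pointers compare equal even though the raw addresses differ.
bool typed_comparison_is_equal(C& obj) {
    B* as_b = &obj;
    return as_b == &obj;
}

// static_cast back to C* undoes the adjustment.
bool round_trip_is_exact(C& obj) {
    B* as_b = &obj;
    return static_cast<C*>(as_b) == &obj;
}
```

Printing offset_of_B(d) for the d in the question is a quick way to see whether your compiler places B at 4 bytes, 8 bytes, or somewhere else.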

Why is it undefined behavior to delete[] an array of derived objects via a base pointer?

I found the following snippet in the C++03 Standard under 5.3.5 [expr.delete] p3:
In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined. In the second alternative (delete array) if the dynamic type of the object to be deleted differs from its static type, the behavior is undefined.
Quick review on static and dynamic types:
struct B{ virtual ~B(){} };
struct D : B{};
B* p = new D();
Static type of p is B*, while the dynamic type of *p is D, 1.3.7 [defns.dynamic.type]:
[Example: if a pointer p whose static type is “pointer to class B” is pointing to an object of class D, derived from B, the dynamic type of the expression *p is “D.”]
Now, looking at the quote at the top again, this would mean that the following code invokes undefined behaviour if I got that right, regardless of the presence of a virtual destructor:
struct B{ virtual ~B(){} };
struct D : B{};
B* p = new D[20];
delete [] p; // undefined behaviour here
Did I misunderstand the wording in the standard somehow? Did I overlook something? Why does the standard specify this as undefined behaviour?

Base* p = new Base[n] creates an n-sized array of Base elements, of which p then points to the first element. Base* p = new Derived[n] however, creates an n-sized array of Derived elements. p then points to the Base subobject of the first element. p does not however refer to the first element of the array, which is what a valid delete[] p expression requires.
Of course it would be possible to mandate (and then implement) that delete [] p Does The Right Thing™ in this case. But what would it take? An implementation would have to take care to somehow retrieve the element type of the array, and then morally dynamic_cast p to this type. Then it's a matter of doing a plain delete[] like we already do.
The problem with that is that this would be needed for every array of polymorphic element type, regardless of whether the polymorphism is used or not. In my opinion, this doesn't fit with the C++ philosophy of not paying for what you don't use. But worse: a polymorphic-enabled delete[] p is simply useless because p is almost useless in your question. p is a pointer to a subobject of an element and no more; it's otherwise completely unrelated to the array. You certainly can't do p[i] (for i > 0) with it. So it's not unreasonable that delete[] p doesn't work.
To sum up:
arrays already have plenty of legitimate uses. By not allowing arrays to behave polymorphically (either as a whole or only for delete[]) this means that arrays with a polymorphic element type are not penalized for those legitimate uses, which is in line with the philosophy of C++.
if on the other hand an array with polymorphic behaviour is needed, it's possible to implement one in terms of what we have already.
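One way to build such a polymorphic "array" from existing parts, sketched here as an assumption rather than as the answer's own design, is a std::vector of std::unique_ptr<B> (C++11). Each element is then deleted individually through a B *, which is well-defined because B has a virtual destructor. The destroyed counter and the function name destroy_twenty are hypothetical additions for the demonstration.

```cpp
#include <memory>
#include <vector>

struct B { virtual ~B() {} };
struct D : B {
    static int destroyed;             // hypothetical counter for the demo
    ~D() override { ++destroyed; }
};
int D::destroyed = 0;

// Creates 20 D objects owned through B pointers, lets them go out of
// scope, and reports how many D destructors ran.
int destroy_twenty() {
    D::destroyed = 0;
    {
        std::vector<std::unique_ptr<B>> items;
        for (int i = 0; i < 20; ++i) {
            items.push_back(std::unique_ptr<B>(new D()));
        }
    }  // each element is deleted on its own via delete, never delete[]
    return D::destroyed;
}
```

The cost is one allocation per element, which is the kind of overhead a plain array does not impose on callers who don't need polymorphism.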

It's wrong to treat an array-of-derived as an array-of-base, not only when deleting items. For example even just accessing the elements will usually cause disaster:
B *b = new D[10];
b[5].foo();
b[5] will use the size of B to calculate which memory location to access, and if B and D have different sizes, this will not lead to the intended results.
Just like a std::vector<D> can't be converted to a std::vector<B>, a pointer to D[] shouldn't be convertible to a B*, but for historic reasons it compiles anyway. If a std::vector were used instead, it would produce a compile-time error.
This is also explained in the C++ FAQ Lite answer on this topic.
So delete causes undefined behavior in this case because it's already wrong to treat an array in this way, even though the type system can't catch the error.
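The convertibility claims can be checked at compile time with std::is_convertible. This is just a sketch: the constant names are hypothetical, and D gets an extra member only so that the two types plausibly differ in size.

```cpp
#include <type_traits>
#include <vector>

struct B { virtual ~B() {} };
struct D : B { int extra = 0; };

// A pointer to a single D converts to B*: this is what lets
// "B *b = new D[10];" compile after the array decays.
constexpr bool pointer_converts =
    std::is_convertible<D*, B*>::value;

// The container types are unrelated, so this conversion is rejected.
constexpr bool vector_converts =
    std::is_convertible<std::vector<D>, std::vector<B>>::value;

// A pointer to a whole array of D does not convert to a pointer to an
// array of B either; only the decayed element pointer slips through.
constexpr bool array_pointer_converts =
    std::is_convertible<D (*)[10], B (*)[10]>::value;

static_assert(pointer_converts, "D* -> B* is allowed");
static_assert(!vector_converts, "vector<D> -> vector<B> is rejected");
static_assert(!array_pointer_converts, "D(*)[10] -> B(*)[10] is rejected");
```

So the type system does catch the mistake whenever the array-ness is still part of the type; it is array-to-pointer decay that throws that information away.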

Just to add to the excellent answer of sth: I have written a short example to illustrate this issue with different offsets.
Note that if you comment out the m_c member of the Derived class, the delete operation will appear to work, because Base and Derived then have the same size. The behavior is still undefined, though.
Cheers,
Guy.
#include <iostream>
using namespace std;
class Base
{
public:
Base(int a, int b)
: m_a(a)
, m_b(b)
{
cout << "Base::Base - setting m_a:" << m_a << " m_b:" << m_b << endl;
}
virtual ~Base()
{
cout << "Base::~Base" << endl;
}
protected:
int m_a;
int m_b;
};
class Derived : public Base
{
public:
Derived()
: Base(1, 2) , m_c(3)
{
}
virtual ~Derived()
{
cout << "Derived::~Derived" << endl;
}
private:
int m_c;
};
int main(int argc, char** argv)
{
// create an array of Derived object and point them with a Base pointer
Base* pArr = new Derived [3];
// now go ahead and delete the array using the "usual" delete notation for an array
delete [] pArr;
return 0;
}

IMHO this has to do with the limitations of arrays when dealing with constructors and destructors. Note that when new[] is called, the elements are created with the default constructor only. In the same way, when delete[] is called, the compiler might look only at the destructor of the pointer's static type.
Now, with a virtual destructor, the Derived class destructor should be called first, followed by the Base class destructor. Since for arrays the compiler might only consider the static type of the pointer (here Base), it might end up calling just the Base destructor, which is UB.
Having said that, UB doesn't mean that every compiler visibly misbehaves; gcc, for example, calls the destructors in the proper order.

I think it all comes down to the zero-overhead principle, i.e. the language doesn't allow storing information about the dynamic type of the elements of the array.

Upcasting pointer reference

I have the following contrived example (coming from real code):
template <class T>
class Base {
public:
Base(int a):x(a) {}
Base(Base<T> * &other) { }
virtual ~Base() {}
private:
int x;
};
template <class T>
class Derived:public Base<T>{
public:
Derived(int x):Base<T>(x) {}
Derived(Derived<T>* &other): Base<T>(other) {}
};
int main() {
Derived<int> *x=new Derived<int>(1);
Derived<int> y(x);
}
When I try to compile this, I get:
1X.cc: In constructor ‘Derived<T>::Derived(Derived<T>*&) [with T = int]’:
1X.cc:27: instantiated from here
1X.cc:20: error: invalid conversion from ‘Derived<int>*’ to ‘int’
1X.cc:20: error: initializing argument 1 of ‘Base<T>::Base(int) [with T = int]’
1) Clearly gcc is being confused by the constructors. If I remove the reference from the constructors, then the code compiles. So my assumption is that something goes wrong with up-casting pointer references. Can someone tell me what is going on here?
2) A slightly unrelated question. If I were to do something horrendous like "delete other" in the constructor (bear with me), what happens when someone passes me a pointer to something on the stack?
E.g. Derived<int> x(2);
Derived<int> y(x);
where
Derived(Derived<T>*& other) { delete other;}
How can I make sure that pointer is legitimately pointing to something on the heap?

Base<T> is a base type of Derived<T>, but Base<T>* is not a base type of Derived<T>*. You can pass a derived pointer in place of a base pointer, but you can't pass a derived pointer reference in place of a base pointer reference.
The reason is that, suppose you could, and suppose the constructor of Base were to write some value into the reference:
Base(Base<T> * &other) {
Base<T> *thing = new Base<T>(12);
other = thing;
}
You've just written a pointer to something which is not a Derived<T>, into a pointer to Derived<T>. The compiler can't let this happen.

You cannot convert a reference to a pointer to Derived to a reference to a pointer to Base. (Templates don't contribute to the issue here, so removed from my example below.)
If you want to hand over responsibility for a pointer, use a smart pointer type. Smart pointer types can represent the "responsibility to delete" that raw pointers cannot. Examples include std::auto_ptr (deprecated since C++11 in favor of std::unique_ptr) and boost::shared_ptr, among many others.
Why you cannot upcast pointer references:
struct Base {};
struct Derived : Base {};
struct Subclass : Base {};
int main() {
Derived d;
Derived* p = &d;
Derived*& d_ptr = p;
Base*& b_ptr = d_ptr; // this is not allowed, but let's say it is
Base b;
b_ptr = &b; // oops! d_ptr no longer points to a Derived!
Subclass s;
b_ptr = &s; // oops! d_ptr no longer points to a Derived!
}
When you pass your 'other' parameter to the Base ctor, you're trying to do the same thing as b_ptr = d_ptr above.
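The same rule can be checked at compile time. In this sketch (constant and function names are hypothetical), std::is_convertible shows which ways of passing a Derived * are accepted where a Base pointer is expected:

```cpp
#include <type_traits>

struct Base { virtual ~Base() {} };
struct Derived : Base {};

// Passing the pointer by value: allowed, the pointer is converted.
constexpr bool by_value_ok =
    std::is_convertible<Derived*, Base*>::value;

// Binding a Derived* lvalue to a non-const Base*&: rejected, because
// writes through the reference could store a non-Derived address.
constexpr bool by_mutable_ref_ok =
    std::is_convertible<Derived*&, Base*&>::value;

// Binding to a const reference: allowed, since it binds to a converted
// temporary and nothing can be written back through it.
constexpr bool by_const_ref_ok =
    std::is_convertible<Derived*&, Base* const&>::value;

// Hypothetical stand-in for a constructor that takes the pointer by value.
bool accepts_by_value(Base* other) { return other != nullptr; }

bool call_with_derived() {
    Derived d;
    Derived* p = &d;
    return accepts_by_value(p);  // implicit Derived* -> Base* conversion
}
```

This is why removing the reference from the constructors makes the code in the question compile.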

You make sure that pointer points to something on the heap by writing that in your documentation and relying on the caller to abide by that. If whoever calls your constructor passes a stack pointer, all bets are off, and it's not your fault - you can try to catch the problem early, but no guarantees.
That's how the standard library works - often it'll catch obvious errors, but it's not required to, and it's up to the caller to make sure they're not doing anything stupid.

Your x variable is not a pointer; it needs to be one if you want to assign a new Derived<int> to it.
As for deleting things on the stack, don't do it. There is no way to tell whether you have been passed the address of something on the stack or on the heap (indeed, the C++ standard doesn't even acknowledge the existence of a stack). The lesson here is that you shouldn't be deleting things that you don't own, especially if you have no way of telling where they came from.
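One way to make ownership explicit, offered as a sketch rather than as the only option, is to take a std::unique_ptr (C++11) in the constructor. The Resource and Owner classes and the function make_and_read are hypothetical names for illustration. A caller then cannot hand over the address of a stack object by accident, because a unique_ptr has to be constructed explicitly from something that was allocated with new:

```cpp
#include <memory>
#include <utility>

class Resource {
public:
    explicit Resource(int v) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};

// Owner takes over the Resource; the unique_ptr in the signature
// documents that transfer of the "responsibility to delete".
class Owner {
public:
    explicit Owner(std::unique_ptr<Resource> r) : resource_(std::move(r)) {}
    int value() const { return resource_->value(); }
private:
    std::unique_ptr<Resource> resource_;
};

// Hypothetical demo: after the move, the caller's pointer is empty and
// the Owner deletes the Resource when it goes out of scope.
int make_and_read() {
    std::unique_ptr<Resource> r(new Resource(2));
    Owner o(std::move(r));
    return r ? -1 : o.value();  // r is null after being moved from
}
```

Nothing in the language stops a determined caller from wrapping a stack address in a unique_ptr, but doing so takes a deliberate, visible step.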

Not sure why you want a reference to the pointer. Why not
Base(Base<T> * other) { }
and
Derived(Derived<T>* other): Base<T>(other) {}
That should work.
And, as others answered, I don't think you can legitimately know whether the pointer points into the heap.
Edit: why you can't do what you're trying to do. Consider this example:
Derived1<int> *x = new Derived1<int>();
Base<int> **xx = &x; // not allowed, but suppose it were
Derived2<int> y;
*xx = &y;
where Derived1 and Derived2 are different classes derived from Base. Would you think that's legitimate, now that x, of type Derived1*, points to a Derived2?
