C++ polymorphism: ((X*)y)->foo() vs ((X)*y).foo()

Suppose Y is a class derived from X, and X declares foo to be virtual. Suppose y is of type Y*. Then ((X*)y)->foo() will execute the Y version of foo(), but ((X)*y).foo() will execute the X version. Can you tell me why polymorphism does not apply in the dereferenced case? I would expect either syntax to yield the Y version of foo().

You are slicing off the Y part of the object and copying the rest into a new X object. The function is then called on that X object, so X's version of the function runs.
When you specify a type in C++ in a declaration or cast, that means the object declared or cast to is actually of that type, not of a derived type.
If you merely want to treat the object as being of type X (that is, you want the static type of the expression to be X while it still denotes a Y object), then cast to a reference type:
((X&)*y).foo()
This will call the function in the Y object, and will neither slice nor copy into an X object. Step by step, this does the following:
1. Dereference the pointer y, which is of type Y*. Dereferencing yields an lvalue expression of type Y. An lvalue expression can denote an object of a derived type even when its static type is that of the base.
2. Cast to X&, which is a reference to X. That yields an lvalue expression of type X.
3. Call the function.
Your original cast did the following:
1. Dereference the pointer y.
2. Cast the resulting expression to X. This performs a copy into a new X object. The result is an rvalue expression of static type X. The dynamic type of the object it denotes is also X, as is the case for every prvalue.
3. Call the function.
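The three casts discussed so far can be compared side by side. This is a minimal sketch: X, Y and foo come from the question, while the helper names via_pointer, via_reference and via_value and the string return values are made up for illustration.

```cpp
#include <string>

struct X {
    virtual ~X() = default;
    virtual std::string foo() const { return "X::foo"; }
};

struct Y : X {
    std::string foo() const override { return "Y::foo"; }
};

// Pointer cast: same object, static type X*, dynamic type still Y.
std::string via_pointer(Y* y) { return ((X*)y)->foo(); }

// Reference cast: same object viewed through an X&, no copy is made.
std::string via_reference(Y* y) { return ((X&)*y).foo(); }

// Value cast: copy-constructs a temporary X from the X part of *y (slicing),
// so the call is made on an object whose dynamic type is X.
std::string via_value(Y* y) { return ((X)*y).foo(); }
```

Given a Y object, the pointer and reference versions dispatch to Y::foo, and only the value version ends up in X::foo.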

A cast to a non-reference type always(*) creates a new object of the type you're casting to, which is constructed using the object you're casting.
Casting to X* creates a new pointer (that is, an object of type X*). It has the same value as y, so it still points to the same object, of type Y.
Casting to X creates a new X. It is constructed using *y, but it otherwise has nothing to do with the old object. In your example, foo() is called on this new "temporary" object, not on the object pointed to by y.
You are correct that dynamic polymorphism only applies to pointers and references, not to objects, and this is the reason why: if you have a pointer-to-X then the thing it points to might be a subclass of X. But if you have an X, then it's an X, and nothing else. virtual calls would be pointless.
(*) unless optimisation allows the omission of code that doesn't change the result. But optimisation isn't allowed to change what foo() function is called.
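One way to see the new object being created is to count copy-constructor calls. The following is only a sketch: the static counter copies and the two helper functions are hypothetical additions, not part of the question's code.

```cpp
struct X {
    static int copies;              // counts copy-constructor calls (illustrative)
    X() = default;
    X(const X&) { ++copies; }
    virtual ~X() = default;
    virtual int foo() const { return 1; }
};
int X::copies = 0;

struct Y : X {
    int foo() const override { return 2; }
};

// Number of X copies made by casting the pointer and calling foo().
int copies_for_pointer_cast(Y* y) {
    int before = X::copies;
    ((X*)y)->foo();                 // new pointer value, same object
    return X::copies - before;
}

// Number of X copies made by casting the dereferenced object and calling foo().
int copies_for_value_cast(Y* y) {
    int before = X::copies;
    ((X)*y).foo();                  // new temporary X, built by X's copy constructor
    return X::copies - before;
}
```

The pointer cast should report zero copies and the value cast exactly one, because the copy has an observable side effect and the source is an existing object rather than a temporary.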

The dereferencing (the *y part) is fine, but the cast (the (X) part) creates a new (temporary) object specifically of class X -- that's what casting means. So, the object has to have the virtual table from class X -- consider that the casting will have removed any instance members added by Y in the subclassing (indeed, how could X's copy ctor possibly know about them?), so it would potentially be a disaster if any of Y's overrides were to execute -- secure in their knowledge that this points to an instance of Y, complete with added members and all... when that knowledge was false!
The version in which you cast pointers is of course completely different -- the X* has just the same bits as the Y*, so it's still pointing to a perfectly valid instance of Y (indeed, it's pointing to *y, of course).
The sad fact is that, for safety, the copy ctor of a class should really only be called with, as argument, an instance of that class -- not of any subclass; the loss of added instance members &c is just too damaging. But the only way to ensure that is to follow Haahr's excellent advice, "Don't subclass concrete classes"... even though he's writing about Java, the advice is at least as good for C++ (which has this copy ctor "slicing" problem in addition!-)
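As a rough illustration of why that advice helps in C++: if the base class is abstract, the slicing cast does not compile at all. The class names Shape and Triangle and the helper sides_of below are invented for this sketch.

```cpp
#include <type_traits>

// Abstract base: it has a pure virtual function, so no object of exactly
// this type can exist.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// A Shape cannot be constructed from a Triangle, so a cast such as
// (Shape)*ptr would be rejected by the compiler.
static_assert(!std::is_constructible<Shape, const Triangle&>::value,
              "slicing into the abstract base is rejected");

// References (and pointers) to the base still work and dispatch virtually.
int sides_of(const Shape& s) { return s.sides(); }
```

Passing a Triangle to sides_of still reaches Triangle::sides, so polymorphic use is unaffected while accidental slicing becomes a compile-time error.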

I believe that's simply due to the way the language is specified. References and pointers use late binding wherever possible, while objects use early binding. It would be possible to do late binding in every case (I imagine), but a compiler that did that wouldn't be following the C++ specifications.

I think Darth Eru's explanation is correct, and here is why I think C++ behaves that way:
The code (X)*y is like creating a local variable that is of type X. The compiler needs to allocate sizeof(X) space on the stack, and it throws away any extra data included in an object of type Y, so when you call foo() it has to execute the X version. It would be difficult for the compiler to behave in a way that would let you call the Y version.
The code (X*)y is like creating a pointer to an object, and the compiler knows that the object pointed to is X or a subclass of X. At runtime when you dereference the pointer and call foo with "->foo()" the class of the object is determined and the proper function is used.

What does "int*&&" do in C++?

I'm coming from a C# background and trying to learn a little C++.
I came across the following lines:
int x[3] = { 1, 2, 3 };
int*&& y = x;
int* z = y;
I know what pointers and arrays are and have some small understanding on lvalue and rvalue references. However I can't wrap my head around what int*&& y = x; actually does.
It would read like it creates a pointer to an rvalue reference, is this correct? What would be the use case of something like that, e.g. what is going on in memory if we execute this?
int*&& y = x; declares y to be an rvalue reference to pointer to int and initializes it with x. (Note that pointer-to-reference types such as int&&* do not exist in C++.)
Now the issue is that x is an array of int, not a pointer to int. So the reference can't bind directly to x. The types are not reference-compatible.
In such a situation (provided the reference is either an rvalue reference or a const lvalue reference) an unnamed temporary object of the referenced type is created and the reference binds to that object instead. Here an int* object is created and initialized with x, and because of array-to-pointer decay that temporary int* is initialized to point to the first element of the array x.
Temporary objects normally live only until the end of the (full-)expression in which they are created, but in this case, because a reference is immediately bound to it, so-called temporary lifetime extension applies and the temporary int* object will live as long as the reference does (i.e. until the end of the reference's scope).
In other words it is (mostly) equivalent to
int* /*unnamed*/ = x;
int*&& y = /*unnamed*/;
where the comment /*unnamed*/ is supposed to represent the non-existing name of the temporary object.
When a reference's name is used in an expression (completely independently of whether it is an lvalue or rvalue reference), it behaves exactly as if the object to which the reference is bound had been named instead (but with the type of the reference stripped of its reference qualifiers, which may differ from the object's type, e.g. by a const).
In other words int* z = y; behaves equivalently to int* z = /*unnamed*/;. So z is initialized from the temporary int* object, whose value points to the first element of the x array. For scalar types like int*, initialization simply copies the value, so z will also be initialized to point to the first element of x.
The whole thing is needlessly convoluted. It is exactly equivalent to int* z = x;. Using rvalue references usually only really makes sense as function parameters and some constructs where type deduction occurs. The important difference between lvalue and rvalue references is that they affect overload resolution differently and that they may be initialized with different value categories. There is also one special case in type deduction where rvalue references behave differently (as so-called forwarding references). Aside from that there is no difference between the different kinds of references.
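The binding to a temporary pointer can be checked directly. In this sketch the function name rvalue_ref_binds_to_temporary and the boolean bookkeeping are illustrative additions; the three declarations are the ones from the question.

```cpp
// Returns true if y is bound to its own temporary pointer rather than to x or z.
bool rvalue_ref_binds_to_temporary() {
    int x[3] = {1, 2, 3};
    int*&& y = x;   // a temporary int* is made from the decayed array; y binds to it
    int* z = y;     // copies the pointer value held by that temporary

    bool z_points_at_first = (z == &x[0]);

    ++y;            // y is a named reference, hence an lvalue: this advances the
                    // lifetime-extended temporary, not the array and not z
    bool y_moved = (y == &x[1]);
    bool z_unchanged = (z == &x[0]);

    return z_points_at_first && y_moved && z_unchanged && *y == 2;
}
```

Advancing y leaves both the array and z untouched, which is what one would expect if y refers to a separate, unnamed int* object.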
what is going on in memory if we execute this?
That's mostly an implementation detail that shouldn't matter. You have an array that is stored somewhere depending on where you put these lines in your program. It is likely to be physically present somewhere in memory unless the compiler figures out that it isn't needed. The reference y, the temporary int* which it refers to, and the z pointer may or may not actually be physically present in some memory, but the compiler is likely to just reduce all of them directly to the array. In particular, on the language level, references are not objects and do not have storage (e.g. they don't have a memory size or location). If the compiler needs some memory to implement them (e.g. as a pointer), then that is purely an implementation detail of the compiler.

Function Return Mechanism: Temporary object, R-Value, L-Value

Looking at the question at a low level: when a function returns a value, it is returned either in a CPU register or in space on the stack that the caller allocated beforehand.
At this point the calling function can take the value and copy it into its local variable.
int sum(int a,int b){return a + b;}
int main(){int risultato = sum(10,20);return 0;}
In this case the sum function returns the value in the EAX register. The main function then copies the value from EAX into a memory location on the stack.
This is what really happens.
Moving now to the abstraction of C++, if I try an operation like this:
sum(10, 20) = 4;
it gives me an error.
This is because the function is basically not returning the memory location that contains the value, but the value itself.
Being an r-value, it is not assignable, since it is not possible to assign a value to another value.
The situation is completely different when the dereferencing operator * is used.
In that case, what is returned is not a value but the memory location itself (an l-value), which is therefore assignable.
Is what I wrote correct?
Let's now take this second example.
class class1 {public: int i; int b; class1(int i,int b) { this->i = i;this->b = b; }};
class1 fun() { class1 c(10,5); return c; }
int main() {fun().i = 4; return 0;}
In this case the function returns an object.
If I try to execute a statement like this:
fun().i = 4;
I always get an error.
I know that when the function is called a temporary object is created on the stack.
Since the function returns the object not as a variable (l-value) but as a set of values, it is not possible to assign 4 to one of them.
The same problem also seems to exist with this statement:
class1(10,20).i = 4;
In this case I am creating a temporary object, and I don't understand why I cannot assign to the object's member i. Why is it always interpreted as an r-value and not as an l-value in this case?
I know that what I am doing has no practical use, but it is a purely theoretical question that I need answered to understand the syntax of the language correctly.
Could you comment on everything I have said so far, give your point of view, and try to answer the final question?
Moving now to the abstraction of C++, if I try an operation like this: sum(10, 20) = 4; it gives me an error. This is because the function is basically not returning the memory location that contains the value, but the value itself. Being an r-value, it is not assignable, since it is not possible to assign a value to another value. The situation is completely different when the dereferencing operator * is used. In that case, what is returned is not a value but the memory location itself (an l-value), which is therefore assignable.
Is what I wrote correct?
Kind of. You say
This is because basically the function is not returning the memory location in which the value is contained
But that is not quite what happens. An object is returned, and that object has a value. What makes the call expression an rvalue is that the function returns by value, which is another way of saying that it produces a temporary object.
Being therefore an r-value, this will not be assignable, since it is not possible to assign a value to another value
This is only true for built-in types. The built-in assignment operator requires the object being assigned to to be a modifiable lvalue. If you have a user-defined type (class, struct) then you can assign to an rvalue.
In this case I am creating a temporary object, I don't understand why it doesn't give me the possibility to assign the object's variable i, why in this case is it always interpreted as an r-value and not as an l-value?
The reason is that with operator . if the object you apply it to is an rvalue, then the member you access is treated as an rvalue too. Since i has a built-in type and the expression is an rvalue, you can't assign to it.
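The difference between built-in and class types can be checked with standard type traits. This sketch uses a simplified aggregate version of class1 (no constructor), and the function assign_to_temporary is a hypothetical helper added for illustration.

```cpp
#include <type_traits>

struct class1 {
    int i;
    int b;
};

// Built-in assignment needs a modifiable lvalue on the left-hand side.
static_assert(!std::is_assignable<int, int>::value,
              "assigning to an rvalue int is rejected");
static_assert(std::is_assignable<int&, int>::value,
              "assigning to an lvalue int is accepted");

// The implicitly generated operator= of a class is an ordinary member
// function, so it can be called on a temporary.
static_assert(std::is_assignable<class1, class1>::value,
              "assigning to an rvalue class1 is accepted");

int assign_to_temporary() {
    class1 source{10, 5};
    // Whole-object assignment to a temporary compiles. The temporary lives
    // until the end of this full expression, so reading .i here is fine.
    return (class1{1, 2} = source).i;
}
```

So class1(10,20) = other is accepted, while class1(10,20).i = 4 is not, because the latter is a built-in assignment to an rvalue int.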
I know that what I am doing has no use in practice
This is the answer to your question:
Why in this case is it always interpreted as an r-value and not as an l-value?
It's harder to implement the compiler if it needs to make this an L-Value, and since it has no use, it's not worth the trouble.
There are some things that are just for the convenience of compiler writers.
@NathanOliver answered the C++ abstract machine part. I'll just add a note about how that maps to asm.
Then the main function copies the value from the eax register into a memory location on the stack.
Or not, if the optimizing compiler just keeps risultato in a register like EAX. Or optimizes it away completely because in this case it's unused.
In the C abstract machine every object has a memory address (except for register int foo variables), but in practice, unless you disable optimization, variables only have addresses when the compiler runs out of registers or when their address is actually needed.
The return-value object is in EAX.
Notice that mainstream C++ calling conventions only ever return trivially-copyable objects in registers. A non-trivial constructor or destructor will force even a struct of one member to be returned by hidden pointer, to make sure the constructor and destructor have a consistent this. (Calling convention rules can't depend on the content of the constructor and destructor functions, just whether either is defined at all.)
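The language-level property involved can be inspected with std::is_trivially_copyable. This is only a proxy: whether a particular return value actually travels in a register is decided by the platform ABI, not by the trait. The types Plain and WithDestructor and the two factory functions are hypothetical.

```cpp
#include <type_traits>

struct Plain {
    int value;
};

struct WithDestructor {
    int value;
    ~WithDestructor() {}   // user-provided, therefore non-trivial
};

static_assert(std::is_trivially_copyable<Plain>::value,
              "Plain is a candidate for being returned in a register");
static_assert(!std::is_trivially_copyable<WithDestructor>::value,
              "a user-provided destructor makes the type non-trivial");

// Same source-level shape; typical ABIs are expected to return the second
// one through a hidden pointer, but that is an assumption about the platform.
Plain make_plain() { return Plain{7}; }
WithDestructor make_with_destructor() { return WithDestructor{7}; }
```

Both functions behave identically at the source level; comparing their generated assembly on a given compiler is the way to see the calling-convention difference.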

Is it legal to modify an object created with new through a const pointer?

So this answer made me think about the scenario where you assign the result of new to a pointer to a const. AFAIK, there's no reason you can't legally const_cast the constness away and actually modify the object in this situation:
struct X{int x;};
//....
const X* x = new X;
const_cast<X*>(x)->x = 0; // okay
But then I thought - what if you actually want new to create a const object. So I tried
struct X{};
//....
const X* x = new const X;
and it compiled!!!
Is this a GCC extension or is it standard behavior? I have never seen this in practice. If it's standard, I'll start using it whenever possible.
new obviously doesn't create a const object (I hope).
If you ask new to create a const object, you get a const object.
there's no reason you can't legally const_cast the constness away and actually modify the object.
There is. The reason is that the language specification calls that out explicitly as undefined behaviour. So, in a way, you can, but that means pretty much nothing.
I don't know what you expected from this, but if you thought the issue was one of allocating in readonly memory or not, that's far from the point. That doesn't matter. A compiler can assume such an object can't change and optimise accordingly and you end up with unexpected results.
const is part of the type. It doesn't matter whether you allocate your object with dynamic, static or automatic storage duration. It's still const. Casting away that constness and mutating the object would still be an undefined operation.
constness is an abstraction that the type system gives us to implement safety around non-mutable objects; it does so in large part to aid us in interaction with read-only memory, but that does not mean that its semantics are restricted to such memory. Indeed, C++ doesn't even know what is and isn't read-only memory.
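The distinction between a const access path and a const object can be sketched as follows. X is the struct from the question; the function names are made up, and the undefined operation is deliberately left commented out.

```cpp
#include <type_traits>

struct X {
    int x;
};

// The type of the new-expression carries the const qualifier.
static_assert(std::is_same<decltype(new const X{1}), const X*>::value,
              "new const X yields const X*");
static_assert(std::is_same<decltype(new X{1}), X*>::value,
              "new X yields X*");

int modify_non_const_object() {
    const X* p = new X{1};        // the object is not const, only the access path is
    const_cast<X*>(p)->x = 2;     // well-defined: the object itself is mutable
    int result = p->x;
    delete p;
    return result;
}

int read_const_object() {
    const X* p = new const X{1};  // the object itself is const
    // const_cast<X*>(p)->x = 2;  // would compile, but is undefined behaviour
    int result = p->x;
    delete p;
    return result;
}
```

Both pointers have type const X*, so the difference is invisible at the point of use, which is why the const_cast is only safe when you know how the object was created.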
As well as this being derivable from all the usual rules, with no exception [lol] made for dynamically-allocated objects, the standards mention this explicitly (albeit in a note):
[C++03: 5.3.4/1]: The new-expression attempts to create an object of the type-id (8.1) or new-type-id to which it is applied. The type of that object is the allocated type. This type shall be a complete object type, but not an abstract class type or array thereof (1.8, 3.9, 10.4). [Note: because references are not objects, references cannot be created by new-expressions. ] [Note: the type-id may be a cv-qualified type, in which case the object created by the new-expression has a cv-qualified type. ] [..]
[C++11: 5.3.4/1]: The new-expression attempts to create an object of the type-id (8.1) or new-type-id to which it is applied. The type of that object is the allocated type. This type shall be a complete object type, but not an abstract class type or array thereof (1.8, 3.9, 10.4). It is implementation-defined whether over-aligned types are supported (3.11). [ Note: because references are not objects, references cannot be created by new-expressions. —end note ] [ Note: the type-id may be a cv-qualified type, in which case the object created by the new-expression has a cv-qualified type. —end note ] [..]
There's also a usage example given in [C++11: 7.1.6.1/4].
Not sure what else you expected. I can't say I've ever done this myself, but I don't see any particular reason not to. There's probably some tech sociologist who can tell you statistics on how rarely we dynamically allocate something only to treat it as non-mutable.
My way of looking at this is:
X and const X and pointers to them are distinct types
there is an implicit conversion from X* to const X*, but not the other way around
therefore the following are both legal, and the pointer x has the same type in each case (only the object it points to differs: it is const in the second case)
const X* x = new X;
const X* x = new const X;
The only remaining question is whether a different allocator might be called in the second case (perhaps in read only memory). The answer is no, there is no such provision in the standard.

How can a reference require no storage?

From this question, and consequently, from the Standard (ISO C++-03):
It is unspecified whether or not a reference requires storage (3.7).
In some answers in that thread, it's said that references have, internally, the same structure of a pointer, thus, having the same size of it (32/64 bits).
What I'm struggling to grasp is: how would a reference come not to require storage?
Any sample code exemplifying this would be greatly appreciated.
Edit:
From @JohannesSchaub-litb's comment: is there anything like, if I'm not using a const &, or if I'm using a const & with a default value, it requires allocation? It seems to me, somehow, that there should be no allocations for references at all -- except, of course, when there are explicit allocations involved, like:
A& new_reference(*(new A())); // Only A() instance would be allocated,
// not the new_reference itself
Is there any case like this?
Take something simple:
int foo() {
    int x = 5;
    int& r = x;
    r = 10;
    return x;
}
The implementation may use a pointer to x behind the scenes to implement that reference, but there's no reason it has to. It could just as well translate the code to the equivalent form of:
int foo() {
    int x = 10;
    return x;
}
Then no pointers are needed whatsoever. The compiler can just bake it right into the executable that r is the same as x, without storing and dereferencing a pointer that points at x.
The point is, whether the reference requires any storage is an implementation detail that you shouldn't need to care about.
I believe the key point to understanding is that reference types are not object types.
An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a void type (§3.9[basic.types]/8)
Objects require storage ("An object is a region of storage." -- §1.8[intro.object]/1)
Moreover, C++ programs operate on objects: "The constructs in a C++ program create, destroy, refer to, access, and manipulate objects." -- same paragraph
So, when the compiler encounters a reference in the program, it is up to the compiler whether it has to synthesize an object (typically of a pointer type), and, therefore, use some storage, or find some other way to implement the desired semantics in terms of object model (which may involve no storage).
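At the language level this shows up as a reference having no size or address of its own. A small sketch, with the function name reference_is_an_alias chosen only for illustration:

```cpp
// Returns true if r is indistinguishable from x through sizeof and operator&.
bool reference_is_an_alias() {
    long long x = 5;
    long long& r = x;

    // sizeof applied to a reference gives the size of the referred-to type,
    // not the size of any pointer the compiler might use internally.
    bool same_size = (sizeof(r) == sizeof(long long));

    // Taking the address of a reference gives the address of the referent.
    bool same_address = (&r == &x);

    r = 10;   // writes to x
    return same_size && same_address && x == 10;
}
```

There is simply no expression that asks where r itself lives. When a reference is a non-static data member, implementations typically do store a pointer for it, but that too is an implementation choice.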

Example of code which incorrectly tries to re-seat a reference

As per my understanding, C++ does not allow you to re-seat a reference. In other words, you cannot change the object that a reference "refers" to. It's like a constant pointer in that regard (e.g. int* const a = &b;).
In some code I looked at today, I saw the following:
CMyObject& object = ObjectA().myObject();
// ...
object = ObjectB().myObject();
Immediately my alarm bells went off on the last line of code above. Wasn't the code trying to re-seat a reference? Yet the code compiled.
Then I realised that what the code was doing was simply invoking the assignment operator (i.e. operator=) to reassign ObjectA's internal object to ObjectB's internal object. The object reference still referred to ObjectA, it's just that the contents of ObjectA now matched that of ObjectB.
My understanding is that the compiler will always generate a default assignment operator if you don't provide one, which does a shallow copy (similar to the default copy constructor).
Since a reference is typed (just like the underlying object that it refers to), doesn't that mean that we will always invoke the assignment operator when attempting to re-seat a reference, thus preventing the compiler from complaining about this?
I've been racking my brains out trying to come up with an illegal line of code which will incorrectly try to re-seat a reference, to get the compiler to complain.
Can anyone point me to an example of such code?
You can't "reseat" a reference, because it's syntactically impossible. The reference variable you use which refers to the object uses the same semantics as if it was an object (non-reference) variable.
I've been racking my brains out trying to come up with an illegal line of code which will incorrectly try to re-seat a reference, to get the compiler to complain.
const int i = 42;
const int j = 1337;
const int& r = i;
r = j;
The uninitiated might expect the last line to re-seat r to j. Instead it is an assignment to i through r, and it fails to compile because i is const.
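With non-const objects the same statement compiles, and its effect can be observed. A minimal sketch, with the function name assignment_goes_through_reference invented for illustration:

```cpp
// Returns true if r = j assigned to i instead of rebinding r.
bool assignment_goes_through_reference() {
    int i = 42;
    int j = 1337;
    int& r = i;

    r = j;                                  // copies j's value into i
    bool still_refers_to_i = (&r == &i);    // r was not re-seated

    j = 0;                                  // changing j later affects neither r nor i
    return still_refers_to_i && i == 1337 && r == 1337;
}
```

After the assignment, i holds the old value of j, and r is still an alias for i.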
You can't write portable C++ code to reseat a reference... the compiler tracks where the reference refers to and doesn't allow it to be changed. It's a kind of alias for whatever it refers to, and in some cases the reference value may be incorporated directly into the code at compile time. On some implementations where a particular reference happens to be stored in the form of a pointer, and happens to be looked up at run time, you may be able to use a reinterpret cast to overwrite it with a pointer to another object, but the behaviour is totally undefined and unreliable. For what little it's worth (nothing practically, but perhaps a smidge in assisting understanding of likely implementation), that might look something like:
struct X
{
    Y& y_;
    X(Y& y) : y_(y) { }
};
...
X x(y1);
*reinterpret_cast<Y**>(&x) = &y2; // undefined behaviour, shown only for illustration
My understanding is that the compiler will always generate a default assignment operator if you don't provide one, which does a shallow copy (similar to the default copy constructor).
Since a reference is typed (just like the underlying object that it refers to), doesn't that mean that we will always invoke the assignment operator when attempting to re-seat a reference, thus preventing the compiler from complaining about this?
It's not quite like that. Implicit copy (assignment) performs memberwise copying (not necessarily shallow), and the compiler won't let bad things happen implicitly to reference members.
class X
{
    int& ref;
public:
    X(int& r): ref(r) {}
};
int main()
{
    int i = 0;
    X a(i), b(i);
    a = b; // error: X has a reference member, so no usable implicit copy assignment exists
}
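The same fact can be stated without triggering a compile error, using standard type traits. In this sketch the accessor get() and the function copy_construct_shares_referent are hypothetical additions to the class above.

```cpp
#include <type_traits>

class X {
    int& ref;
public:
    X(int& r) : ref(r) {}
    int get() const { return ref; }   // illustrative accessor, not in the original
};

// Copy construction is fine: the new object's reference is bound at creation.
static_assert(std::is_copy_constructible<X>::value,
              "a reference member does not block copy construction");

// Copy assignment would have to re-seat ref, so it is not available implicitly.
static_assert(!std::is_copy_assignable<X>::value,
              "a reference member blocks implicit copy assignment");

int copy_construct_shares_referent() {
    int i = 1;
    X a(i);
    X b(a);        // b.ref is bound to the same int as a.ref
    i = 5;
    return a.get() + b.get();
}
```

Both a and b observe the later change to i, since each holds a reference to the same int.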