I've been carefully studying C++ value categories recently. The difference between lvalue and rvalue seems clear, but I got confused when it comes to prvalue and xvalue.
Given the example below:
#include <iostream>
using std::cout;
using std::endl;
using std::move;
class Type {
public:
int value;
Type(const int &value=0) :value(value) {}
Type(const Type &type) :value(type.value){}
Type(Type &&type) noexcept :value(type.value) {}
Type &operator= (const Type &type) {
value = type.value;
return *this;
}
Type &operator=(Type &&type) noexcept{
value = type.value;
return *this;
}
};
Type foo1(const Type &value) {
return Type(value);
}
Type &&foo2(const Type &value) {
return Type(value);
}
int main() {
Type bar1(123);
cout << foo1(bar1).value << endl;
cout << foo2(bar1).value << endl;
Type bar2;
bar2 = foo1(bar1);
cout << bar2.value << endl;
bar2 = foo2(bar1);
cout << bar2.value << endl;
return 0;
}
Running the example, the console prints:
123
123
123
-858993460
Can anyone explain why it gives an unexpected value in the last output?
What feature does this example show about xvalue?
foo2 returns a reference bound to a temporary, which is destroyed immediately; it always returns a dangling reference.
a temporary bound to a return value of a function in a return statement is not extended: it is destroyed immediately at the end of the return expression. Such function always returns a dangling reference.
Using the returned reference, as in foo2(bar1).value and bar2 = foo2(bar1);, is undefined behavior; anything is possible.
On the other hand, foo1 has no such issue: it returns by value, so the result is moved from the temporary object or constructed in place through copy elision.
Here's a very simple explanation. Consider this textbook example from pre-C++0x times:
T& f()
{
T t;
return t;
}
// ...
T& val = f();
cout << val; // <--- SIGSEGV here
You are not surprised that this snippet crashed, right? You returned a reference (which is, under the hood, nothing more than a glorified pointer) to a local object which was destroyed before the function even returned, thus dooming val to become a dangling reference.
Now consider:
T&& f()
{
T t;
return static_cast<T&&>(t);
}
// ...
T&& val = f();
cout << val; // <--- SIGSEGV here
In terms of object lifetimes, this is exactly the same as before. Rvalue reference is not some brand new tool for magically moving objects around. It's the same old reference (read: glorified pointer). Nothing changed - you still returned an address of a destroyed object.
And I assume everyone agrees that std::move is nothing more than a glorified static_cast<T&&>, so if it's used the result is the same:
T&& f()
{
T t;
return std::move(t);
}
// ...
T&& val = f();
cout << val; // <--- SIGSEGV here
This example is 99.9% identical to the previous one.
(The 0.1% difference is that, for example, in the first two examples GCC knows the code is broken and actually returns a null reference that crashes on first use, while in the last example, since std::move is a function, it can't be sure, so it obediently returns the bad address.)
C++11 question
I'm trying to understand an issue I came across in our code. I'm not looking for the "right" way to do this; I just want to know how this is supposed to work so I can figure out how to fix things in the future.
I think function f1() is fine returning a reference to the temporary implicitly constructed on the same line as p1.
But what about p2? The temporary is constructed implicitly in the body of f2() when calling f1(), but f2() returns the reference that f1() returns. I thought the lifetime of the temporary is extended to match the lifetime of the reference.
I'm asking because with one of our compilers p2 is garbage on the next line, but with the others it is not.
struct PP
{
int i;
PP(int i_) : i(i_) {}
};
const PP &f1(const PP &p)
{
return p;
}
const PP &f2(int i)
{
return f1(i);
} // does the temp live here after return?
int main()
{
const PP &p1 = f1(1);
const PP &p2 = f2(2); // is p2 valid on the NEXT line
return 0;
}
I think this is undefined behaviour, and the fact that different compilers have different outputs would suggest that.
In f2, you're returning the reference returned by f1, which itself refers to the temporary PP constructed from i. At the end of the return statement that temporary disappears, which invalidates the reference.
In fact, clang-tidy detects this (see the godbolt example).
Also, the cppreference page on reference initialization states, as an exception to lifetime extension:
a temporary bound to a return value of a function in a return
statement is not extended: it is destroyed immediately at the end of
the return expression. Such return statement always returns a dangling
reference.
In an article about reference initialization at cppreference.com (Lifetime of a temporary), it says:
a temporary bound to a return value of a function in a return statement is not extended: it is destroyed immediately at the end of the return expression. Such function always returns a dangling reference.
This excerpt addresses the exceptions of extending the lifetime of a temporary by binding a reference to it. What do they actually mean by that? I've thought about something like
#include <iostream>
int&& func()
{
return 42;
}
int main()
{
int&& foo = func();
std::cout << foo << std::endl;
return 0;
}
So foo should be referencing the temporary 42. According to the excerpt, this should be a dangling reference - but this prints 42 instead of some random value, so it works perfectly fine.
I'm sure I'm getting something wrong here, and would appreciate if somebody could resolve my confusion.
Your example is very good, but your compiler is not.
A temporary is often a literal value, a function return value, but also an object passed to a function using the syntax "class_name(constructor_arguments)". For example, before lambda expressions were introduced to C++, to sort things one would define some struct X with an overloaded operator() and then make a call like this:
std::sort(v.begin(), v.end(), X());
In this case you expect that the lifetime of the temporary constructed with X() will end on the semicolon that ends the instruction.
If you call a function that expects a const reference, say, void f(const int & n), with a temporary, e.g. f(2), the compiler creates a temporary int, initialises it with 2, and passes a reference to this temporary to the function. You expect this temporary to end its life with the semicolon in f(2);.
Now consider this:
int && ref = 2;
std::cout << ref;
This code is perfectly valid. Notice, however, that here the compiler also creates a temporary object of type int and initialises it with 2. It is this temporary that ref binds to. However, if the temporary's lifetime were limited to the instruction it is created within, and ended on the semicolon that marks the end of that instruction, the next instruction would be a disaster, as cout would be using a dangling reference. Thus, references to temporaries like the one above would be rather impractical. This is what the "extension of the lifetime of a temporary" is needed for. I suspect that the compiler, upon seeing something like int && ref = 2, is allowed to transform it to something like this
int tmp = 2;
int && ref = std::move(tmp);
std::cout << ref; // equivalent to std::cout << tmp;
Without lifetime extension, this could look rather like this:
{
int tmp = 2;
int && ref = std::move(tmp);
}
std::cout << ref; // what is ref?
Doing such a trick in a return statement would be pointless. There's no reasonable, safe way to extend the lifetime of any object local to a function.
BTW. Most modern compilers issue a warning and reduce your function
int&& func()
{
return 42;
}
to
int&& func()
{
return nullptr; // pseudocode: in effect, a null reference
}
with an immediate segfault upon any attempt to dereference the return value.
I want to understand whether I can safely return an rvalue reference that is passed as an argument to a function and it doesn't get destroyed with the stack unwinding.
#include <iostream>
struct Struct { int m; };
Struct& f(Struct&& rvalue)
{
std::cout << &rvalue << '\n';
return rvalue;
}
int main() // standard C++ requires main to return int
{
Struct& lvalue1 = f(Struct{ 1 });
std::cout << &lvalue1 << '\n';
Struct& lvalue2 = f(Struct{ 2 });
std::cout << &lvalue2 << '\n';
std::cin.get();
}
Output:
00A3F844
00A3F844
00A3F838
00A3F838
This code produces different addresses for the rvalues. Does that mean that the Struct objects are actually constructed before the function call, so that I can safely do this kind of thing?
I can safely do this kind of thing?
No. Struct{ 1 } and Struct{ 2 } construct temporary objects which are destroyed at the end of the full expression. That means the references lvalue1 and lvalue2 are always dangling. Accessing the objects through them leads to undefined behavior.
All temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created
Using the return value of operator* from a "dead" unique_ptr is bad.
The following code compiles but results of course in Undefined Behavior:
auto& ref = *std::make_unique<int>(7);
std::cout << ref << std::endl;
Why didn't the standard make the return type of operator* for an rvalue of std::unique_ptr an rvalue of the internal value, instead of an lvalue, like this:
// could have been done inside unique_ptr
T& operator*() & { return *ptr; }
T&& operator*() && { return std::move(*ptr); }
In which case this would work fine:
std::cout << *std::make_unique<int>(7) << std::endl;
But the code at the beginning would not compile (cannot bind an rvalue to an lvalue).
Side note: of course someone could still write bad code like the below, but it is saying "I'm UB" more verbosely, IMHO, thus less relevant for this discussion:
auto&& ref = *std::make_unique<int>(7);
std::cout << ref << std::endl;
Is there any good reason for operator* on an rvalue of std::unique_ptr to return an lvalue ref?
Your code, in terms of the value categories involved and the basic idea, is the equivalent of this:
auto &ref = *(new int(7));
new int(7) results in a pointer object which is a prvalue expression. Dereferencing that prvalue results in an lvalue expression.
Regardless of whether the pointer object is an rvalue or lvalue, applying * to a pointer will result in an lvalue. That shouldn't change just because the pointer is "smart".
Good question!
Without digging into the relevant papers and design discussions, I think a few points may explain this design decision:
As @Nicol Bolas mentioned, this is how a built-in (raw) pointer would behave, so "do as int does" is applied here as "do as int* does".
This is similar to the fact that unique_ptr (and other library types) don't propagate constness (which in turn is why we are adding propagate_const).
What about the following code snippet? It doesn't compile with your suggested change, while it is a valid code that shouldn't be blocked.
#include <memory>
class Base { public: virtual ~Base() = default; }; // destructor must be accessible
class Derived : public Base {};
void f(Base&) {}
int main()
{
f(*std::make_unique<Derived>());
}
(godbolt - it compiles if our operator* overloads are commented out)
For your side note: I'm not sure auto&& says "I'm UB" any louder. On the contrary, some would argue that auto&& should be our default in many cases (e.g. range-based for loop; it was even suggested that it be inserted automatically for a "terse-notation range-based for loop", which wasn't accepted, but still...). Let's remember that an rvalue reference has a similar effect to const &, namely extending the lifetime of a temporary (within the known restrictions), so it doesn't necessarily look like UB in general.
std::cout << *std::make_unique<int>(7) << std::endl; already works as the temporary dies at the end of the full expression.
T& operator*() & { return *ptr; }
T&& operator*() && { return std::move(*ptr); }
wouldn't avoid the dangling reference in your example:
auto&& ref = *std::make_unique<int>(7); // or const auto&
std::cout << ref << std::endl;
but indeed, would avoid binding a temporary to a non-const lvalue reference.
Another safer alternative would be:
T& operator*() & { return *ptr; }
T operator*() && { return std::move(*ptr); }
to allow the lifetime extension, but that would incur an extra move construction that is not necessarily wanted in the general case.
I am kind of confused about this case:
Declare a pointer:
int b =10;
int*a=&b;
Here & takes the address of b.
Consider another example:
/* Reference to the calling object can be returned */
Test& Test::func ()
{
// Some processing
return *this;
}
this should be a pointer, and *this is the object it points to.
But here we are asking to assign *this to &Test.
How should we modify the code to make the function return the address? Should we still use Test& ?
In C++, & has two different syntactic uses:
&variable; // extracts address of variable
and
Type& ref = variable; // creates reference called ref to variable
Easy usage example:
int v = 5;
cout << v << endl; // prints 5
cout << &v << endl; // prints address of v
int* p;
p = &v; // stores address of v into p (p is a pointer to int)
int& r = v;
cout << r << endl; // prints 5
r = 6;
cout << r << endl; // prints 6
cout << v << endl; // prints 6 too because r is a reference to v
As for using references in functions, you should google "passing by reference in C++"; there are many tutorials about it.
Firstly, this is a pointer. The * dereferences the pointer, meaning return *this; returns the object, not a pointer to it.
Secondly, Test& is returning a reference to a Test instance. In your case, it is a reference to the object. To make it return a pointer, it should be Test*.
It makes more sense if you read a pointer declaration from right to left.
Test* func(); //When I call func, and dereference the returned value, it will be a Test
But here we are asking to assign *this to &Test.
No... you're asking for the value/expression *this to be used to return a Test&, which is a reference to a Test object. What that does is return a reference to the object on which func() is invoked.
How should we modify the code to make the function return the address? Should we still use Test& ?
You should use Test* instead... pointers are addresses, and having changed the return type you could return this (which is a pointer), but not *this because *this is not a pointer.