Say I have a std::vector that is declared in a loop's body and co_yielded:
some_generator<std::vector<int>> vector_sequence() {
while (condition()) {
std::vector<int> result = get_a_vector();
result.push_back(1);
co_yield std::move(result); // or just: co_yield result;
}
}
Quite obviously, result isn't going to be used again after being co_yielded (or I am horribly mistaken), so it would make sense to move it. I tried co_yielding a simple non-copyable type without std::move and it did not compile, so in generic code, one would have to use std::move. Does the compiler not recognize this (compiler bug?), or is it intended by the language that co_yield always copies an lvalue, so that I have to std::move? I know that returning an lvalue that is a local variable results in a move or some form of copy elision, and this does not seem much different from it.
I have read C++: should I explicitly use std::move() in a return statement to force a move?, which is related to this question, but does not answer it, and considered co_return vs. co_yield when the right hand side is a temporary, which as far as I understand, is not related to this question.
The implicit move rule ([class.copy.elision]/3) applies to return and co_return statements and to throw expressions. It doesn't apply to co_yield.
The reason is that, in the contexts enumerated in [class.copy.elision]/3, the execution of the return or co_return statement or throw expression ensures that the implicitly movable entity's lifetime ends. For example,
auto foo() {
std::string s = ...;
if (bar()) {
return s;
}
// return something else
}
Here, even though there is code after the return statement, it's guaranteed that if the return statement executes, then any code further down that can see s will not execute. This makes it safe to implicitly move s.
In contrast, co_yield only suspends the coroutine and does not end it in the manner of co_return. Thus, in general, after co_yield result; is evaluated, the coroutine might later resume and use the very same result variable again. This means that in general, it's not safe to implicitly transform the copy into a move; therefore, the standard does not prescribe such behaviour. If you want a move, write std::move.
If the language were to allow implicit move in your example, it would have to have specific rules to ensure that, although the variable could be used again after co_yield, it is in fact not. In your case, it might indeed be that the loop will immediately end and thus the result variable's lifetime will end before its value can be observed again, but in general you would have to specify a set of conditions under which this can be guaranteed to be the case. Then, you could propose that an implicit move occur only under those conditions.
Related
OK, I have some code that seems to work but I'm not sure it will always work. I'm moving a unique_ptr into a stl map using one of the members of the class as the map key, but I'm not sure whether the move might invalidate the pointer in some situations.
The code is along these lines:
struct a
{
std::string s;
};
std::map<std::string, std::unique_ptr<a>> m;
std::unique_ptr<a> p = std::make_unique<a>();
// some code that does stuff
m[p->s] = std::move(p);
So this currently seems to work, but it seems to me it might be possible for p to become invalid before the string is used as the map key, and that would lead to a memory exception. Obviously I could create a temporary string before the move, or I could assign via an iterator, but I'd prefer not to if it isn't necessary.
This code has well-defined behaviour.
In C++17, std::move(p) will be evaluated before m[p->s]. Before C++17, std::move(p) could be evaluated either before or after m[p->s]. However, this doesn't matter because std::move(p) does not modify p. It is only the assignment that actually causes p to be moved-from.
The assignment operator that is called has the signature
unique_ptr& operator=(unique_ptr&& other);
and is called as if by
m[p->s].operator=(std::move(p));
This means that the modification of p is guaranteed to not take place until the body of operator= is entered (the initialization of the other parameter is merely a reference binding). And certainly the body of operator= cannot be entered until the object expression m[p->s] is evaluated.
So your code is well-defined in all versions of C++.
The code is fine. In C++17, we were given strong guarantees on the sequencing, which makes this code 100% OK.
Prior to C++17 the standard has
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
But that still means the code is okay. We don't know which of m[p->s] and std::move(p) happens first, but since std::move doesn't actually do anything to p, p->s will be valid and will be resolved before p is moved into m[p->s].
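Both answers rest on two facts: std::move is only a cast, and the pointer is not emptied until unique_ptr's operator= runs. A small sketch of that follows. Node plays the role of struct a from the question, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Node {  // plays the role of struct a from the question
    std::string s;
};

// std::move is only a cast to an rvalue reference; it does not touch p.
bool still_owns_after_cast() {
    auto p = std::make_unique<Node>();
    std::unique_ptr<Node>&& ref = std::move(p);  // reference binding, nothing moves
    (void)ref;
    return p != nullptr;
}

// The pattern from the question: the key is read from the pointee, then
// ownership is transferred inside unique_ptr::operator=.
std::size_t insert_keyed_by_member() {
    std::map<std::string, std::unique_ptr<Node>> m;
    auto p = std::make_unique<Node>();
    p->s = "key";
    m[p->s] = std::move(p);
    assert(p == nullptr);  // ownership has moved into the map
    return m.count("key");
}
```

If this pattern trips up readers or static analyzers, copying the key into a local std::string first costs very little and makes the intent obvious.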
I have a function like this. Do I have to explicitly use move here, or is it implicit?
std::vector<int> makeVector();
std::vector<int> makeVector2();
std::optional<std::vector<int>> getVectOr(int i) {
if(i==1) {
std::vector<int> v = makeVector();
return std::move(v);
}
else if(i==2) {
std::vector<int> v2 = makeVector2();
return std::move(v2);
}
return std::nullopt;
}
It doesn't matter whether you use std::move or not. No return value optimization will take place here. There are several requirements for RVO to take place.
One of the requirements for return value optimization is that the value being returned must be the same type as what the function returns.
std::optional<std::vector<int>> getVectOr(int i)
Your function returns std::optional<std::vector<int>>, so only a copy of a temporary of the same type will get elided. In the two return statements in question here, both temporaries are std::vector<int>s which are, of course, not the same type, so RVO ain't happening.
No matter what happens, you're returning std::optional<std::vector<int>>. That's an absolute requirement here. No exceptions. But your adventure to return something from this function always starts with std::vector<int>. No matter what you try, you can't turn it into a completely different type. Something will have to get constructed somewhere along the way. No return value optimization.
But having said that: move semantics also come into play here. If the stars align for you (and this is very likely), move semantics will allow everything to happen without copying the contents of the large vector around. So, although no return value optimization takes place, you may win the lottery and have everything happen without shuffling the actual contents of the vector across all your RAM. You can use your debugger to confirm or deny whether you've won the lottery on that account.
You may also have a possibility of the other type of RVO, namely returning a non-volatile auto-scoped object from the function:
std::optional<std::vector<int>> getVectOr(int i) {
std::optional<std::vector<int>> ret;
// Some code
return ret;
}
It's also possible for return value optimization to take place here; it is permitted but not mandatory.
In addition to what has already been said:
Using std::move in the return statement prohibits return value optimization. Named return value optimization is only allowed if the return statement's operand is the name of a non-volatile variable with automatic storage duration declared in the function body, and if its type equals (up to cv-qualification) the return type.
std::move(v2) does not qualify for this. It does not simply name a variable.
Named return value optimization is never mandatory either. It is optional and up to the compiler whether it will perform it (even in C++17 which made some copy elision mandatory).
However, if return value optimization is not done, then generally the return value will be moved automatically. return statements have special behavior and if the operand directly names a variable with similar conditions as above, then overload resolution will be done as if the return value initializer was an rvalue expression (even if it isn't), so that move constructors will be considered. This automatic move is done whether or not the type of the variable referred to in the return statement is the same as the return type, so it applies to your example as well.
There is no need to use std::move explicitly and it is a pessimization in some cases (although not yours specifically) as explained above. So just use:
std::optional<std::vector<int>> getVectOr(int i) {
if(i==1) {
std::vector<int> v = makeVector();
return v;
}
else if(i==2) {
std::vector<int> v2 = makeVector2();
return v2;
}
return std::nullopt;
}
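If you'd rather check this than take it on faith, a counting type makes the automatic move visible. This is only a sketch: Probe, Counts, measure and the two make_ functions are hypothetical names, not anything from your code. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical probe type that counts how it gets constructed.
struct Probe {
    static inline int copies = 0;
    static inline int moves = 0;
    Probe() = default;
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};

std::optional<Probe> make_plain() {
    Probe p;
    return p;              // treated as an rvalue: moved into the optional
}

std::optional<Probe> make_with_move() {
    Probe p;
    return std::move(p);   // same effect here, just noisier
}

struct Counts {
    int copies;
    int moves;
};

// Runs f and reports how many Probe copies and moves it caused.
template <typename F>
Counts measure(F f) {
    Probe::copies = 0;
    Probe::moves = 0;
    auto result = f();
    assert(result.has_value());
    return Counts{Probe::copies, Probe::moves};
}
```

On a conforming C++17 compiler I would expect both functions to report zero copies and one move, which is why the explicit std::move buys nothing in this case.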
Consider the following piece of code:
std::vector<int> Foo() {
std::vector<int> v = Bar();
return v;
}
return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:
void Foo(std::vector<int> * to_be_filled) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
A similar argument could be made here, as *to_be_filled = v could conceivably be compiled to an O(1) move-assign, since it's a local variable that's going out of scope (it should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use). Is this the case? Is there a subtle reason why not?
Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:
void Foo(std::vector<int> * to_be_filled) {
if (Baz()) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
...
}
Do / can / is it useful / reasonable to expect compilers to find patterns such as the *to_be_filled = v and then automatically optimize them to assume rvalue semantics?
Edit:
g++ 7.3.0 does not perform any such optimizations in -O3 mode.
The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it, such as in a return statement (and only when it's return <identifier>;).
*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.
My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor.
That's not how it works. NRVO would eliminate the move/copy entirely. But the ability for return <identifier>; to be an rvalue is not an "optimization". It's actually a requirement that compilers treat them as rvalues.
Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.
Is there a subtle reason why not?
One reason this isn't allowed is because the location of a statement should not arbitrarily change what that statement is doing. See, return <identifier>; will always move from the identifier (if it's a local variable). It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.
That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.
Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.
By contrast, what happens if you have a reference to v, and that gets used by to_be_filled somehow? Sure, that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.
It's a lot harder to do that in return <identifier>; cases.
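The difference between a plain assignment and an explicit std::move can be observed with a small probe type. This is a sketch under the assumption that a counting stand-in is an acceptable substitute for std::vector<int>; Slot and the function names are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Hypothetical probe type that records which assignment operator ran on it.
struct Slot {
    int copy_assigned = 0;
    int move_assigned = 0;
    Slot() = default;
    Slot(const Slot&) = default;
    Slot& operator=(const Slot&) { ++copy_assigned; return *this; }
    Slot& operator=(Slot&&) noexcept { ++move_assigned; return *this; }
};

void fill_plain(Slot* to_be_filled) {
    Slot v;
    *to_be_filled = v;             // v is an lvalue: copy assignment, always
}

void fill_with_move(Slot* to_be_filled) {
    Slot v;
    *to_be_filled = std::move(v);  // explicit request: move assignment
}

// Small helpers so the counts can be inspected from a single expression.
Slot after_fill_plain() {
    Slot target;
    fill_plain(&target);
    assert(target.copy_assigned == 1);
    return target;
}

Slot after_fill_with_move() {
    Slot target;
    fill_with_move(&target);
    assert(target.move_assigned == 1);
    return target;
}
```

So if you know v is dead after the assignment, say so with std::move. The compiler will not infer it, no matter the optimization level.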
I have the following function:
void read_int(std::vector<int> &myVector)
which allows me to fill myVector through its reference. It is used like this:
std::vector<int> myVector;
read_int(myVector);
I want to refactor the code a bit (keeping the original function) so that in the end I have this:
auto myVector = read_int(); // auto is std::vector<int>
What would be the best intermediate function to achieve this?
It seems to me that the following straight-forward answer is suboptimal:
std::vector<int> read_int() {
std::vector<int> myVector_temp;
read_int(myVector_temp);
return myVector_temp;
}
The obvious answer is correct, and basically optimal.
void do_stuff(std::vector<int>& on_this); // (1)
std::vector<int> do_stuff_better() { // (2)
std::vector<int> myVector_temp; // (3)
do_stuff(myVector_temp); // (4)
return myVector_temp; // (5)
}
At (3) we create a named return value in automatic storage (on the stack).
At (5) we only ever return the named return value from the function, and we never return anything else but that named return value anywhere else in the function.
Because of (3) and (5), the compiler is allowed to (and most likely will) elide the existence of the myVector_temp object. It will directly construct the return value of the function, and call it myVector_temp. It still needs there to be an existing move or copy constructor, but it does not call it.
On the other end, when calling do_stuff_better, some compilers can also elide the assignment at call:
std::vector<int> bob = do_stuff_better(); // (6)
The compiler is allowed to effectively pass a "pointer to bob" and tell do_stuff_better() to construct its return value in bob's location, eliding this copy construction as well (well, it can arrange how the call occurs such that the location that do_stuff_better() is asked to construct its return value in is the same as the location of bob).
And in C++11 and later, even if the requirements for both elisions are not met, or the compiler chooses not to use them, in both cases a move must be done instead of a copy.
At line (5) we are returning a locally declared automatic storage duration variable in a plain and simple return statement. This makes the return an implicit move if not elided.
At line (6), the function returns an unnamed object, which is an rvalue. When bob is constructed from it, it move-constructs.
Moving a std::vector consists of copying the value of ~3 pointers, and then zeroing the source, regardless of how big the vector is. No elements need be copied or moved.
Both of the above elisions, where we remove the named local variable within do_stuff_better(), and we remove the return value of do_stuff_better() and instead directly construct bob, are somewhat fragile. Learning the rules under which your compiler is allowed to do those elisions, and also the situations where your compiler actually does the elisions, is worthwhile.
As an example of how it is fragile, if you had a branch where you did a return std::vector<int>() in your do_stuff_better() after checking an error state, the in-function elision would probably be blocked.
Even if elision is blocked or your compiler doesn't implement it for a case, the fact that the container is move'd means that the run time costs are going to be minimal.
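One way to convince yourself that no elements are copied, whether the return is elided or moved, is to compare the address of the element buffer inside the wrapper with the address seen by the caller. This is a sketch: the body of the out-parameter read_int is made up, and buffer_inside and caller_got_same_buffer are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the original out-parameter function (the body is made up).
void read_int(std::vector<int>& myVector) {
    myVector.assign({1, 2, 3});
}

const int* buffer_inside = nullptr;  // records where the elements live

// The wrapper from the question, plus one line that records the buffer.
std::vector<int> read_int() {
    std::vector<int> myVector_temp;
    read_int(myVector_temp);
    buffer_inside = myVector_temp.data();
    return myVector_temp;  // elided, or else moved; never a deep copy
}

// True if the caller's vector owns the very buffer filled inside read_int().
bool caller_got_same_buffer() {
    std::vector<int> myVector = read_int();
    assert(myVector.size() == 3);
    return myVector.data() == buffer_inside;
}
```

The buffer address is the same in both cases, because elision keeps the same object and a move hands the buffer over, so the check does not depend on which one your compiler picks.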
I think you should read more about move semantics (there are a lot of articles on this; just choose one).
In short, in C++ all STL containers are written in such a way that returning them from a function will cause their contents to be moved from the returned value (an rvalue, which binds to a so-called rvalue reference) to the variable you are assigning it to. In effect, you'll only copy a few fields of the std::vector instead of its data. That's a lot faster than copying its contents.
I'm confused as to what is going on in the following code snippet. Is move really necessary here? What would be the most optimal + safe way of returning the temporary set?
set<string> getWords()
{
set<string> words;
for (auto iter = wordIndex.begin(); iter != wordIndex.end(); ++iter)
{
words.insert(iter->first);
}
return move(words);
}
My calling code simply does set<string> words = foo.getWords()
First off, the set is not temporary, but local.
Second, the correct way to return it is via return words;.
Not only is this the only way you allow for return-value optimization, but moreover, the local variable will also bind to the move constructor of the returned object in the (unusual) case where the copy is not elided altogether. So it's a true triple-win scenario.
There's no need to use move here. Simply return "words". It will participate in the so-called "return value optimization".
Section 12.8 in the C++11 standard requires the move constructor to be called (if it exists) in the case where a local variable is returned. In essence, the compiler will take care of calling std::move for you.
No, the explicit move is not necessarily the most optimal way to move the set. Since the set is being returned by value, the compiler may perform named return value optimization on the set, meaning it may elide the copy and directly construct the set in-place at the call-site where the return value is to be stored. Explicit moving will inhibit this.
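The effect is easy to measure with a type that counts its constructions. A rough sketch follows. Words is a hypothetical stand-in for set<string>, and the global counters and function names are made up for the example.

```cpp
#include <cassert>
#include <utility>

int copies = 0;
int moves = 0;

// Hypothetical stand-in for set<string> that counts its constructions.
struct Words {
    Words() = default;
    Words(const Words&) { ++copies; }
    Words(Words&&) noexcept { ++moves; }
};

Words get_words_plain() {
    Words words;
    return words;             // NRVO candidate; falls back to a move
}

Words get_words_moved() {
    Words words;
    return std::move(words);  // not a plain name, so NRVO cannot apply
}

int moves_for_plain_return() {
    copies = moves = 0;
    Words w = get_words_plain();
    (void)w;
    assert(copies == 0);
    return moves;             // 0 with NRVO, 1 without
}

int moves_for_explicit_move() {
    copies = moves = 0;
    Words w = get_words_moved();
    (void)w;
    assert(copies == 0);
    return moves;             // the move constructor always runs here
}
```

With the plain return I would expect most compilers to report zero moves when optimizing, and at worst one. The explicit std::move version always pays for the move constructor. Neither version copies.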