Returning Eigen matrices and temporaries - c++

Consider the following function Foo:
// ...
Eigen::Vector3d Foo() {
Eigen::Vector3d res;
// ...
return res;
}
int main () {
Eigen::VectorXd foo = Foo(); // (1)
return 0;
}
The line (1) should not create any temporaries due to return value optimization. But consider the following case:
// ...
int main () {
Eigen::VectorXd foo;
// ...
foo.head<3>() = Foo(); // (2)
return 0;
}
Does (2) create any temporaries? More generally, does initializing any block of a matrix as in (2) create any temporaries? It would be great if this were not the case. Otherwise, I could redefine Foo as follows:
// ...
void AlternativeFoo(Eigen::Ref<Eigen::Vector3d> res) {
// Modify res
}
int main () {
Eigen::VectorXd foo;
// ...
AlternativeFoo(foo.head<3>()); // (3)
return 0;
}
Is (3) the only way to achieve the above without creating temporaries?

The line (1) should not create any temporaries due to return value optimization.
No, it must materialize a temporary for the return value of Foo.
The return type of Foo and the type of the variable foo do not match (up to cv-qualification): Vector3d vs VectorXd.
But this is a necessary condition for copy elision to be allowed.
If that is not the case, the constructor used will be neither a copy nor a move constructor in the first place.
So elision doesn't happen and in the constructor that is going to be called, the return value of Foo is bound to a reference argument, which will cause materialization of the temporary.
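The type-mismatch point can be shown without Eigen. This is a minimal sketch with two hypothetical stand-in types: Fixed3 plays Eigen::Vector3d and Dynamic plays Eigen::VectorXd. It counts Fixed3 destructor calls. Initializing a Fixed3 from the returned prvalue creates no extra object. Initializing a Dynamic materializes a separate Fixed3 temporary that is destroyed at the end of the statement. It assumes C++17 or later.

```cpp
#include <array>
#include <vector>

// Hypothetical stand-in for a fixed-size vector such as Eigen::Vector3d.
struct Fixed3 {
    static inline int destroyed = 0;  // counts destructor calls
    std::array<double, 3> data{};
    Fixed3(double a, double b, double c) : data{a, b, c} {}
    Fixed3(const Fixed3&) = default;
    ~Fixed3() { ++destroyed; }
};

// Hypothetical stand-in for a dynamic-size vector such as Eigen::VectorXd.
struct Dynamic {
    std::vector<double> data;
    // Converting constructor (neither a copy nor a move constructor of
    // Dynamic): a Fixed3 prvalue must be materialized to bind to `f`.
    Dynamic(const Fixed3& f) : data(f.data.begin(), f.data.end()) {}
};

// Returns a prvalue: a Fixed3 initialized from it is the result object itself.
Fixed3 MakeFixed(double a, double b, double c) { return Fixed3(a, b, c); }
```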
Does (2) create any temporaries? More generally, does initializing any block of a matrix as in (2) create any temporaries?
Yes, again temporaries for the Foo return values will be materialized, this time caused by binding to reference parameters in the operator=.
Is (3) the only way to achieve the above without creating temporaries?
I would assume so, but it probably doesn't matter anyway.
Assuming Foo can be inlined, the distinction is likely going to become meaningless and the compiler will figure out if the operations in Foo can be performed directly on the storage of foo or not.
If Foo cannot be inlined, copying the three entries of the vector will probably matter little compared with the cost of the function call itself. In that case your alternative solution would also force an extra indirection, which may cost more than copying a few values.
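Here is a rough, Eigen-free sketch of the two approaches. It assumes a std::vector<double> as the destination and uses two hypothetical helpers, Foo3 and FooInto, neither of which appears in the question. The first approach copies a returned std::array into the head of the vector. The second writes through a pointer directly into the caller's storage. Both leave the vector in the same state. With optimizations on and the helpers visible, a compiler may well emit equivalent code for both.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical analogue of Foo(): returns the three values by value.
std::array<double, 3> Foo3() { return {1.0, 2.0, 3.0}; }

// Hypothetical analogue of AlternativeFoo(): writes into storage it does not own.
void FooInto(double* out) {
    out[0] = 1.0;
    out[1] = 2.0;
    out[2] = 3.0;
}

// Like foo.head<3>() = Foo(): the returned array is an intermediate object
// whose three entries are then copied into the head of v.
void AssignHeadByValue(std::vector<double>& v) {
    const std::array<double, 3> tmp = Foo3();
    std::copy(tmp.begin(), tmp.end(), v.begin());
}

// Like AlternativeFoo(foo.head<3>()): the callee writes directly into v.
void AssignHeadInPlace(std::vector<double>& v) { FooInto(v.data()); }
```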

Related

Is it possible to avoid a copy when returning an argument from a function?

Suppose I have value type with some in-place operation. For example, something like this:
#include <algorithm>
#include <array>
using MyType = std::array<int, 100>;
void Reverse(MyType &value) {
std::reverse(value.begin(), value.end());
}
(The type and operation can be more complicated, but the point is the operation works in-place and the type is trivially copyable and trivially destructible. Note that MyType is large enough to care about avoiding copies, but small enough that it probably doesn't make sense to allocate on the heap, and since it contains only primitives, it doesn't benefit from move semantics.)
I usually find it helpful to also define a helper function that doesn't change the value in-place, but returns a copy with the operation applied to it. Among other things, that enables code like this:
MyType value = Reversed(SomeFunction());
Considering that Reverse() operates in-place, it should be logically possible to calculate value without copying the result from SomeFunction(). How can I implement Reversed() to avoid unnecessary copies? I'm willing to define Reversed() as an inline function in a header if that's what's necessary to enable this optimization.
I can think of two ways to implement this:
inline MyType Reversed1(const MyType &value) {
MyType result = value;
Reverse(result);
return result;
}
This benefits from return-value optimization but only after the argument value has been copied to result.
inline MyType Reversed2(MyType value) {
Reverse(value);
return value;
}
This might require the caller to copy the argument, except if it's already an rvalue, but I don't think return-value optimization is enabled this way (or is it?) so there's a copy upon return.
Is there a way to implement Reversed() that avoids copies, ideally in a way that is guaranteed by recent C++ standards?
If you want to reverse the value in-place, so that the change to the value you pass in is visible at the call site, and you also want to return it by value, you have no option but to copy it. They are two separate instances.
One alternative: Return the input value by reference. It'll then reference the same object that you sent in to the function:
MyType& Reverse(MyType& value) { // doesn't work with r-values
std::reverse(std::begin(value), std::end(value));
return value;
}
MyType Reverse(MyType&& value) { // r-value, return a copy
std::reverse(std::begin(value), std::end(value));
return std::move(value); // moving doesn't really matter for ints
}
Another alternative: Create the object you return in-place. You'll then return a separate instance with RVO in effect. No moves or copies. It'll be a separate instance from the one you sent in to the function though.
MyType Reverse(const MyType& value) {
// Doesn't work with `std::array`s:
return {std::rbegin(value), std::rend(value)};
}
The second alternative would work if std::array could be constructed from iterators like most other containers, but it can't. One solution is to create a helper that makes sure RVO works:
#include <array>
#include <cstddef>
#include <tuple>
#include <utility>
using MyType = std::array<int, 26>;
namespace detail {
template<std::size_t... I>
constexpr MyType RevHelper(const MyType& value, std::index_sequence<I...>) {
// construct the array in reverse in-place:
return {value[sizeof...(I) - I - 1]...}; // RVO
}
} // namespace detail
constexpr MyType Reverse(const MyType& value) {
// get size() of array in a constexpr fashion:
constexpr std::size_t asize = std::tuple_size_v<MyType>;
// RVO:
return detail::RevHelper(value, std::make_index_sequence<asize>{});
}
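The same index-sequence idea works for any std::array. The sketch below is our own generalization, and the names ReversedArray and ReversedArrayImpl are hypothetical. Each function returns a prvalue, so C++17 guarantees that the result is constructed directly in the caller's object. The reversal can even run at compile time.

```cpp
#include <array>
#include <cstddef>
#include <utility>

namespace detail {
// Builds a new array whose element I is value[N - 1 - I].
template <class T, std::size_t N, std::size_t... I>
constexpr std::array<T, N> ReversedArrayImpl(const std::array<T, N>& value,
                                             std::index_sequence<I...>) {
    return {{value[N - 1 - I]...}};  // prvalue: no copy or move of the result
}
}  // namespace detail

// Hypothetical generic wrapper: works for any element type and size.
template <class T, std::size_t N>
constexpr std::array<T, N> ReversedArray(const std::array<T, N>& value) {
    return detail::ReversedArrayImpl(value, std::make_index_sequence<N>{});
}

// Evaluated at compile time.
constexpr std::array<int, 4> kReversed = ReversedArray(std::array<int, 4>{1, 2, 3, 4});
static_assert(kReversed[0] == 4 && kReversed[3] == 1, "reversed at compile time");
```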
Your last option is the way to go (except for the typo):
MyType Reversed2(MyType value)
{
Reverse(value);
return value;
}
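[N]RVO doesn't apply to return value; (value is a function parameter, not a local object), but at least it's implicitly moved rather than copied.
You'll have either a copy + a move (lvalue argument) or two moves (xvalue argument such as std::move(x)). With a prvalue argument such as SomeFunction(), C++17 initializes the parameter directly, so only the move on return remains.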
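The following sketch counts copies and moves in each case. It assumes C++17 and uses a hypothetical instrumented type, Counted, in place of MyType, because std::array<int, N> has no observable copy or move constructors to count. Reverse here is a trivial stand-in for the in-place operation.

```cpp
#include <utility>

// Hypothetical instrumented type that counts copy and move constructions.
struct Counted {
    static inline int copies = 0;
    static inline int moves = 0;
    int payload = 0;
    Counted() = default;
    Counted(const Counted& o) : payload(o.payload) { ++copies; }
    Counted(Counted&& o) noexcept : payload(o.payload) { ++moves; }
    static void Reset() { copies = 0; moves = 0; }
};

// Stand-in for the in-place operation.
void Reverse(Counted& c) { c.payload = -c.payload; }

// The answer's Reversed2: take by value, modify, return the parameter.
Counted Reversed2(Counted value) {
    Reverse(value);
    return value;  // never an NRVO candidate (parameter), but implicitly moved
}

// Returns a prvalue, like the question's SomeFunction().
Counted SomeFunction() { return Counted(); }
```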
There is a trick. It is NOT pretty but it works.
Make Reversed accept not a T, but a function returning T. Call it like this:
MyType value = Reversed(SomeFunction); // note no `SomeFunction()`
Here is a full implementation of Reversed:
template <class Generator>
MyType Reversed(Generator&& g)
{
MyType t{g()};
Reverse(t);
return t;
}
This produces no copies or moves. I checked.
If you feel particularly nasty, do this
#define Reversed(x) Reversed([](){return x;})
and go back to calling Reversed(SomeFunction()). Again, no copies or moves. Bonus points if you manage to squeeze it through a corporate code review.
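Here is a minimal counting sketch of the generator trick. It assumes C++17 and uses a hypothetical instrumented type, Counted. It writes Counted t = g(); instead of braces so that C++17's guaranteed prvalue elision clearly applies to t. The final return t; relies on NRVO, which mainstream compilers typically apply but the standard does not require, so the check below allows for one move.

```cpp
// Hypothetical instrumented type that counts copy and move constructions.
struct Counted {
    static inline int copies = 0;
    static inline int moves = 0;
    int payload = 0;
    explicit Counted(int p) : payload(p) {}
    Counted(const Counted& o) : payload(o.payload) { ++copies; }
    Counted(Counted&& o) noexcept : payload(o.payload) { ++moves; }
};

// Stand-in for the in-place operation.
void Reverse(Counted& c) { c.payload = -c.payload; }

Counted SomeFunction() { return Counted(7); }

// Generator-based Reversed: the generator's prvalue result initializes t
// directly, so obtaining t needs no copy or move.
template <class Generator>
Counted Reversed(Generator&& g) {
    Counted t = g();
    Reverse(t);
    return t;  // NRVO is permitted here but not required
}
```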
You can use a helper method that turns your in-place operation into something that can work on Rvalues. When I tested this in GCC, it results in one move operation, but no copies. The pattern looks like this:
void Reversed(MyType & m);
MyType Reversed(MyType && m) {
Reversed(m);
return std::move(m);
}
Here is the full code I used to test whether this pattern results in copies or not:
#include <stdio.h>
#include <string.h>
#include <utility>
struct MyType {
int * contents;
MyType(int value0) {
contents = new int[42];
memset(contents, 0, sizeof(int) * 42);
contents[0] = value0;
printf("Created %p\n", this);
}
MyType(const MyType & other) {
contents = new int[42];
memcpy(contents, other.contents, sizeof(int) * 42);
printf("Copied from %p to %p\n", &other, this);
}
MyType(MyType && other) {
contents = other.contents;
other.contents = nullptr;
printf("Moved from %p to %p\n", &other, this);
}
~MyType() {
if (contents) { delete[] contents; }
}
};
void Reversed(MyType & m) {
for (int i = 0; i < 21; i++) {
std::swap(m.contents[i], m.contents[41 - i]);
}
}
MyType Reversed(MyType && m) {
Reversed(m);
return std::move(m);
}
MyType SomeFunction() {
return MyType(7);
}
int main() {
printf("In-place modification\n");
MyType x = SomeFunction();
Reversed(x);
printf("%d\n", x.contents[41]);
printf("RValue modification\n");
MyType y = Reversed(SomeFunction());
printf("%d\n", y.contents[41]);
}
I'm not sure if this lack of copies is guaranteed by the standard, but I would think so, because there are some objects that are not copyable.
Note: The original question was just about how to avoid copies, but I'm afraid the goalposts are changing and now we are trying to avoid both copies and moves. The Rvalue function I present does seem to perform one move operation. However, if we cannot eliminate the move operation, I would suggest that the OP redesign their class so that moves are cheap, or just give up on the idea of this shorter syntax.
When you write
MyType value = Reversed(SomeFunction());
I see 2 things happen: Reversed will do RVO so it directly writes to value, and SomeFunction either gets copied into the argument for Reversed or creates a temporary object and you pass a reference. No matter how you write it, there will be at least 2 objects and you have to reverse from one to the other.
There is no way for the compiler to do what I would call AVO, argument value optimization. You want the argument to the Reversed function to be stored in the return value of the function so you can do in-place operations. With that feature the compiler could do RVO-AVO-RVO, and SomeFunction would create its return value directly in the final value variable.
But you could do this I think:
MyType &&value = SomeFunction();
Reverse(value);
Looking at it another way: say you do figure out a way for Reversed to do in-place operations; then in
MyType &&value = Reversed(SomeFunction());
SomeFunction would create a temporary, but then the compiler would have to extend the lifetime of that temporary to the lifetime of value. This works when the temporary is bound directly to the reference, but how is the compiler supposed to know that Reversed just passes the temporary through?
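The lifetime-extension point can be observed with a hypothetical Tracked type that counts live instances. Binding a temporary directly to a reference extends its life. Binding to the reference returned by a pass-through function does not. In that case the temporary is destroyed at the end of the full-expression and the reference dangles, so it must not be used afterwards. This is a sketch assuming C++17.

```cpp
#include <utility>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static inline int alive = 0;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};

// Hypothetical pass-through function: returns a reference to its argument.
Tracked&& PassThrough(Tracked&& t) { return std::move(t); }

int AliveWithDirectBinding() {
    [[maybe_unused]] Tracked&& r = Tracked{};  // lifetime extended to r's scope
    return Tracked::alive;                     // the temporary is still alive
}

int AliveThroughFunction() {
    // No lifetime extension: the temporary dies at the end of this statement,
    // leaving r dangling (it is deliberately never used).
    [[maybe_unused]] Tracked&& r = PassThrough(Tracked{});
    return Tracked::alive;
}
```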
From the answers and comments it looks like the consensus is that there is no way to achieve this in C++.
It makes sense that this is the general answer without the implementation of MyType Reversed(MyType) available, since the compiler would have no clue that the return value is the same as the argument, so it would necessarily allocate separate spaces for them.
But it looks like even with the implementation of Reversed() available, neither GCC nor Clang will optimize away the copy: https://godbolt.org/z/KW6Y3vsdf
So I think the short story is that what I was asking for isn't possible. If it's important to avoid the copy, the caller should explicitly write:
MyType value = SomeFunction();
Reverse(value);
// etc.

To move, or not to move from r-value ref-qualified method?

In the following C++11+ code which return statement construction should be preferred?
#include <utility>
struct Bar
{
};
struct Foo
{
Bar bar;
Bar get() &&
{
return std::move(bar); // 1
return bar; // 2
}
};
Well, since it's an rvalue-ref-qualified member function, *this is presumably about to expire. So it makes sense to move bar out, assuming Bar actually gains something from being moved.
Since bar is a member, and not a local object or function parameter, the usual rules for copy elision and implicit move in a return statement don't apply. It will always be copied unless you explicitly std::move it.
So my answer is to go with option number one.
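A quick way to see the difference is to count Bar's copy and move constructions. This sketch assumes C++17 and adds hypothetical counters to the question's Bar. The functions get_moved and get_copied are renamed versions of options 1 and 2 so that both can coexist.

```cpp
#include <utility>

// The question's Bar, with hypothetical counters added.
struct Bar {
    static inline int copies = 0;
    static inline int moves = 0;
    Bar() = default;
    Bar(const Bar&) { ++copies; }
    Bar(Bar&&) noexcept { ++moves; }
};

struct Foo {
    Bar bar;
    Bar get_moved() && { return std::move(bar); }  // option 1: move constructs
    Bar get_copied() && { return bar; }            // option 2: member, so copied
};
```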
I prefer option 3:
Bar&& get() &&
// ^^
{
return std::move(bar);
}
and, while we're at it:
Bar& get() & { return bar; }
Bar const& get() const& { return bar; }
Bar const&& get() const&& { return std::move(bar); }
We're an rvalue, so we should be free to cannibalize our resources, and moving bar is correct. But just because we're open to having bar moved from doesn't mean we have to mandate such a move and incur extra operations, so we should just return an rvalue reference to it.
This is how the standard library does it, e.g. std::optional<T>::value.
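The following sketch shows option 3 with a hypothetical counting Bar (C++17). Discarding the result moves nothing. A move happens only when the caller actually constructs a new Bar. One caveat of returning Bar&& from a temporary: Bar&& r = Foo{}.get(); would leave r dangling, because the Foo temporary is destroyed at the end of that full-expression.

```cpp
#include <utility>

// Hypothetical Bar that counts copy and move constructions.
struct Bar {
    static inline int copies = 0;
    static inline int moves = 0;
    Bar() = default;
    Bar(const Bar&) { ++copies; }
    Bar(Bar&&) noexcept { ++moves; }
};

struct Foo {
    Bar bar;
    Bar&& get() && { return std::move(bar); }  // option 3: only a cast, no move
    Bar& get() & { return bar; }
    const Bar& get() const& { return bar; }
};
```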
I would like to clarify my point (from the comments). Even though moving the result should in general be considerably more efficient than copying, that is not my primary concern here. The core issue arises from the false assumption that, by calling this method on an rvalue reference to a Foo instance, the caller intends to create a new Bar value. For example:
Foo Produce_Foo(void);
// Alright, caller wanted to make a new `Bar` value, and by using `move`
// we've avoided a heavy copy operation.
auto bar{Produce_Foo().get()};
// Oops! No one asked us to make a useless temporary...
cout << Produce_Foo().get().value() << endl;
The solution would be to add dedicated functions: one just to take a peek at the stored bar, and one to take control of the contents of the stored bar object.
Bar const & get_bar() const noexcept
{
return bar;
}
// There is no particular need for this method to be rvalue-reference-qualified,
// because there is no direct correlation between the Foo instance being a temporary
// (or moved from) and the intention to take control of the stored bar object.
Bar give_bar() noexcept
{
return ::std::move(bar);
}
Now that the user has a choice, there will be no more problems:
// Alright, caller wanted to make a new `Bar` value, and by using `move`
// we've avoided a heavy copy operation.
// There is also no need to figure out whether Produce_Foo returned an rvalue or not.
auto bar{Produce_Foo().give_bar()};
// Alright, no extra temporaries.
cout << Produce_Foo().get_bar().value() << endl;
As for use cases for rvalue-reference-qualified methods, I think they are mostly useful when dealing with temporaries of the same type as the *this object. For example, a string class implementing such a concatenation operator can reduce the number of reallocations, essentially performing like a dedicated string builder.
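Here is a minimal sketch of that idea, using a hypothetical Text wrapper around std::string. The std::string operator+ already has rvalue overloads along these lines. The &&-qualified operator appends to the expiring object and moves it into the result, so a chain that starts from a temporary copies no Text objects. The const& overload, by contrast, has to copy. Whether a reallocation is actually avoided still depends on the string's capacity.

```cpp
#include <string>
#include <utility>

// Hypothetical string wrapper with a ref-qualified concatenation operator.
class Text {
public:
    static inline int copies = 0;  // counts copies of whole Text objects
    explicit Text(std::string s) : s_(std::move(s)) {}
    Text(const Text& o) : s_(o.s_) { ++copies; }
    Text(Text&&) noexcept = default;

    // Called on lvalues: *this must stay unchanged, so we copy it first.
    Text operator+(const Text& rhs) const& {
        Text result(*this);
        result.s_ += rhs.s_;
        return result;
    }
    // Called on rvalues: *this is about to expire, so append in place
    // and move its buffer into the result.
    Text operator+(const Text& rhs) && {
        s_ += rhs.s_;
        return std::move(*this);
    }
    const std::string& str() const { return s_; }

private:
    std::string s_;
};
```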

Creating an object, local variable vs rvalue reference

Is there any advantage to using an r value reference when you create an object, that would otherwise be in a normal local variable?
Foo&& cn = Foo();
cn.m_flag = 1;
bar.m_foo = std::move(cn);
//cn is not used again
Foo cn;
cn.m_flag = 1;
bar.m_foo = std::move(cn); //Is it ok to move a non rvalue reference?
//cn is not used again
In the first code snippet, it seems clear that there will not be any copies, but I'd guess that in the second the compiler would optimize copies out?
Also in the first snippet, where is the object actually stored in memory (in the second it is stored in the stack frame of the enclosing function)?
Those code fragments are mostly equivalent. This:
Foo&& rf = Foo();
is binding a temporary to a reference, which extends the lifetime of the temporary to that of the reference. The Foo will only be destroyed when rf goes out of scope. Which is the same behavior you get with:
Foo f;
with the exception that in the latter example f is default-initialized but in the former example rf is value-initialized. For some types, the two are equivalent. For others, they are not. If you had instead written Foo f{}, then this difference goes away.
One remaining difference pertains to copy elision:
Foo give_a_foo_rv() {
Foo&& rf = Foo();
return rf;
}
Foo give_a_foo() {
Foo f{};
return f;
}
RVO is not allowed to be performed in the first example, because rf does not have the same type as the return type of give_a_foo_rv(). Moreover, rf won't even be automatically moved into the return value, because it is a reference rather than a local object, and only automatic objects qualified for the implicit move. So that's an extra copy (until C++20, which extends the implicit move to rvalue references, making it an extra move):
Foo f = give_a_foo_rv(); // a copy happens here!
Foo g = give_a_foo(); // no move or copy
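Counting copies and moves makes the difference observable. In this sketch (C++17 or later, with hypothetical counters added to Foo), returning rf always costs exactly one copy or move, because a reference can never be elided. Returning the local f never copies. It at most moves, and only if the compiler skips NRVO.

```cpp
// Foo with hypothetical counters for copy and move constructions.
struct Foo {
    static inline int copies = 0;
    static inline int moves = 0;
    int m_flag = 0;
    Foo() = default;
    Foo(const Foo& o) : m_flag(o.m_flag) { ++copies; }
    Foo(Foo&& o) noexcept : m_flag(o.m_flag) { ++moves; }
    static void Reset() { copies = moves = 0; }
};

Foo give_a_foo_rv() {
    Foo&& rf = Foo();
    return rf;  // never elided: copied before C++20, moved since
}

Foo give_a_foo() {
    Foo f{};
    return f;   // NRVO candidate; if not elided, implicitly moved
}
```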
it seems clear that there will not be any copies
That depends entirely on what moving a Foo actually does. If Foo looks like:
struct Foo {
Foo() = default;
Foo(Foo const& ) = default;
Foo& operator=(Foo const& ) = default;
// some members
};
then moving a Foo still does copying.
And yes, it is perfectly okay to std::move(f) in the second example. You don't need an object of type rvalue reference to T to move from it. That would severely limit the usefulness of moving.
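The following sketch illustrates the "moving still copies" point. This Foo has user-provided copy operations so that calls can be counted. As a result, its move operations are not implicitly declared. Bar is a hypothetical holder for m_foo. Calling std::move(cn) on a plain local is legal and, for this type, simply selects the copy operations.

```cpp
#include <utility>

// Copy operations are user-declared, so no move operations are implicitly
// declared and overload resolution falls back to copying from rvalues.
struct Foo {
    static inline int copy_ctor_calls = 0;
    static inline int copy_assign_calls = 0;
    int m_flag = 0;
    Foo() = default;
    Foo(const Foo& o) : m_flag(o.m_flag) { ++copy_ctor_calls; }
    Foo& operator=(const Foo& o) {
        m_flag = o.m_flag;
        ++copy_assign_calls;
        return *this;
    }
};

// Hypothetical holder, mirroring bar.m_foo in the question.
struct Bar {
    Foo m_foo;
};
```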

Is it good practice to bind shared pointers returned by functions to lvalue references to const?

Although it took me a while to get used to it, I have now grown the habit of letting my functions take shared pointer parameters by lvalue-reference to const rather than by value (unless I need to modify the original arguments, of course, in which case I take them by lvalue-reference to non-const):
void foo(std::shared_ptr<widget> const& pWidget)
// ^^^^^^
{
// work with pWidget...
}
This has the advantage of avoiding an unnecessary copy of a shared pointer, which would mean thread-safely incrementing the reference count and potentially incurring undesired overhead.
Now I've been wondering whether it is sane to adopt a somewhat symmetrical habit for retrieving shared pointers that are returned by value from functions, like at the end of the following code snippet:
struct X
{
// ...
std::shared_ptr<Widget> bar() const
{
// ...
return pWidget;
}
// ...
std::shared_ptr<Widget> pWidget;
};
// ...
// X x;
std::shared_ptr<Widget> const& pWidget = x.bar();
// ^^^^^^
Are there any pitfalls with adopting such a coding habit? Is there any reason why I should prefer, in general, assigning a returned shared pointer to another shared pointer object rather than binding it to a reference?
This is just a remake of the old question of whether capturing a const reference to a temporary is more efficient than creating a copy. The simple answer is that it isn't. In the line:
// std::shared_ptr<Widget> bar();
std::shared_ptr<Widget> const & pWidget = bar();
The compiler needs to create a local unnamed variable (not a temporary), initialize it with the call to bar() and then bind the reference to it:
std::shared_ptr<Widget> __tmp = bar();
std::shared_ptr<Widget> const & pWidget = __tmp;
In most cases it will avoid the creation of the reference and just alias the original object in the rest of the function, but at the end of the day whether the variable is called pWidget or __tmp and aliased won't give any advantage.
On the contrary, to the casual reader it might look like bar() does not create an object but yields a reference to an already existing std::shared_ptr<Widget>, so the maintainer will have to seek out where bar() is defined to understand whether pWidget can be changed outside of the scope of this function.
Lifetime extension through binding to a const reference is a weird feature of the language that has very little practical use (namely when the reference is to a base and you don't care what exact derived type is returned by value, e.g. ScopedGuard).
You may have the optimization backwards:
struct X
{
// ...
std::shared_ptr<Widget> const& bar() const
{
// ...
return pWidget;
}
// ...
std::shared_ptr<Widget> pWidget;
};
// ...
// X x;
std::shared_ptr<Widget> pWidget = x.bar();
As bar is returning a member variable, it must take a copy of the shared_ptr in your version. If you return the member variable by reference the copy can be avoided.
This makes no difference in either your original version or the version shown above, but it would matter if you called:
x.bar()->baz()
In your version a new shared_ptr would be created, and then baz would be called.
In my version baz is called directly on the member copy of the shared_ptr, and the atomic reference increment/decrement is avoided.
Of course the cost of the shared_ptr copy constructor (an atomic increment) is very small, and not even noticeable in all but the most performance-sensitive applications. If you are writing a performance-sensitive application, then the better option would be to manage memory manually with a memory pool architecture and then (carefully) use raw pointers instead.
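Here is a small sketch of the two accessor styles on a simplified X, using hypothetical names bar_by_value and bar_by_ref. While the full-expression x.bar_by_value().use_count() is evaluated, the returned copy is alive, so the count is 2. The reference-returning accessor leaves it at 1.

```cpp
#include <memory>

struct Widget {
    int baz() const { return 42; }
};

// Simplified X with both accessor styles side by side (hypothetical names).
struct X {
    std::shared_ptr<Widget> pWidget = std::make_shared<Widget>();

    // Returns a copy: bumps the reference count while the copy lives.
    std::shared_ptr<Widget> bar_by_value() const { return pWidget; }
    // Returns a reference to the member: no copy, no refcount change.
    const std::shared_ptr<Widget>& bar_by_ref() const { return pWidget; }
};
```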
Adding to what David Rodríguez - dribeas said, namely that binding to a const reference doesn't save you from making the copy and the counter is incremented anyway, the following code illustrates this point:
#include <memory>
#include <cassert>
struct X {
std::shared_ptr<int> p;
X() : p{new int} {}
std::shared_ptr<int> bar() { return p; }
};
int main() {
X x;
assert(x.p.use_count() == 1);
std::shared_ptr<int> const & p = x.bar();
assert(x.p.use_count() == 2);
return 0;
}

Is return value always a temporary?

This page says something strange:
The temporaries are created only if your program does not copy the return value to an object. The example given is:
UDT Func1(); // Declare a function that returns a user-defined type.
...
Func1(); // Call Func1, but discard return value.
// A temporary object is created to store the return
// value
But if I write:
UDT obj = Func1();
it appears to me that it will also create a temporary, as follows:
Func1() constructs a local object. Next, this local object is copy-constructed on the caller's stack, making a temporary object that is used as the argument of obj's copy constructor.
Am I wrong?
Does this have something to do with copy elision?
The page you cite is a description of the behavior of a specific compiler. Formally, the return value is always a temporary. In contexts where that temporary is used as the argument of a copy constructor (the object is copied), the standard explicitly authorizes the compiler to elide the copy, “merging” the temporary with the named variable it is initializing. All the sentence you quote is saying is that this specific compiler always does this optimization (as do most other compilers).
This page is Microsoft-specific. It's true that the standard permits two, one, or zero calls to the copy constructor during a function return (eliding them is called copy elision). In fact, one call is always sufficient.
Suppose you write:
A f(int x) {
return A(x);
}
void g() {
A r = f(10);
}
The way MSVC implements this is:
void f_impl(A* r, int x) {
new((void*)r) A(x); // construct the return value into r
}
void g_impl() {
A r = __uninitialized__;
f_impl(&r, 10);
}
Here you see zero calls to the copy constructor and no temporaries.
If you call f like this:
void g() {
f(10);
}
Then the compiler still needs to construct the return value somewhere, so it creates a temporary:
void g_impl() {
A r = __uninitialized__;
f_impl(&r, 10);
r.~A(); // destruct temporary
}
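The lowering above is pseudocode, but it can be approximated in real C++ with placement new into caller-provided storage. The sketch below assumes C++17. A, f_impl and g_impl are hypothetical; they only mimic the idea and are not what MSVC actually emits. The sketch also checks two things: the ordinary A r = f(10); constructs exactly one A, and a discarded call still creates and destroys a temporary.

```cpp
#include <new>

// Hypothetical A that counts constructions and destructions.
struct A {
    static inline int constructed = 0;
    static inline int destroyed = 0;
    int value;
    explicit A(int x) : value(x) { ++constructed; }
    A(const A& o) : value(o.value) { ++constructed; }
    ~A() { ++destroyed; }
};

// Ordinary C++: the prvalue is constructed directly in the caller's object.
A f(int x) { return A(x); }

// Hand-written analogue of the lowering: the caller passes uninitialized
// storage (the "return slot") and the callee constructs into it.
A* f_impl(void* slot, int x) { return ::new (slot) A(x); }

int g_impl() {
    alignas(A) unsigned char storage[sizeof(A)];  // plays "A r = __uninitialized__;"
    A* r = f_impl(storage, 10);
    int result = r->value;
    r->~A();                                      // the caller destroys the object
    return result;
}
```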
When does it call the copy constructor? In the implementation of f, when it can't know which of f's locals will be returned. E.g. this:
A f(int x)
{
A r1;
A r2;
// ...do something complicated modifying both r1 and r2...
if(x)
return r1;
// ...do something complicated...
return r2;
}
Is translated to something like this:
void f_impl(A* r, int x)
{
A r1;
A r2;
// ...do something complicated modifying both r1 and r2...
if(x)
{
new((void*)r) A(r1); // copy construct r1
return;
}
// ...do something complicated...
new((void*)r) A(r2); // copy construct r2
}
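When a move constructor is available, the returned local in this situation is moved rather than copied (C++11 and later). This counting sketch uses a hypothetical A with both constructors instrumented. Because r1 and r2 are alive at the same time, a compiler generally cannot build either one directly in the return slot, so the check allows up to one move per call.

```cpp
// Hypothetical A that counts copy and move constructions.
struct A {
    static inline int copies = 0;
    static inline int moves = 0;
    int value = 0;
    A() = default;
    A(const A& o) : value(o.value) { ++copies; }
    A(A&& o) noexcept : value(o.value) { ++moves; }
};

// Which local is returned depends on x, so whichever one is returned
// is implicitly moved into the return value instead of being copied.
A f(int x) {
    A r1;
    A r2;
    r1.value = 1;
    r2.value = 2;
    if (x) return r1;
    return r2;
}
```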
The return value is always a temporary. In the second case, a copy of that temporary (move in C++11) is made if copy elision cannot occur.