Prevent unnecessary copies of C++ functor objects

Prevent unnecessary copies of C++ functor objects - c++

I have a class which accumulates information about a set of objects, and can act as either a functor or an output iterator. This allows me to do things like:
std::vector<Foo> v;
Foo const x = std::for_each(v.begin(), v.end(), Joiner<Foo>());
and
Foo const x = std::copy(v.begin(), v.end(), Joiner<Foo>());
Now, in theory, the compiler should be able to use the copy elision and return-value optimizations so that only a single Joiner object needs to be created. In practice, however, the function makes a copy on which to operate and then copies that back to the result, even in fully-optimized builds.
If I create the functor as an lvalue, the compiler creates two extra copies instead of one:
Joiner<Foo> joiner;
Foo const x = std::copy(v.begin(), v.end(), joiner);
If I awkwardly force the template type to a reference it passes in a reference, but then makes a copy of it anyway and returns a dangling reference to the (now-destroyed) temporary copy:
x = std::copy<Container::const_iterator, Joiner<Foo>&>(...));
I can make the copies cheap by using a reference to the state rather than the state itself in the functor in the style of std::inserter, leading to something like this:
Foo output;
std::copy(v.begin(), v.end(), Joiner<Foo>(output));
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
Is there some way to encourage the compiler to elide the temporary copies, or make it pass a reference all the way through and return that same reference?

You have stumbled upon an often complained about behavior with <algorithm>. There are no restrictions on what they can do with the functor, so the answer to your question is no: there is no way to encourage the compiler to elide the copies. It's not (always) the compiler, it's the library implementation. They just like to pass around functors by value (think of std::sort doing a qsort, passing in the functor by value to recursive calls, etc).
You have also stumbled upon the exact solution everyone uses: have a functor keep a reference to the state, so all copies refer to the same state when this is desired.
I found this ironic:
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
...since this whole question is predicated on you having a complicated stateful functor, where creating copies is problematic. If you were using "functional" style immutable objects this would be a non-issue - the extra copies wouldn't be a problem, would they?

If you have a recent compiler (At least Visual Studio 2008 SP1 or GCC 4.4 I think) you can use std::ref/std::cref
#include <string>
#include <vector>
#include <functional> // for std::cref
#include <algorithm>
#include <iostream>
template <typename T>
class SuperHeavyFunctor
{
std::vector<char> v500mo;
//ban copy
SuperHeavyFunctor(const SuperHeavyFunctor&);
SuperHeavyFunctor& operator=(const SuperHeavyFunctor&);
public:
SuperHeavyFunctor():v500mo(500*1024*1024){}
void operator()(const T& t) const { std::cout << t << std::endl; }
};
int main()
{
std::vector<std::string> v; v.push_back("Hello"); v.push_back("world");
std::for_each(v.begin(), v.end(), std::cref(SuperHeavyFunctor<std::string>()));
return 0;
}
Edit : Actually, the MSVC10's implementation of reference_wrapper don't seem to known how to deduce the return type of function object operator(). I had to derive SuperHeavyFunctor from std::unary_function<T, void> to make it work.

Just a quick note, for_each, accumulate, transform (2nd form), provide no order guarantee when traversing the provided range.
This makes sense for implementers to provide mulit-threaded/concurrent versions of these functions.
Hence it is reasonable that the algorithm be able to provide an equivalent instance (a new copy) of the functor passed in.
Be wary when making stateful functors.

RVO is just that -- return value optimization. Most compilers, today, have this turned-on by default. However, argument passing is not returning a value. You possibly cannot expect one optimization to fit in everywhere.
Refer to conditions for copy elision is defined clearly in 12.8, para 15, item 3.
when a temporary class object that has
not been bound to a reference (12.2)
would be copied to a class object with
the same cv-unqualified type, the copy
operation can be omitted by
constructing the temporary object
directly into the target of the
omitted copy
[emphasis mine]
The LHS Foo is const qualified, the temporary is not. IMHO, this precludes the possibility of copy-elision.

For a solution that will work with pre-c++11 code, you may consider using boost::function along with boost::ref(as boost::reference_wrapper alone doesn't has an overloaded operator(), unlike std::reference_wrapper which indeed does). From this page http://www.boost.org/doc/libs/1_55_0/doc/html/function/tutorial.html#idp95780904, you can double wrap your functor inside a boost::ref then a boost::function object. I tried that solution and it worked flawlessly.
For c++11, you can just go with std::ref and it'll do the job.

Related

lambda by-value capture in class member function shows strange behavior [duplicate]

I have a class which accumulates information about a set of objects, and can act as either a functor or an output iterator. This allows me to do things like:
std::vector<Foo> v;
Foo const x = std::for_each(v.begin(), v.end(), Joiner<Foo>());
and
Foo const x = std::copy(v.begin(), v.end(), Joiner<Foo>());
Now, in theory, the compiler should be able to use the copy elision and return-value optimizations so that only a single Joiner object needs to be created. In practice, however, the function makes a copy on which to operate and then copies that back to the result, even in fully-optimized builds.
If I create the functor as an lvalue, the compiler creates two extra copies instead of one:
Joiner<Foo> joiner;
Foo const x = std::copy(v.begin(), v.end(), joiner);
If I awkwardly force the template type to a reference it passes in a reference, but then makes a copy of it anyway and returns a dangling reference to the (now-destroyed) temporary copy:
x = std::copy<Container::const_iterator, Joiner<Foo>&>(...));
I can make the copies cheap by using a reference to the state rather than the state itself in the functor in the style of std::inserter, leading to something like this:
Foo output;
std::copy(v.begin(), v.end(), Joiner<Foo>(output));
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
Is there some way to encourage the compiler to elide the temporary copies, or make it pass a reference all the way through and return that same reference?

You have stumbled upon an often complained about behavior with <algorithm>. There are no restrictions on what they can do with the functor, so the answer to your question is no: there is no way to encourage the compiler to elide the copies. It's not (always) the compiler, it's the library implementation. They just like to pass around functors by value (think of std::sort doing a qsort, passing in the functor by value to recursive calls, etc).
You have also stumbled upon the exact solution everyone uses: have a functor keep a reference to the state, so all copies refer to the same state when this is desired.
I found this ironic:
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
...since this whole question is predicated on you having a complicated stateful functor, where creating copies is problematic. If you were using "functional" style immutable objects this would be a non-issue - the extra copies wouldn't be a problem, would they?

If you have a recent compiler (At least Visual Studio 2008 SP1 or GCC 4.4 I think) you can use std::ref/std::cref
#include <string>
#include <vector>
#include <functional> // for std::cref
#include <algorithm>
#include <iostream>
template <typename T>
class SuperHeavyFunctor
{
std::vector<char> v500mo;
//ban copy
SuperHeavyFunctor(const SuperHeavyFunctor&);
SuperHeavyFunctor& operator=(const SuperHeavyFunctor&);
public:
SuperHeavyFunctor():v500mo(500*1024*1024){}
void operator()(const T& t) const { std::cout << t << std::endl; }
};
int main()
{
std::vector<std::string> v; v.push_back("Hello"); v.push_back("world");
std::for_each(v.begin(), v.end(), std::cref(SuperHeavyFunctor<std::string>()));
return 0;
}
Edit : Actually, the MSVC10's implementation of reference_wrapper don't seem to known how to deduce the return type of function object operator(). I had to derive SuperHeavyFunctor from std::unary_function<T, void> to make it work.

Just a quick note, for_each, accumulate, transform (2nd form), provide no order guarantee when traversing the provided range.
This makes sense for implementers to provide mulit-threaded/concurrent versions of these functions.
Hence it is reasonable that the algorithm be able to provide an equivalent instance (a new copy) of the functor passed in.
Be wary when making stateful functors.

RVO is just that -- return value optimization. Most compilers, today, have this turned-on by default. However, argument passing is not returning a value. You possibly cannot expect one optimization to fit in everywhere.
Refer to conditions for copy elision is defined clearly in 12.8, para 15, item 3.
when a temporary class object that has
not been bound to a reference (12.2)
would be copied to a class object with
the same cv-unqualified type, the copy
operation can be omitted by
constructing the temporary object
directly into the target of the
omitted copy
[emphasis mine]
The LHS Foo is const qualified, the temporary is not. IMHO, this precludes the possibility of copy-elision.

For a solution that will work with pre-c++11 code, you may consider using boost::function along with boost::ref(as boost::reference_wrapper alone doesn't has an overloaded operator(), unlike std::reference_wrapper which indeed does). From this page http://www.boost.org/doc/libs/1_55_0/doc/html/function/tutorial.html#idp95780904, you can double wrap your functor inside a boost::ref then a boost::function object. I tried that solution and it worked flawlessly.
For c++11, you can just go with std::ref and it'll do the job.

Is it undesirable to defensively apply std::move to trivially-copyable types?

Imagine we have a trivially-copyable type:
struct Trivial
{
float A{};
int B{};
}
which gets constructed and stored in an std::vector:
class ClientCode
{
std::vector<Trivial> storage{};
...
void some_function()
{
...
Trivial t{};
fill_trivial_from_some_api(t, other_args);
storage.push_back(std::move(t)); // Redundant std::move.
...
}
}
Normally, this is a pointless operation, as the object will be copied anyway.
However, an advantage of keeping the std::move call is that if the Trivial type would be changed to no longer be trivially-copyable, the client code will not silently perform an extra copy operation, but a more appropriate move. (The situation is quite possible in my scenario, where the trivial type is used for managing external resources.)
So my question is whether there any technical downsides to applying the redundant std::move?

However, an advantage of keeping the std::move call is that if the Trivial type would be changed to no longer be trivially-copyable, the client code will not silently perform an extra copy operation, but a more appropriate move.
This is correct and something you should think about.
So my question is whether there any technical downsides to applying the redundant std::move?
Depends on where the moved object is being consumed. In the case of push_back, everything is fine, as push_back has both const T& and T&& overloads that behave intuitively.
Imagine another function that had a T&& overload that has completely different behavior from const T&: the semantics of your code will change with std::move.

Why do we copy then move?

I saw code somewhere in which someone decided to copy an object and subsequently move it to a data member of a class. This left me in confusion in that I thought the whole point of moving was to avoid copying. Here is the example:
struct S
{
S(std::string str) : data(std::move(str))
{}
};
Here are my questions:
Why aren't we taking an rvalue-reference to str?
Won't a copy be expensive, especially given something like std::string?
What would be the reason for the author to decide to make a copy then a move?
When should I do this myself?

Before I answer your questions, one thing you seem to be getting wrong: taking by value in C++11 does not always mean copying. If an rvalue is passed, that will be moved (provided a viable move constructor exists) rather than being copied. And std::string does have a move constructor.
Unlike in C++03, in C++11 it is often idiomatic to take parameters by value, for the reasons I am going to explain below. Also see this Q&A on StackOverflow for a more general set of guidelines on how to accept parameters.
Why aren't we taking an rvalue-reference to str?
Because that would make it impossible to pass lvalues, such as in:
std::string s = "Hello";
S obj(s); // s is an lvalue, this won't compile!
If S only had a constructor that accepts rvalues, the above would not compile.
Won't a copy be expensive, especially given something like std::string?
If you pass an rvalue, that will be moved into str, and that will eventually be moved into data. No copying will be performed. If you pass an lvalue, on the other hand, that lvalue will be copied into str, and then moved into data.
So to sum it up, two moves for rvalues, one copy and one move for lvalues.
What would be the reason for the author to decide to make a copy then a move?
First of all, as I mentioned above, the first one is not always a copy; and this said, the answer is: "Because it is efficient (moves of std::string objects are cheap) and simple".
Under the assumption that moves are cheap (ignoring SSO here), they can be practically disregarded when considering the overall efficiency of this design. If we do so, we have one copy for lvalues (as we would have if we accepted an lvalue reference to const) and no copies for rvalues (while we would still have a copy if we accepted an lvalue reference to const).
This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided.
P.S.: To provide some context, I believe this is the Q&A the OP is referring to.

To understand why this is a good pattern, we should examine the alternatives, both in C++03 and in C++11.
We have the C++03 method of taking a std::string const&:
struct S
{
std::string data;
S(std::string const& str) : data(str)
{}
};
in this case, there will always be a single copy performed. If you construct from a raw C string, a std::string will be constructed, then copied again: two allocations.
There is the C++03 method of taking a reference to a std::string, then swapping it into a local std::string:
struct S
{
std::string data;
S(std::string& str)
{
std::swap(data, str);
}
};
that is the C++03 version of "move semantics", and swap can often be optimized to be very cheap to do (much like a move). It also should be analyzed in context:
S tmp("foo"); // illegal
std::string s("foo");
S tmp2(s); // legal
and forces you to form a non-temporary std::string, then discard it. (A temporary std::string cannot bind to a non-const reference). Only one allocation is done, however. The C++11 version would take a && and require you to call it with std::move, or with a temporary: this requires that the caller explicitly creates a copy outside of the call, and move that copy into the function or constructor.
struct S
{
std::string data;
S(std::string&& str): data(std::move(str))
{}
};
Use:
S tmp("foo"); // legal
std::string s("foo");
S tmp2(std::move(s)); // legal
Next, we can do the full C++11 version, that supports both copy and move:
struct S
{
std::string data;
S(std::string const& str) : data(str) {} // lvalue const, copy
S(std::string && str) : data(std::move(str)) {} // rvalue, move
};
We can then examine how this is used:
S tmp( "foo" ); // a temporary `std::string` is created, then moved into tmp.data
std::string bar("bar"); // bar is created
S tmp2( bar ); // bar is copied into tmp.data
std::string bar2("bar2"); // bar2 is created
S tmp3( std::move(bar2) ); // bar2 is moved into tmp.data
It is pretty clear that this 2 overload technique is at least as efficient, if not more so, than the above two C++03 styles. I'll dub this 2-overload version the "most optimal" version.
Now, we'll examine the take-by-copy version:
struct S2 {
std::string data;
S2( std::string arg ):data(std::move(x)) {}
};
in each of those scenarios:
S2 tmp( "foo" ); // a temporary `std::string` is created, moved into arg, then moved into S2::data
std::string bar("bar"); // bar is created
S2 tmp2( bar ); // bar is copied into arg, then moved into S2::data
std::string bar2("bar2"); // bar2 is created
S2 tmp3( std::move(bar2) ); // bar2 is moved into arg, then moved into S2::data
If you compare this side-by-side with the "most optimal" version, we do exactly one additional move! Not once do we do an extra copy.
So if we assume that move is cheap, this version gets us nearly the same performance as the most-optimal version, but 2 times less code.
And if you are taking say 2 to 10 arguments, the reduction in code is exponential -- 2x times less with 1 argument, 4x with 2, 8x with 3, 16x with 4, 1024x with 10 arguments.
Now, we can get around this via perfect forwarding and SFINAE, allowing you to write a single constructor or function template that takes 10 arguments, does SFINAE to ensure that the arguments are of appropriate types, and then moves-or-copies them into the local state as required. While this prevents the thousand fold increase in program size problem, there can still be a whole pile of functions generated from this template. (template function instantiations generate functions)
And lots of generated functions means larger executable code size, which can itself reduce performance.
For the cost of a few moves, we get shorter code and nearly the same performance, and often easier to understand code.
Now, this only works because we know, when the function (in this case, a constructor) is called, that we will be wanting a local copy of that argument. The idea is that if we know that we are going to be making a copy, we should let the caller know that we are making a copy by putting it in our argument list. They can then optimize around the fact that they are going to give us a copy (by moving into our argument, for example).
Another advantage of the 'take by value" technique is that often move constructors are noexcept. That means the functions that take by-value and move out of their argument can often be noexcept, moving any throws out of their body and into the calling scope (who can avoid it via direct construction sometimes, or construct the items and move into the argument, to control where throwing happens). Making methods nothrow is often worth it.

This is probably intentional and is similar to the copy and swap idiom. Basically since the string is copied before the constructor, the constructor itself is exception safe as it only swaps (moves) the temporary string str.

You don't want to repeat yourself by writing a constructor for the move and one for the copy:
S(std::string&& str) : data(std::move(str)) {}
S(const std::string& str) : data(str) {}
This is much boilerplate code, especially if you have multiple arguments. Your solution avoids that duplication on the cost of an unnecessary move. (The move operation should be quite cheap, however.)
The competing idiom is to use perfect forwarding:
template <typename T>
S(T&& str) : data(std::forward<T>(str)) {}
The template magic will choose to move or copy depending on the parameter that you pass in. It basically expands to the first version, where both constructor were written by hand. For background information, see Scott Meyer's post on universal references.
From a performance aspect, the perfect forwarding version is superior to your version as it avoids the unnecessary moves. However, one can argue that your version is easier to read and write. The possible performance impact should not matter in most situations, anyway, so it seems to be a matter of style in the end.

How do I initialize boost::any with a reference to an object?

I want to store a reference to an object in a boost::any object. How do I initialize the boost::any object? I tried std::ref(), but boost::any gets initialized with std::reference_wrapper<>. For example, the following
#include <boost/any.hpp>
#include <cxxabi.h>
#include <iostream>
int main(void)
{
int s;
int i = 0;
boost::any x(std::ref(i));
std::cout << abi::__cxa_demangle(x.type().name(), 0, 0, &s) << "\n";
return 0;
}
prints
std::reference_wrapper<int>
I want the boost::any to contain int& instead.

The boost::any class doesn't have an interface allowing something like this: you would need to specify the type of the reference with the constructor. I don't think that you can explicitly specify the type of templated constructor because I don't see any place you could stick it. Even if you can explicitly specify the template parameter, it wouldn't work in C++2003 bcause there is no reference collapsing available and the parameter is declared as taking a T const&: you'd be trying to create a T& const& which won't fly.
I think your best option is to either use std::reference_wrapper<T> if you insist on something looking remotely reference like or just to use T*.
That said, it would be generally possible to have a templatized static factor method of a type similar to boost::any which would be used to explicitly specify the template argument. However, since boost::any is deliberately designed to deal with value types this isn't done. I'm a bit dubious whether it should be done as well: using a pointer is perfectly good alternative. If you really need a reference type you'll probably have to implement it yourself.

The behaviour is correct, expected and appropriate. std::ref is a helper function that creates an object of type std::reference_wrapper<T>, and the reference wrapper is a class with value semantics that holds a reference -- that's exactly the sort of thing you want to put into a container if you want the container to track outside references.
So just go with the solution you have.
If you will, you cannot have a container of direct, naked references, much like you cannot have an array of references. The wrapper is designed precisely to accommodate such needs.

Passing temporaries as LValues

I'd like to use the following idiom, that I think is non-standard. I have functions which return vectors taking advantage of Return Value Optimization:
vector<T> some_func()
{
...
return vector<T>( /* something */ );
}
Then, I would like to use
vector<T>& some_reference;
std::swap(some_reference, some_func());
but some_func doesn't return a LValue. The above code makes sense, and I found this idiom very useful. However, it is non-standard. VC8 only emits a warning at the highest warning level, but I suspect other compilers may reject it.
My question is: Is there some way to achieve the very same thing I want to do (ie. construct a vector, assign to another, and destroy the old one) which is compliant (and does not use the assignment operator, see below) ?
For classes I write, I usually implement assignment as
class T
{
T(T const&);
void swap(T&);
T& operator=(T x) { this->swap(x); return *this; }
};
which takes advantage of copy elision, and solves my problem. For standard types however, I really would like to use swap since I don't want an useless copy of the temporary.
And since I must use VC8 and produce standard C++, I don't want to hear about C++0x and its rvalue references.
EDIT: Finally, I came up with
typedef <typename T>
void assign(T &x, T y)
{
std::swap(x, y);
}
when I use lvalues, since the compiler is free to optimize the call to the copy constructor if y is temporary, and go with std::swap when I have lvalues. All the classes I use are "required" to implement a non-stupid version of std::swap.

Since std::vector is a class type and member functions can be called on rvalues:
some_func().swap(some_reference);

If you don't want useless copies of temporaries, don't return by value.
Use (shared) pointers, pass function arguments by reference to be filled in, insert iterators, ....
Is there a specific reason why you want to return by value?

The only way I know - within the constraints of the standard - to achieve what you want are to apply the expression templates metaprogramming technique: http://en.wikipedia.org/wiki/Expression_templates Which might or not be easy in your case.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js