Confusion about C++ functor/lambda argument-passing in STL algorithms - c++

I like C++11 and its ability to combine STL algorithms with lambdas; it makes the STL much more approachable and useful to everybody. But one thing that I don't understand is what happens inside an STL algorithm (like std::accumulate) regarding object copying or referencing inside the lambda (or whatever functor you give to it).
My three questions are:
1. Are there any guidelines regarding whether I should care about pass-by-reference vs. pass-by-value in lambdas/functors?
2. Does it matter at all if you declare a lambda inside an algorithm that takes its arguments by reference ([](Type &a, Type &b){}), and will it be more efficient than a regular variant? Or is it just syntactic sugar: will the compiler optimize it anyway, so that I could simply omit the ampersands?
3. Does the C++ Standard have any provision about this?
As for question #2, a quick experiment on Godbolt's GCC page (using compilation flags -std=c++11 -Os) seems to suggest the latter, as the generated assembly from the code below is identical whether I use [](T i1, T i2) or [](T &i1, T &i2); I don't know if those results could be generalized to more complex types/objects, however.
Example #1:
#include <array>
#include <cstdlib>   // std::abs
#include <numeric>
template <typename T>
T vecSum(std::array<T, 4> &a) {
    return std::accumulate(a.begin(), a.end(), T(0),
        [](T i1, T i2) {
            return std::abs(i1) + std::abs(i2);
        }
    );
}
void results() {
    std::array<int, 4> a = {1, -2, 3, -4};
    std::array<int, 4> b = {1, -2, -3, 4};
    volatile int c = vecSum(a) + vecSum(b);
}
Example #2:
#include <algorithm>   // std::fill
#include <array>
#include <numeric>
#include <string>
struct FatObject {
    std::array<int, 1024*1024> garbage;
    std::string string;
    FatObject(const std::string &str) : string(str) {
        std::fill(garbage.begin(), garbage.end(), 0xCAFEDEAD);
    }
    std::string operator+(const FatObject &rhs) const {
        return string + rhs.string;
    }
};
template <typename T>
T vecSum(std::array<T, 4> &a) {
    // T("") rather than T(0): FatObject(0) would build a std::string from a null pointer
    return std::accumulate(a.begin(), a.end(), T(""),
        [](T i1, T i2) {
            return i1 + i2;
        }
    );
}
void results() {
    std::array<FatObject, 4> a = {
        FatObject("The "),
        FatObject("quick "),
        FatObject("brown "),
        FatObject("fox")
    };
    std::array<FatObject, 4> b = {
        FatObject("jumps "),
        FatObject("over "),
        FatObject("the "),
        FatObject("dog ")
    };
    volatile std::string c = vecSum(a) + vecSum(b);
}

Your question is quite broad, but here is my concise answer.
1) In general, the guidelines for passing by value vs. by reference in lambdas or functors are the same as for any regular function or method (a lambda is a functor created on the fly for you, i.e. an object with an operator()(T)). The choice is mostly specific to your case; for example, if the lambda/functor needs only read access to its arguments, you would typically pass a const reference.
2) Inside an algorithm that accepts a callable object as an argument (and as a template parameter) the compiler is bound to respect the rules of the language. Therefore parameters will be passed by value or by reference internally as per the signature of the lambda/functor.
Keep in mind that copy elision may come into play, but that is a separate issue, not directly related to the fact that you are calling a lambda inside a standard library algorithm.
The example with int is too simple. I suggest experimenting with actual objects.
3) C++ Standard provides precise definitions for the conditions where copy elision occurs, as well as the requirements on the signature of a lambda/functor parameter for a particular Standard Library algorithm.
However, it will not be easy in general to know if the internal implementation of a particular algorithm is going to call the lambda/functor in a way that meets copy elision conditions.
Note that the requirements on signature have some degree of flexibility, for example in the std::accumulate documentation we have
The signature of the function should be equivalent to the following:
Ret fun(const Type1 &a, const Type2 &b);
The signature does not need to have const &.
so you can choose to pass by value or by reference as you see fit.
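To experiment with an actual object type instead of int, as suggested above, here is a minimal sketch. Counted and the two functions are hypothetical helpers, not part of the question: the counter shows that the lambda's own parameter list decides whether elements are copied (a by-value parameter copies each element, a const& parameter copies none), and the accumulate call uses the documented const& signature. One caveat worth knowing: since C++20, std::accumulate passes the accumulator as std::move(acc), so a lambda whose first parameter is a non-const T& (as in [](T &i1, T &i2)) no longer compiles there, while const T& and by-value parameters work in every revision.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper type: counts its copy-constructor calls.
struct Counted {
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }
};
int Counted::copies = 0;

// Returns how many Counted copies std::for_each causes for a lambda whose
// parameter is taken by value (by_value == true) or by const reference.
int copies_made_by_for_each(bool by_value) {
    std::vector<Counted> v;
    v.reserve(4);  // no reallocation copies while filling
    for (int i = 1; i <= 4; ++i) v.emplace_back(i);

    Counted::copies = 0;
    int sum = 0;
    if (by_value)
        std::for_each(v.begin(), v.end(), [&](Counted c) { sum += c.value; });
    else
        std::for_each(v.begin(), v.end(), [&](const Counted& c) { sum += c.value; });
    return Counted::copies;
}

// std::accumulate with the documented signature Ret fun(const Type1&, const Type2&):
// const& binds both to the accumulator and to each element.
int sum_with_const_refs(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0,
                           [](const int& acc, const int& x) { return acc + x; });
}
```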

Lambdas and functors follow the same rules as any other function in C++.
You should use a non-const reference if the intent of the function is to modify the object for the caller. The function should take a const& if it just uses the object without changing it. And it should take the parameter by value if it is going to copy/move the object into its internal storage.
If you pass a small object like an int, it makes practically no difference whether you pass it by value or by reference.
When you start to pass big objects, passing by value (which copies the object on every call) can have a big impact on performance.
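The three rules above can be sketched as follows (all names are hypothetical); the same parameter choices apply unchanged to a lambda's parameter list.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Non-const reference: the purpose is to modify the caller's object.
void append_exclamation(std::string& s) { s += '!'; }

// const reference: the (possibly large) object is only read, never copied.
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t n = 0;
    for (const auto& w : words) n += w.size();
    return n;
}

// By value: the object is stored, so take a copy and move it into place.
// A caller passing an rvalue pays for moves only; an lvalue is copied once.
class MessageLog {
public:
    void add(std::string line) { lines_.push_back(std::move(line)); }
    std::size_t size() const { return lines_.size(); }
private:
    std::vector<std::string> lines_;
};

// Small, trivially copyable types such as int: pass by value.
int twice(int x) { return 2 * x; }
```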

Related

structured bindings with std::minmax and rvalues

I ran into a rather subtle bug when using std::minmax with structured bindings. It appears that passed rvalues will not always be copied as one might expect. Originally I was using a T operator[]() const on a custom container, but it seems to be the same with a literal integer.
#include <algorithm>
#include <cstdio>
#include <tuple>
int main()
{
auto [amin, amax] = std::minmax(3, 6);
printf("%d,%d\n", amin, amax); // undefined,undefined
int bmin, bmax;
std::tie(bmin, bmax) = std::minmax(3, 6);
printf("%d,%d\n", bmin, bmax); // 3,6
}
Using GCC 8.1.1 with -O1 -Wuninitialized will result in 0,0 being printed as first line and:
warning: ‘<anonymous>’ is used uninitialized in this function [-Wuninitialized]
Clang 6.0.1 at -O2 will also give a wrong first result with no warning.
At -O0 GCC gives a correct result and no warning. For clang the result appears to be correct at -O1 or -O0.
Should not the first and second line be equivalent in the sense that the rvalue is still valid for being copied?
Also, why does this depend on the optimization level? In particular, I was surprised that GCC issues no warning.
What's important to note in auto [amin, amax] is that the auto, auto& and so forth are applied to the made-up object e that is initialized with the return value of std::minmax, which is a pair. It's essentially this:
auto e = std::minmax(3, 6);
auto&& amin = std::get<0>(e);
auto&& amax = std::get<1>(e);
The actual types of amin and amax are references that refer to whatever std::get<0> and std::get<1> return for that pair object. And they themselves return references to objects long gone!
When you use std::tie, you are doing assignment to existing objects (passed by reference). The rvalues don't need to live longer than the assignment expressions in which they come into being.
As a work around, you can use something like this (not production quality) function:
#include <type_traits>
#include <utility>
template<typename T1, typename T2>
auto as_value(std::pair<T1, T2> in) {
    using U1 = std::decay_t<T1>;
    using U2 = std::decay_t<T2>;
    return std::pair<U1, U2>(in);  // copies the referred-to values
}
It ensures the pair holds value types. When used like this:
auto [amin, amax] = as_value(std::minmax(3, 6));
We now get a copy made, and the structured bindings refer to those copies.
There are two fundamental issues going on here:
1. min, max, and minmax for historic reasons return references. So if you pass in a temporary, you'd better take the result by value or immediately use it, otherwise you get a dangling reference. If minmax gave you a pair<int, int> here instead of a pair<int const&, int const&>, you wouldn't have any problems.
2. auto decays top-level cv-qualifiers and strips references, but it doesn't remove all the way down. Here, you're deducing that pair<int const&, int const&>, but if we had deduced pair<int, int>, we would again not have any problems.
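For point 1, a minimal sketch of the safe patterns (function names are hypothetical): copy the result into a value type within the same full-expression that created the temporaries. The commented-out line shows the dangling variant.

```cpp
#include <algorithm>
#include <utility>

// std::min/std::minmax take const T& and return references into their arguments.
int smallest_copied() {
    int m = std::min(3, 6);              // copied while the temporaries still exist
    // const int& bad = std::min(3, 6);  // would dangle after this statement
    return m;
}

std::pair<int, int> minmax_copied() {
    // pair<const int&, const int&> -> pair<int, int>: the values are copied
    // within the same full-expression, before 3 and 6 are destroyed.
    std::pair<int, int> p = std::minmax(3, 6);
    return p;
}

// C++17: give the structured bindings a value-typed pair to refer to.
int spread() {
    auto [lo, hi] = std::pair<int, int>(std::minmax(3, 6));
    return hi - lo;
}
```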
(1) is a much easier problem to solve than (2): write your own functions to take everything by value:
#include <utility>
template <typename T>
std::pair<T, T> minmax(T a, T b) {
    // C++17: std::pair(x, y) deduces std::pair<T, T>
    return (b < a) ? std::pair(b, a) : std::pair(a, b);
}
auto [amin, amax] = minmax(3, 6); // no problems
The nice thing about taking everything by value is that you never have to worry about hidden dangling references, because there aren't any. And the vast majority of uses of these functions are using integral types anyway, so there's no benefit to references.
And when you do need references, for when you're comparing expensive-to-copy objects... well, it's easier to take a function that takes values and force it to use references than it is to take a function that uses references and try to fix it:
auto [lo, hi] = minmax(std::ref(big1), std::ref(big2));
Additionally, it's very visible here at the call site that we're using references, so it would be much more obvious if we messed up.
While the above works for lots of types due to reference_wrapper<T>'s implicit conversion to T&, it won't work for those types that have non-member, non-friend, operator templates (like std::string). So you'd additionally need to write a specialization for reference wrappers, unfortunately.
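One way to handle that case, sketched under C++17 (class template argument deduction): since function templates cannot be partially specialized, this uses an overload for std::reference_wrapper that compares the referred-to objects through get(). The namespace byval is hypothetical; calling byval::minmax qualified also keeps argument-dependent lookup from pulling std::minmax into the overload set when the arguments are std types.

```cpp
#include <functional>
#include <string>
#include <utility>

namespace byval {

// By-value minmax as in the answer above.
template <typename T>
std::pair<T, T> minmax(T a, T b) {
    return (b < a) ? std::pair(b, a) : std::pair(a, b);
}

// Overload for reference wrappers: compare the referred-to objects via get(),
// so operator templates such as std::string's operator< are found.
template <typename T>
std::pair<std::reference_wrapper<T>, std::reference_wrapper<T>>
minmax(std::reference_wrapper<T> a, std::reference_wrapper<T> b) {
    return (b.get() < a.get()) ? std::pair(b, a) : std::pair(a, b);
}

}  // namespace byval
```

Partial ordering prefers the reference_wrapper overload because it is more specialized than the generic one.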

Can the use of C++11's 'auto' improve performance?

I can see why the auto type in C++11 improves correctness and maintainability. I've read that it can also improve performance (Almost Always Auto by Herb Sutter), but I miss a good explanation.
How can auto improve performance?
Can anyone give an example?
auto can aid performance by avoiding silent implicit conversions. An example I find compelling is the following.
std::map<Key, Val> m;
// ...
for (std::pair<Key, Val> const& item : m) {
// do stuff
}
See the bug? Here we are, thinking we're elegantly taking every item in the map by const reference and using the new range-for expression to make our intent clear, but actually we're copying every element. This is because std::map<Key, Val>::value_type is std::pair<const Key, Val>, not std::pair<Key, Val>. Thus, when we (implicitly) have:
std::pair<Key, Val> const& item = *iter;
Instead of taking a reference to an existing object and leaving it at that, we have to do a type conversion. You are allowed to take a const reference to an object (or temporary) of a different type as long as there is an implicit conversion available, e.g.:
int const& i = 2.0; // perfectly OK
The type conversion is an allowed implicit conversion for the same reason you can convert a const Key to a Key, but we have to construct a temporary of the new type in order to allow for that. Thus, effectively our loop does:
std::pair<Key, Val> __tmp = *iter; // construct a temporary of the correct type
std::pair<Key, Val> const& item = __tmp; // then, take a reference to it
(Of course, there isn't actually a __tmp object, it's just there for illustration, in reality the unnamed temporary is just bound to item for its lifetime).
Just changing to:
for (auto const& item : m) {
// do stuff
}
just saved us a ton of copies - now the referenced type matches the initializer type, so no temporary or conversion is necessary, we can just do a direct reference.
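The difference can be observed directly. This sketch (hypothetical function names; it assumes a non-empty map) performs the same reference binding the range-for loop does and checks whether the reference aliases the element stored in the map.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

static_assert(std::is_same<std::map<int, std::string>::value_type,
                           std::pair<const int, std::string>>::value,
              "the key in a map's value_type is const");

// Same binding as `for (const std::pair<int, std::string>& item : m)`:
// a converted temporary is created and the reference binds to it.
bool binds_to_element_explicit_type(const std::map<int, std::string>& m) {
    const std::pair<int, std::string>& item = *m.begin();
    return &item.second == &m.begin()->second;
}

// Same binding as `for (const auto& item : m)`: no conversion, no temporary.
bool binds_to_element_auto(const std::map<int, std::string>& m) {
    const auto& item = *m.begin();
    return &item.second == &m.begin()->second;
}
```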
Because auto deduces the type of the initializing expression, there is no type conversion involved. Combined with templated algorithms, this means that you can get a more direct computation than if you were to make up a type yourself – especially when you are dealing with expressions whose type you cannot name!
A typical example comes from (ab)using std::function:
using namespace std::placeholders; // for _1, _2
std::function<bool(T, T)> cmp1 = std::bind(f, _2, 10, _1); // bad
auto cmp2 = std::bind(f, _2, 10, _1); // good
auto cmp3 = [](T a, T b){ return f(b, 10, a); }; // also good
std::stable_sort(begin(x), end(x), cmp?); // cmp? stands for cmp1, cmp2 or cmp3
With cmp2 and cmp3, the entire algorithm can inline the comparison call, whereas if you construct a std::function object, not only can the call not be inlined, but you also have to go through the polymorphic lookup in the type-erased interior of the function wrapper.
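A self-contained version of that example, assuming T = int and a hypothetical f(x, base, y) that compares last digits. It uses std::stable_sort, which takes a two-argument comparison like cmp1, cmp2 and cmp3 (std::stable_partition expects a one-argument predicate). All three comparators give the same ordering; cmp1 just reaches it through std::function's type-erased call.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

using namespace std::placeholders;  // _1, _2

// Hypothetical stand-in for f: true if y's last digit (in base) is smaller than x's.
bool f(int x, int base, int y) { return (y % base) < (x % base); }

// Stable sort by last decimal digit using one of the three comparators.
// Each comparator means: a's last digit < b's last digit.
std::vector<int> sort_by_last_digit(std::vector<int> x, int which) {
    std::function<bool(int, int)> cmp1 = std::bind(f, _2, 10, _1);  // type-erased
    auto cmp2 = std::bind(f, _2, 10, _1);                             // exact bind type
    auto cmp3 = [](int a, int b) { return f(b, 10, a); };             // exact closure type
    if (which == 1)      std::stable_sort(x.begin(), x.end(), cmp1);
    else if (which == 2) std::stable_sort(x.begin(), x.end(), cmp2);
    else                 std::stable_sort(x.begin(), x.end(), cmp3);
    return x;
}
```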
Another variant on this theme is that you can say:
auto && f = MakeAThing();
This is always a reference, bound to the value of the function call expression, and never constructs any additional objects. If you didn't know the returned value's type, you might be forced to construct a new object (perhaps as a temporary) via something like T && f = MakeAThing(). (Moreover, auto && even works when the return type is not movable and the return value is a prvalue.)
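A small sketch of that last point, with a hypothetical type Pinned that can be neither copied nor moved:

```cpp
// Hypothetical type with deleted copy and move constructors.
struct Pinned {
    int value = 7;
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Copy-list-initializing the return value from {} needs no copy or move.
Pinned make_thing() { return {}; }

int read_thing() {
    auto&& thing = make_thing();  // binds to the returned object; lifetime extended
    // auto copy = make_thing();  // compiles only since C++17 (guaranteed elision)
    return thing.value;
}
```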
There are two categories.
First, auto can avoid type erasure. There are unnamable types (like lambdas), and almost unnamable types (like the result of std::bind or other expression-template-like things).
Without auto, you end up having to type erase the data down to something like std::function. Type erasure has costs.
std::function<void()> task1 = []{std::cout << "hello";};
auto task2 = []{std::cout << " world\n";};
task1 has type erasure overhead -- a possible heap allocation, difficulty inlining it, and virtual function table invocation overhead. task2 has none. Lambdas need auto or other forms of type deduction to store without type erasure; other types can be so complex that they only need it in practice.
Second, you can get types wrong. In some cases, the wrong type will work seemingly perfectly, but will cause a copy.
Foo const& f = expression();
will compile if expression() returns Bar const& or Bar or even Bar&, where Foo can be constructed from Bar. A temporary Foo will be created, then bound to f, and its lifetime will be extended until f goes away.
The programmer may have meant Bar const& f and not intended to make a copy there, but a copy is made regardless.
The most common example is the type of *std::map<A,B>::const_iterator, which is std::pair<A const, B> const& not std::pair<A,B> const&, but the error is a category of errors that silently cost performance. You can construct a std::pair<A, B> from a std::pair<const A, B>. (The key on a map is const, because editing it is a bad idea)
Both @Barry and @KerrekSB first illustrated these two principles in their answers. This is simply an attempt to highlight the two issues in one answer, with wording that aims at the problem rather than being example-centric.
The existing three answers give examples where using auto “makes it less likely to unintentionally pessimize”, effectively making it "improve performance".
There is a flip side to the coin. Using auto with objects whose operators don't return the basic object can result in incorrect (still compilable and runnable) code. For example, this question asks how using auto gave different (incorrect) results using the Eigen library, i.e. the following lines
const auto resAuto = Ha + Vector3(0.,0.,j * 2.567);
const Vector3 resVector3 = Ha + Vector3(0.,0.,j * 2.567);
std::cout << "resAuto = " << resAuto <<std::endl;
std::cout << "resVector3 = " << resVector3 <<std::endl;
resulted in different output. Admittedly, this is mostly due to Eigen's lazy evaluation, but that code is/should be transparent to the (library) user.
While performance hasn't been greatly affected here, using auto to avoid unintentional pessimization might be classified as premature optimization, or at least wrong ;).
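To illustrate the mechanism, here is a toy expression template (Vec, SumExpr and lazy_vs_eager are hypothetical and much simpler than Eigen's real types): auto deduces the unevaluated expression type, which keeps references to its operands, whereas naming the result type forces evaluation. If an operand were a temporary, like the Vector3(...) in the lines above, the stored reference would even dangle after the statement.

```cpp
#include <utility>

struct Vec { double x; };

// Result of a + b: stores references and only adds when asked.
struct SumExpr {
    const Vec& lhs;
    const Vec& rhs;
    double value() const { return lhs.x + rhs.x; }
    operator Vec() const { return Vec{value()}; }  // evaluation on conversion
};

SumExpr operator+(const Vec& a, const Vec& b) { return SumExpr{a, b}; }

// Returns {lazy result, eager result} after changing an operand.
std::pair<double, double> lazy_vs_eager() {
    Vec a{1.0}, b{2.0};
    auto lazy = a + b;   // deduces SumExpr, not Vec: still refers to a and b
    Vec eager = a + b;   // converted immediately: eager.x == 3.0
    a.x = 10.0;          // changing an operand afterwards...
    return {lazy.value(), eager.x};  // ...changes only the lazy result
}
```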

C++11 std::set lambda comparison function

I want to create a std::set with a custom comparison function. I could define it as a class with operator(), but I wanted to enjoy the ability to define a lambda where it is used, so I decided to define the lambda function in the initialization list of the constructor of the class which has the std::set as a member. But I can't get the type of the lambda. Before I proceed, here's an example:
class Foo
{
private:
std::set<int, /*???*/> numbers;
public:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
};
I found two solutions after searching: one, using std::function. Just have the set comparison function type be std::function<bool (int, int)> and pass the lambda exactly like I did. The second solution is to write a make_set function, like std::make_pair.
SOLUTION 1:
class Foo
{
private:
std::set<int, std::function<bool (int, int)>> numbers;
public:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
};
SOLUTION 2:
template <class Key, class Compare>
std::set<Key, Compare> make_set (Compare compare)
{
return std::set<Key, Compare> (compare);
}
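A usage sketch for SOLUTION 2 (first_of_descending_set is a hypothetical name): the comparator type is deduced from the lambda, so no std::function is involved, but the resulting set type can then only be spelled through auto or decltype.

```cpp
#include <set>

// make_set from SOLUTION 2: Key is given explicitly, Compare is deduced.
template <class Key, class Compare>
std::set<Key, Compare> make_set(Compare compare) {
    return std::set<Key, Compare>(compare);
}

// With a descending comparator, the first element is the largest value.
int first_of_descending_set() {
    auto numbers = make_set<int>([](int x, int y) { return x > y; });
    numbers.insert(1);
    numbers.insert(3);
    numbers.insert(2);
    return *numbers.begin();
}
```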
The question is, do I have a good reason to prefer one solution over the other? I prefer the first one because it makes use of standard features (make_set is not a standard function), but I wonder: does using std::function make the code (potentially) slower? I mean, does it lower the chance that the compiler inlines the comparison function, or should it be smart enough to behave exactly as it would if the comparator type were the lambda's type and not std::function (I know, in this case it can't be a lambda type, but you know, I'm asking in general)?
(I use GCC, but I'd like to know what popular compilers do in general)
SUMMARY, AFTER I GOT LOTS OF GREAT ANSWERS:
If speed is critical, the best solution is to use a class with operator(), aka a functor. It's easiest for the compiler to optimize and avoids any indirection.
For easy maintenance and a better general-purpose solution, using C++11 features, use std::function. It's still fast (just a little bit slower than the functor, but it may be negligible) and you can use any function - std::function, lambda, any callable object.
There's also an option to use a function pointer, but if there's no speed issue I think std::function is better (if you use C++11).
There's an option to define the lambda function somewhere else, but then you gain nothing from the comparison function being a lambda expression, since you could as well make it a class with operator() and the location of definition wouldn't be the set construction anyway.
There are more ideas, such as using delegation. If you want a more thorough explanation of all solutions, read the answers :)
It's unlikely that the compiler will be able to inline a std::function call, whereas any compiler that supports lambdas would almost certainly inline the functor version, including if that functor is a lambda not hidden by a std::function.
You could use decltype to get the lambda's comparator type:
#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>
int main()
{
auto comp = [](int x, int y){ return x < y; };
auto set = std::set<int,decltype(comp)>( comp );
set.insert(1);
set.insert(10);
set.insert(1); // Dupe!
set.insert(2);
std::copy( set.begin(), set.end(), std::ostream_iterator<int>(std::cout, "\n") );
}
Which prints:
1
2
10
See it run live on Coliru.
Yes, a std::function introduces nearly unavoidable indirection to your set. While the compiler can always, in theory, figure out that all use of your set's std::function involves calling it on a lambda that is always the exact same lambda, that is both hard and extremely fragile.
Fragile, because before the compiler can prove to itself that all calls to that std::function are actually calls to your lambda, it must prove that no access to your std::set ever sets the std::function to anything but your lambda. Which means it has to track down all possible routes to reach your std::set in all compilation units and prove none of them do it.
This might be possible in some cases, but relatively innocuous changes could break it even if your compiler managed to prove it.
On the other hand, a functor with a stateless operator() has easy to prove behavior, and optimizations involving that are everyday things.
So yes, in practice I'd suspect std::function could be slower. On the other hand, std::function solution is easier to maintain than the make_set one, and exchanging programmer time for program performance is pretty fungible.
make_set has the serious disadvantage that any such set's type must be inferred from the call to make_set. Often a set stores persistent state, and not something you create on the stack then let fall out of scope.
If you created a static or global stateless lambda auto MyComp = [](A const&, A const&)->bool { ... }, you can use the std::set<A, decltype(MyComp)> syntax to create a set that can persist, yet is easy for the compiler to optimize (because all instances of decltype(MyComp) are stateless functors) and inline. I point this out, because you are sticking the set in a struct. (Or does your compiler support
struct Foo {
auto mySet = make_set<int>([](int l, int r){ return l<r; });
};
which I would find surprising!)
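A minimal sketch of the static/global lambda approach applied to the question's Foo (the name descending is hypothetical). Before C++20, closure types are not default-constructible, so the set still has to be handed the lambda object.

```cpp
#include <set>

// Stateless comparator at namespace scope; decltype names its closure type.
auto descending = [](int l, int r) { return l > r; };

class Foo {
public:
    void add(int n) { numbers.insert(n); }
    int first() const { return *numbers.begin(); }
private:
    // The comparator object is passed in because, before C++20,
    // closure types have no default constructor.
    std::set<int, decltype(descending)> numbers{descending};
};
```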
Finally, if you are worried about performance, consider that std::unordered_set is often much faster (at the cost of being unable to iterate over the contents in order, and having to write/find a good hash), and that a sorted std::vector is better if you have a 2-phase "insert everything" then "query contents repeatedly" pattern. Simply stuff everything into the vector first, then sort, unique and erase, then use the free equal_range algorithm.
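The two-phase sorted-vector approach could look like this (function names are hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Phase 1: bulk insert, then sort and drop duplicates once.
std::vector<int> build_sorted_unique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Phase 2: query repeatedly by binary search; returns how many elements equal key.
std::size_t count_equal(const std::vector<int>& sorted, int key) {
    auto range = std::equal_range(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(range.second - range.first);
}
```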
A stateless lambda (i.e. one with no captures) can decay to a function pointer, so your type could be:
std::set<int, bool (*)(int, int)> numbers;
Otherwise I'd go for the make_set solution. If you won't use a one-line creation function because it's non-standard you're not going to get much code written!
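Applied to the question's class, the function-pointer version might look like this; each comparison is then an indirect call through the stored pointer.

```cpp
#include <set>

class Foo {
public:
    // A captureless lambda converts implicitly to bool (*)(int, int).
    Foo() : numbers([](int x, int y) { return x < y; }) {}
    void add(int n) { numbers.insert(n); }
    int first() const { return *numbers.begin(); }
private:
    std::set<int, bool (*)(int, int)> numbers;
};
```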
From my experience playing around with the profiler, the best compromise between performance and beauty is to use a custom delegate implementation, such as:
https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11
I prefer it because std::function is usually a bit too heavy. I can't comment on your specific circumstances, though, as I don't know them.
If you're determined to have the set as a class member, initializing its comparator at constructor time, then at least one level of indirection is unavoidable. Consider that as far as the compiler knows, you could add another constructor:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
Foo (char) : numbers ([](int x, int y)
{
return x > y;
})
{
}
Once you have an object of type Foo, the type of the set doesn't carry information about which constructor initialized its comparator, so calling the correct lambda requires an indirection to the operator() of the lambda selected at run time.
Since you're using captureless lambdas, you could use the function pointer type bool (*)(int, int) as your comparator type, as captureless lambdas have the appropriate conversion function. This would of course involve an indirection through the function pointer.
The difference highly depends on your compiler's optimizations. If it can optimize away the std::function wrapped around the lambda, the two solutions are equivalent; if not, the former introduces an indirection that you won't have in the latter.

Prevent unnecessary copies of C++ functor objects

I have a class which accumulates information about a set of objects, and can act as either a functor or an output iterator. This allows me to do things like:
std::vector<Foo> v;
Foo const x = std::for_each(v.begin(), v.end(), Joiner<Foo>());
and
Foo const x = std::copy(v.begin(), v.end(), Joiner<Foo>());
Now, in theory, the compiler should be able to use the copy elision and return-value optimizations so that only a single Joiner object needs to be created. In practice, however, the function makes a copy on which to operate and then copies that back to the result, even in fully-optimized builds.
If I create the functor as an lvalue, the compiler creates two extra copies instead of one:
Joiner<Foo> joiner;
Foo const x = std::copy(v.begin(), v.end(), joiner);
If I awkwardly force the template type to a reference it passes in a reference, but then makes a copy of it anyway and returns a dangling reference to the (now-destroyed) temporary copy:
x = std::copy<Container::const_iterator, Joiner<Foo>&>(...);
I can make the copies cheap by using a reference to the state rather than the state itself in the functor in the style of std::inserter, leading to something like this:
Foo output;
std::copy(v.begin(), v.end(), Joiner<Foo>(output));
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
Is there some way to encourage the compiler to elide the temporary copies, or make it pass a reference all the way through and return that same reference?
You have stumbled upon an often complained about behavior with <algorithm>. There are no restrictions on what they can do with the functor, so the answer to your question is no: there is no way to encourage the compiler to elide the copies. It's not (always) the compiler, it's the library implementation. They just like to pass around functors by value (think of std::sort doing a qsort, passing in the functor by value to recursive calls, etc).
You have also stumbled upon the exact solution everyone uses: have a functor keep a reference to the state, so all copies refer to the same state when this is desired.
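A sketch of that approach, covering only the functor side of the question's Joiner (JoinInto and join_words are hypothetical): the functor holds a pointer to state owned by the caller, so every copy the algorithm makes writes to the same string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Cheap to copy: all copies share the caller's output string.
class JoinInto {
public:
    explicit JoinInto(std::string& out) : out_(&out) {}
    void operator()(const std::string& s) const {
        if (!out_->empty()) *out_ += ' ';
        *out_ += s;
    }
private:
    std::string* out_;  // a pointer (not a reference member) keeps the functor assignable
};

std::string join_words(const std::vector<std::string>& words) {
    std::string out;
    std::for_each(words.begin(), words.end(), JoinInto(out));  // copies still share out
    return out;
}
```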
I found this ironic:
But this makes it impossible to use the "functional" style of immutable objects, and just generally isn't as nice.
...since this whole question is predicated on you having a complicated stateful functor, where creating copies is problematic. If you were using "functional" style immutable objects this would be a non-issue - the extra copies wouldn't be a problem, would they?
If you have a recent compiler (At least Visual Studio 2008 SP1 or GCC 4.4 I think) you can use std::ref/std::cref
#include <string>
#include <vector>
#include <functional> // for std::cref
#include <algorithm>
#include <iostream>
template <typename T>
class SuperHeavyFunctor
{
std::vector<char> v500mo;
//ban copy
SuperHeavyFunctor(const SuperHeavyFunctor&);
SuperHeavyFunctor& operator=(const SuperHeavyFunctor&);
public:
SuperHeavyFunctor():v500mo(500*1024*1024){}
void operator()(const T& t) const { std::cout << t << std::endl; }
};
int main()
{
std::vector<std::string> v; v.push_back("Hello"); v.push_back("world");
SuperHeavyFunctor<std::string> f; // must be an lvalue: std::cref on a temporary is deleted in C++11
std::for_each(v.begin(), v.end(), std::cref(f));
return 0;
}
Edit: Actually, MSVC10's implementation of reference_wrapper doesn't seem to know how to deduce the return type of the function object's operator(). I had to derive SuperHeavyFunctor from std::unary_function<T, void> to make it work (std::unary_function is deprecated since C++11 and removed in C++17).
Just a quick note: transform (including the 2nd, binary form) provides no guarantee about the order in which the function object is applied to the elements of the provided range (for_each and accumulate, by contrast, are specified to process the range in order).
This makes it possible for implementers to provide multi-threaded/concurrent versions of such functions.
Hence it is reasonable that an algorithm be able to work with an equivalent instance (a new copy) of the functor passed in.
Be wary when making stateful functors.
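A minimal sketch of why stateful functors need care (Counter and the two functions are hypothetical): std::for_each takes the functor by value, so the caller's object is untouched and the state ends up only in the copy the algorithm returns. Wrapping the functor in std::ref, as another answer here suggests, makes the algorithm copy only the wrapper, so the caller's object is updated.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Stateful functor: counts the elements it has seen.
struct Counter {
    int seen = 0;
    void operator()(int) { ++seen; }
};

// Returns {caller's count, returned copy's count} when passed by value.
std::pair<int, int> count_by_value(const std::vector<int>& v) {
    Counter c;
    Counter result = std::for_each(v.begin(), v.end(), c);  // works on a copy of c
    return {c.seen, result.seen};
}

// With std::ref only the reference_wrapper is copied, so c itself is updated.
int count_by_ref(const std::vector<int>& v) {
    Counter c;
    std::for_each(v.begin(), v.end(), std::ref(c));
    return c.seen;
}
```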
RVO is just that -- return value optimization. Most compilers today have it turned on by default. However, argument passing is not returning a value. You cannot expect one optimization to apply everywhere.
The conditions for copy elision are defined in 12.8, para 15, item 3:
when a temporary class object that has not been bound to a reference (12.2) would be copied to a class object with the same cv-unqualified type, the copy operation can be omitted by constructing the temporary object directly into the target of the omitted copy
The LHS Foo is const qualified, the temporary is not. IMHO, this precludes the possibility of copy-elision.
For a solution that will work with pre-C++11 code, you may consider using boost::function along with boost::ref (as boost::reference_wrapper alone doesn't have an overloaded operator(), unlike std::reference_wrapper, which does). From this page http://www.boost.org/doc/libs/1_55_0/doc/html/function/tutorial.html#idp95780904, you can double-wrap your functor inside a boost::ref and then a boost::function object. I tried that solution and it worked flawlessly.
For C++11, you can just go with std::ref and it'll do the job.