Why C++ standard library does not pass predicates as && [duplicate] - c++

I was looking at the various signatures for std::find_if on cppreference.com, and I noticed that the flavors that take a predicate function appear to accept it by value:
template< class InputIt, class UnaryPredicate >
InputIt find_if( InputIt first, InputIt last,
UnaryPredicate p );
If I understand them correctly, lambdas with captured variables allocate storage for either references or copies of their data, and so presumably a "pass-by-value" would imply that the copies of captured data are copied for the call.
On the other hand, for function pointers and other directly addressable things, the performance should be better if the function pointer is passed directly, rather than by reference-to-pointer (pointer-to-pointer).
First, is this correct? Is the UnaryPredicate above going to be a by-value parameter?
Second, is my understanding of passing lambdas correct?
Third, is there a reason for passing by value instead of by reference in this situation? And more to the point, is there not some sufficiently ambiguous syntax (hello, universal reference) that would let the compiler do whatever it wants to get the most performance out?

Is the UnaryPredicate above going to be a by-value parameter?
Yes, that's what it says in the function parameter list. It accepts a deduced value type.
Beyond that, lambda expressions are prvalues. This means that, with C++17's guaranteed copy elision, p is initialized directly from the lambda expression. No extra copies of the closure or the captured objects are made when passing it into the function (the function may make more copies internally, though that's not common).
If the predicate was passed by reference, a temporary object would need to be materialized. So for a lambda expression, nothing is gained by a switch to pass by reference.
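To make the elision concrete, here is a minimal sketch. CountingPred, g_copies and my_find_if are hypothetical names made up for this illustration; my_find_if is a stand-in that never copies the predicate internally, which a real std::find_if implementation is allowed to do. The zero-copy result for the prvalue assumes C++17 or later, where the elision is guaranteed.

```cpp
#include <cassert>
#include <vector>

int g_copies = 0;  // hypothetical global counter, for illustration only

// Hypothetical predicate that counts how often it is copied.
// Declaring the copy constructor suppresses the implicit move constructor,
// so any move would also be counted here.
struct CountingPred {
    int threshold;
    explicit CountingPred(int t) : threshold(t) {}
    CountingPred(const CountingPred& other) : threshold(other.threshold) { ++g_copies; }
    bool operator()(int x) const { return x > threshold; }
};

// Stand-in for std::find_if: takes the predicate by value, never copies it again.
template <class InputIt, class UnaryPredicate>
InputIt my_find_if(InputIt first, InputIt last, UnaryPredicate p) {
    for (; first != last; ++first) {
        if (p(*first)) return first;
    }
    return last;
}

// Passing a prvalue: p is initialized directly from the expression.
int copies_for_prvalue() {
    g_copies = 0;
    std::vector<int> v{1, 5, 9};
    auto it = my_find_if(v.begin(), v.end(), CountingPred{4});
    assert(it != v.end() && *it == 5);
    return g_copies;
}

// Passing a named object: one copy is made to initialize p.
int copies_for_lvalue() {
    g_copies = 0;
    std::vector<int> v{1, 5, 9};
    CountingPred pred{4};
    auto it = my_find_if(v.begin(), v.end(), pred);
    assert(it != v.end() && *it == 5);
    return g_copies;
}
```

Calling copies_for_prvalue() should report no copies, while copies_for_lvalue() reports exactly one, the copy that initializes the parameter.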
If you have other sorts of predicates, which are expensive to copy, then you can pass in a std::reference_wrapper to that predicate object (for example, via std::ref), for a cheap "handle" to it. The wrapper's operator() will do the right thing.
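A short sketch of the std::ref approach, assuming a made-up HeavyPred type that is both costly to copy and stateful. Because the algorithm only copies the wrapper, the call counter in the caller's object is the one that gets updated.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>  // std::ref, std::reference_wrapper
#include <vector>

// Hypothetical predicate: imagine the vector is large, so copies are costly.
struct HeavyPred {
    std::vector<int> wanted;
    int calls;
    bool operator()(int x) {
        ++calls;  // mutable state, so a const reference would not work either
        return std::find(wanted.begin(), wanted.end(), x) != wanted.end();
    }
};

// Returns how many calls the caller's own object observed.
int calls_seen_through_ref() {
    HeavyPred pred{{7, 8, 9}, 0};
    std::vector<int> data{1, 2, 7, 3};
    // std::ref(pred) is a cheap handle; std::find_if copies the handle, not pred.
    auto it = std::find_if(data.begin(), data.end(), std::ref(pred));
    assert(it != data.end() && *it == 7);
    return pred.calls;
}
```

Without std::ref, the algorithm would work on its own copy and pred.calls would stay at 0 in the caller.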
The definition is mostly historic, but nowadays it's really a non-issue to do it with pass by value.
To elaborate on why referential semantics would suck, let's try to take it through the years. A simple lvalue reference won't do, since now we don't support binding to an rvalue. A const lvalue reference won't do either, since now we require the predicate to not modify any internal state, and what for?
So up to C++11, we didn't really have an alternative. A pass by value would be better than a reference. With the new standard, we may revise our approach. To support rvalues, we may add an rvalue reference overload. But that is an exercise in redundancy, since it doesn't need to do anything different.
By passing a value, the caller has the choice in how to create it, and for prvalues, in C++17, it's practically free. If the caller so desires, they can provide referential semantics explicitly. So nothing is lost, and I think much is gained in terms of simplicity of usage and API design.

There are actually multiple reasons:
You can always turn deduced value arguments into using reference semantics but not vice versa: just pass std::ref(x) instead of x. std::reference_wrapper<T> isn't entirely equivalent to passing a reference, but especially for function objects it does the Right Thing. That is, passing generic arguments by value is the more general approach.
Pass by reference (T&) doesn't work for temporary or const objects, and T const& doesn't work for function objects that need non-const access (for example, a non-const operator()). That is, the only choice would be T&& (forwarding reference), which didn't exist pre-C++11, and the algorithm interfaces haven't changed since they were introduced with C++98.
Value parameters can be copy elided unlike any sort of reference parameters, including forwarding references.

Related

Should templated functions take lambda arguments by value or by rvalue reference?

GCC 4.7 in C++11 mode is letting me define a function taking a lambda two different ways:
// by value
template<class FunctorT>
void foo(FunctorT f) { /* stuff */ }
And:
// by r-value reference
template<class FunctorT>
void foo(FunctorT&& f) { /* stuff */ }
But not:
// by reference
template<class FunctorT>
void foo(FunctorT& f) { /* stuff */ }
I know that I can un-template the functions and just take std::functions instead, but foo is small and inline and I'd like to give the compiler the best opportunity to inline the calls to f it makes inside. Out of the first two, which is preferable for performance if I specifically know I'm passing lambdas, and why isn't it allowed to pass lambdas to the last one?
FunctorT&& is a universal reference and can match anything, not only rvalues. It's the preferred way to pass things in C++11 templates, unless you absolutely need copies, since it allows you to employ perfect forwarding. Access the value through std::forward<FunctorT>(f), which will make f an rvalue again if it was before, or else will leave it as an lvalue. Read more here about the forwarding problem and std::forward and read here for a step-by-step guide on how std::forward really works. This is also an interesting read.
FunctorT& is just a simple lvalue reference, and you can't bind temporaries (the result of a lambda expression) to that.
When you create a lambda function you get a temporary object. You cannot bind a temporary to a non-const l-value reference. Actually, you cannot directly create an l-value referencing a lambda function.
When you declare your function template using T&&, the parameter type of the instantiated function will be U const& if you pass a const l-value of type U, U& if you pass a non-const l-value, and U&& if you pass a temporary. That is, when passing a temporary the function takes an r-value reference, which binds to the temporary without moving or copying an object. When passing the argument explicitly by value, a temporary object is conceptually copied or moved, although this copy or move is typically elided. If you only pass temporary objects to your functions, the first two declarations would do the same thing, although the first declaration could introduce a move or copy.
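The deduction rules described above can be checked with a small sketch. how_bound, by_lvalue_ref and binds_to_lvalue_ref are hypothetical helpers written only for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

enum class Bound { ConstLvalue, Lvalue, Rvalue };

// Hypothetical helper: reports how T&& was deduced for the given argument.
template <class T>
constexpr Bound how_bound(T&&) {
    return std::is_lvalue_reference<T>::value
               ? (std::is_const<typename std::remove_reference<T>::type>::value
                      ? Bound::ConstLvalue
                      : Bound::Lvalue)
               : Bound::Rvalue;
}

// A function template taking a plain lvalue reference, as in the question.
template <class G>
void by_lvalue_ref(G&) {}

// Detection helper: true if an expression of type F can be passed to by_lvalue_ref.
template <class F, class = void>
struct binds_to_lvalue_ref : std::false_type {};

template <class F>
struct binds_to_lvalue_ref<F, decltype(by_lvalue_ref(std::declval<F>()))>
    : std::true_type {};

bool deduction_demo() {
    auto lam = [](int x) { return x > 0; };
    const auto clam = lam;
    using Lam = decltype(lam);
    return how_bound(lam) == Bound::Lvalue               // T = Lam&
        && how_bound(clam) == Bound::ConstLvalue         // T = const Lam&
        && how_bound(std::move(lam)) == Bound::Rvalue    // T = Lam
        && binds_to_lvalue_ref<Lam&>::value              // a named lambda binds to G&
        && !binds_to_lvalue_ref<Lam>::value;             // a temporary lambda does not
}
```

The last two checks correspond to the third declaration in the question: a lambda stored in a variable can be passed to FunctorT&, a lambda expression written directly in the call cannot.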
This is a good question -- the first part: pass-by-value or use forwarding. I think the second part (having FunctorT& as an argument) has been reasonably answered.
My advice is this: use forwarding only when the function object is known, in advance, to modify values in its closure (or capture list). Best example: std::shuffle. It takes a Uniform Random Number Generator (a function object), and each call to the generator modifies its state. The function object is forwarded into the algorithm.
In every other case, you should prefer to pass by value. This does not prevent you from capturing locals by reference and modifying them within your lambda function. That will work just like you think it should. There should be no overhead for copying, as Dietmar says. Inlining will also apply and references may be optimized out.
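A rough sketch of the difference, using a made-up Counter generator and two hypothetical helpers, sum_by_value and sum_by_forwarding. Like std::shuffle, the forwarding version calls the generator several times, so it uses the reference directly instead of calling std::forward on each use.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stateful generator: each call advances its internal state.
struct Counter {
    int next = 0;
    int operator()() { return next++; }
};

// By value: the function advances its own copy of the generator.
template <class Gen>
int sum_by_value(Gen gen, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += gen();
    return sum;
}

// Forwarding reference: an lvalue argument is used in place, so the
// caller sees the advanced state afterwards.
template <class Gen>
int sum_by_forwarding(Gen&& gen, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += gen();
    return sum;
}

int state_after_by_value() {
    Counter c;
    int sum = sum_by_value(c, 3);  // 0 + 1 + 2
    assert(sum == 3);
    return c.next;                 // caller's generator is untouched
}

int state_after_forwarding() {
    Counter c;
    sum_by_forwarding(c, 3);
    return c.next;                 // caller's generator has advanced
}

// Pass by value does not stop a lambda from modifying locals it
// captured by reference.
int total_via_reference_capture() {
    int total = 0;
    sum_by_value([&total] { return ++total; }, 3);
    return total;
}
```

state_after_by_value() leaves the caller's counter at 0, state_after_forwarding() leaves it at 3, and the reference-capturing lambda updates total to 3 even though the closure itself was passed by value.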

Passing functor object by value vs by reference (C++)

Compare generic integration functions:
template <class F> double integrate(F integrand);
with
template <class F> double integrate(F& integrand);
or
template <class F> double integrate(const F& integrand);
What are the pros and cons of each? STL uses the first approach (pass by value), does it mean it's the most universal one?
Function objects usually should be small, so I don't think that passing them by value will hurt performance noticeably (compare it to the work the function does in its body). If you pass by value, you can also gain from code analysis, because a by-value parameter is local to the function and the optimizer may be able to tell when a load from a data member of the functor can be omitted.
If the functor is stateless, passing it as an argument implies no cost at all - the padding byte that the functor takes doesn't have to have any particular value (in the Itanium ABI used by GCC at least). When using references, you always have to pass an address.
The last one (const T&) has the drawback that in C++03 that doesn't work for raw functions, because in C++03 the program is ill-formed if you try to apply const to a function type (and is an SFINAE case). More recent implementations instead ignore const when applied on function types.
The second one (T&) has the obvious drawback that you cannot pass temporary functors.
Long story short, I would generally pass them by value, unless I see a clear benefit in concrete cases.
STL uses the first approach (pass by value)
Sure, the standard libraries pass iterators and functors by value. They are assumed (rightly or wrongly) to be cheap to copy, and this means that if you write an iterator or a functor that is expensive to copy, you might have to find a way to optimize that later.
But that is just for the purposes for which the standard libraries use functors - mostly they're predicates, although there are also things like std::transform. If you're integrating a function, that suggests some kind of mathematics libraries, in which case I suppose you might be much more likely to deal with functions that carry a lot of state. You could for example have a class representing nth order polynomials, with n+1 coefficients as non-static data members.
In that case, a const reference might be better. When using such a functor in standard algorithms like transform, you might wrap it in a little class that performs indirection through a pointer, to ensure that it remains cheap to copy.
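As a sketch of that idea: Polynomial, ByPointer and evaluate_all below are hypothetical names for this illustration. The wrapper stores only a pointer, so the copies that std::transform may make are cheap. std::cref from <functional> gives a similar effect without a hand-written class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor with a lot of state: c[0] + c[1]*x + c[2]*x^2 + ...
struct Polynomial {
    std::vector<double> coeffs;
    double operator()(double x) const {
        double result = 0.0;
        // Horner's scheme, starting from the highest-order coefficient.
        for (std::size_t i = coeffs.size(); i-- > 0;) {
            result = result * x + coeffs[i];
        }
        return result;
    }
};

// Hypothetical wrapper: copying it copies one pointer, not the functor.
// The wrapped object must outlive the wrapper.
template <class F>
class ByPointer {
public:
    explicit ByPointer(const F& f) : f_(&f) {}
    double operator()(double x) const { return (*f_)(x); }

private:
    const F* f_;
};

// Evaluates p at every point in xs using a standard algorithm.
std::vector<double> evaluate_all(const Polynomial& p, const std::vector<double>& xs) {
    std::vector<double> ys(xs.size());
    std::transform(xs.begin(), xs.end(), ys.begin(), ByPointer<Polynomial>(p));
    return ys;
}
```

For example, with coefficients {1.0, 2.0, 3.0} (that is, 1 + 2x + 3x^2), evaluating at 0, 1 and 2 gives 1, 6 and 17.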
Taking a non-const reference is potentially annoying to users, since it stops them passing in temporaries.
Given the context, F is expected to be a "callable object" (something like a free function or a class having an operator() defined)
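Now, the second version cannot accept temporaries, such as a lambda expression or a functor constructed in the call itself. (The name of a free function is an lvalue, so a plain function does bind to F&.)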
The third assumes F::operator() to be const (but may not be the case, if it requires to alter the state of F)
The first operates on its "own copy", but requires F to be copyable.
None of the three is "universal", but the first is most likely working in the most common cases.

Some clarification on rvalue references

First: where are std::move and std::forward defined? I know what they do, but I can't find proof that any standard header is required to include them. In gcc44 sometimes std::move is available, and sometimes it's not, so a definitive include directive would be useful.
When implementing move semantics, the source is presumably left in an undefined state. Should this state necessarily be a valid state for the object? Obviously, you need to be able to call the object's destructor, and be able to assign to it by whatever means the class exposes. But should other operations be valid? I suppose what I'm asking is, if your class guarantees certain invariants, should you strive to enforce those invariants when the user has said they don't care about them anymore?
Next: when you don't care about move semantics, are there any limitations that would cause a non-const reference to be preferred over an rvalue reference when dealing with function parameters? void function(T&); over void function(T&&); From a caller's perspective, being able to pass functions temporary values is occasionally useful, so it seems as though one should grant that option whenever it is feasible to do so. And rvalue references are themselves lvalues, so you can't inadvertently call a move-constructor instead of a copy-constructor, or something like that. I don't see a downside, but I'm sure there is one.
Which brings me to my final question. You still can not bind temporaries to non-const references. But you can bind them to non-const rvalue references. And you can then pass along that reference as a non-const reference in another function.
void function1(int& r) { r++; }
void function2(int&& r) { function1(r); }
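int main() {
    function1(5); // bad: a temporary cannot bind to a non-const lvalue reference
    function2(5); // good: a temporary can bind to an rvalue reference
}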
Besides the fact that it doesn't do anything, is there anything wrong with that code? My gut says of course not, since changing rvalue references is kind of the whole point to their existence. And if the passed value is legitimately const, the compiler will catch it and yell at you. But by all appearances, this is a runaround of a mechanism that was presumably put in place for a reason, so I'd just like confirmation that I'm not doing anything foolish.
First: where are std::move and std::forward defined?
See 20.3 Utility components, <utility>.
When implementing move semantics, the source is presumably left in an undefined state. Should this state necessarily be a valid state for the object?
Obviously, the object should still be destructible. But further than that, I think it's a good idea for it to be still assignable. The Standard says for objects that satisfy "MoveConstructible" and "MoveAssignable":
[ Note: rv remains a valid object. Its state is unspecified. — end note ]
This would mean, I think, that the object can still participate in any operation that doesn't state any precondition. This includes CopyConstructible, CopyAssignable, Destructible and other things. Notice that this won't require anything for your own objects from a core language perspective. The requirements only take place once you touch Standard library components that state these requirements.
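A minimal sketch of what that means in practice for a standard library type. reuse_after_move is a hypothetical function name; the point is that the moved-from string may be assigned to (or destroyed), but its value should not be relied on before that.

```cpp
#include <cassert>
#include <string>
#include <utility>  // std::move and std::forward are declared here
#include <vector>

std::string reuse_after_move() {
    std::string source = "a fairly long string that probably lives on the heap";
    std::vector<std::string> sink;
    sink.push_back(std::move(source));  // source is now valid but unspecified

    // Operations without preconditions are still fine on the moved-from
    // object: assignment, clear(), size(), destruction.
    source = "fresh value";

    assert(sink.front() == "a fairly long string that probably lives on the heap");
    return source;
}
```

The function deliberately never reads source between the move and the assignment, because the Standard does not say what it contains at that point.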
Next: when you don't care about move semantics, are there any limitations that would cause a non-const reference to be preferred over an rvalue reference when dealing with function parameters?
This, unfortunately, crucially depends on whether the parameter is in a function template and uses a template parameter:
void f(int const&); // takes all lvalues and const rvalues
void f(int&&); // can only accept nonconst rvalues
However for a function template
template<typename T> void f(T const&);
template<typename T> void f(T&&);
You can't say that, because the second template will, after being called with an lvalue, have as parameter of the synthesized declaration the type U& for nonconst lvalues (and be a better match), and U const& for const lvalues (and be ambiguous). To my knowledge, there is no partial ordering rule to disambiguate that second ambiguity. However, this is already known.
-- Edit --
Despite that issue report, I don't think that the two templates are ambiguous. Partial ordering will make the first template more specialized, because after taking away the reference modifiers and the const, we will find that both types are the same, and then notice that the first template had a reference to const. The Standard says (14.9.2.4)
If, for a given type, deduction succeeds in both directions (i.e., the types are identical after the transformations above) and if the type from the argument template is more cv-qualified than the type from the parameter template (as described above) that type is considered to be more specialized than the other.
If for each type being considered a given template is at least as specialized for all types and more specialized for some set of types and the other template is not more specialized for any types or is not at least as specialized for any types, then the given template is more specialized than the other template.
This makes the T const& template the winner of partial ordering (and GCC is indeed correct to choose it).
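The outcome can be observed with a small sketch. The two overloads mirror the templates above, but return an int so that the chosen one can be checked; the pick_for_... helpers are hypothetical names for this illustration.

```cpp
#include <cassert>

template <typename T>
int f(T const&) { return 1; }  // reference-to-const overload

template <typename T>
int f(T&&) { return 2; }       // forwarding-reference overload

// Const lvalue: both templates produce `int const&`, so partial ordering
// decides, and the T const& template is the more specialized one.
int pick_for_const_lvalue() {
    const int c = 0;
    return f(c);
}

// Non-const lvalue: T&& becomes `int&`, a better match than `int const&`.
int pick_for_lvalue() {
    int x = 0;
    return f(x);
}

// Rvalue: T&& becomes `int&&`, which is preferred for rvalues.
int pick_for_rvalue() {
    return f(0);
}
```

Only the const lvalue case selects the T const& overload; in the other two cases overload resolution already prefers T&& before partial ordering comes into play.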
-- Edit End --
Which brings me to my final question. You still can not bind temporaries to non-const references. But you can bind them to non-const rvalue references.
This is nicely explained in this article. The second call using function2 only takes nonconst rvalues. The rest of the program won't notice if they are modified, because they won't be able to access those rvalues afterwards anymore! And the 5 you pass is not a class type, so a hidden temporary is created and then passed to the int&& rvalue reference. The code calling function2 won't be able to access that hidden object here, so it won't notice any change.
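To see that the change really is made through the rvalue reference, here is a sketch that reuses function1 and function2 from the question. increment_moved_int is a hypothetical helper added for illustration: with std::move on a named int, no hidden temporary is involved, so the caller can observe the increment.

```cpp
#include <cassert>
#include <utility>

void function1(int& r) { r++; }

// Inside function2, r is a named variable and therefore an lvalue,
// which is why it can be passed on to function1.
void function2(int&& r) { function1(r); }

int increment_moved_int() {
    int x = 5;
    function2(std::move(x));  // binds directly to x, no temporary is created
    return x;
}

void increment_literal() {
    function2(5);  // a hidden temporary is incremented and then discarded
}
```

increment_moved_int() returns 6, while the increment in increment_literal() is invisible to the caller, as described above.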
A different situation is if you do this one:
SomeComplexObject o;
function2(move(o));
You have explicitly requested that o is moved, so it will be modified according to its move specification. However moving is a logically non-modifying operation (see the article). This means whether you move or not shouldn't be observable from the calling code:
SomeComplexObject o;
moveit(o); // #1
o = foo;
If you erase the line that moves, behavior will still be the same, because it's overwritten anyway. This however means that code that uses the value of o after it has been moved from is bad, because it breaks this implicit contract between moveit and the calling code. Thus, the Standard makes no specification about the concrete value of a moved from container.
where are std::move and std::forward defined?
std::move and std::forward are declared in <utility>. See the synopsis at the beginning of section 20.3[utility].
When implementing move semantics, the source is presumably left in an undefined state.
It of course depends on how you implement the move-constructor and move-assignment operator. If you want to use your objects in standard containers, however, you have to follow the MoveConstructible and MoveAssignable concepts, which say that the object remains valid, but is left in an unspecified state, i.e. you definitely can destroy it.
std::move and std::forward are provided by <utility>.
Here is the article I read about rvalues.
I can't help you with rest, sorry.