Why do stl algorithms take callables by value? [duplicate] - c++

I was looking at the various signatures for std::find_if on cppreference.com, and I noticed that the flavors that take a predicate function appear to accept it by value:
template< class InputIt, class UnaryPredicate >
InputIt find_if( InputIt first, InputIt last,
UnaryPredicate p );
If I understand them correctly, lambdas with captured variables allocate storage for either references or copies of their data, and so presumably a "pass-by-value" would imply that the copies of captured data are copied for the call.
On the other hand, for function pointers and other directly addressable things, the performance should be better if the function pointer is passed directly, rather than by reference-to-pointer (pointer-to-pointer).
First, is this correct? Is the UnaryPredicate above going to be a by-value parameter?
Second, is my understanding of passing lambdas correct?
Third, is there a reason for passing by value instead of by reference in this situation? And more to the point, is there not some sufficiently ambiguous syntax (hello, universal reference) that would let the compiler do whatever it wants to get the most performance out?

Is the UnaryPredicate above going to be a by-value parameter?
Yes, that's what it says in the function parameter list. It accepts a deduced value type.
Beyond that, lambda expressions are prvalues. With C++17's guaranteed copy elision, this means that p is initialized directly from the lambda expression. No extra copies of the closure or the captured objects are made when passing it into the function (the function may make more copies internally, though that's not common).
If the predicate was passed by reference, a temporary object would need to be materialized. So for a lambda expression, nothing is gained by a switch to pass by reference.
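To make the copy behaviour concrete, here is a minimal sketch. CountingPred, g_copies and my_find_if are hypothetical names made up for illustration; my_find_if is a simplified stand-in with the same by-value signature, not the library's implementation (a real std::find_if may copy the predicate internally). It assumes a C++17 compiler, where the elision for the prvalue case is guaranteed.

```cpp
#include <cassert>
#include <vector>

int g_copies = 0;  // hypothetical global counter of copy/move constructions

// Hypothetical predicate that records every copy or move of itself.
struct CountingPred {
    CountingPred() = default;
    CountingPred(const CountingPred&) { ++g_copies; }
    CountingPred(CountingPred&&) noexcept { ++g_copies; }
    bool operator()(int x) const { return x > 2; }
};

// Simplified stand-in for std::find_if, taking the predicate by value.
template <class InputIt, class UnaryPredicate>
InputIt my_find_if(InputIt first, InputIt last, UnaryPredicate p) {
    for (; first != last; ++first)
        if (p(*first)) return first;
    return last;
}

// Passing a prvalue: p is initialized directly from the temporary expression.
int copies_for_prvalue() {
    std::vector<int> v{1, 2, 3, 4};
    g_copies = 0;
    my_find_if(v.begin(), v.end(), CountingPred{});
    return g_copies;
}

// Passing a named object: exactly one copy is made to initialize p.
int copies_for_lvalue() {
    std::vector<int> v{1, 2, 3, 4};
    CountingPred pred;
    g_copies = 0;
    my_find_if(v.begin(), v.end(), pred);
    return g_copies;
}
```

Under these assumptions the prvalue call performs no copy or move at all, while the named predicate is copied once into the parameter.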
If you have other sorts of predicates, which are expensive to copy, then you can pass in a std::reference_wrapper to that predicate object, for a cheap "handle" to it. The wrapper's operator() will do the right thing.
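A small sketch of the std::ref approach, assuming a made-up stateful predicate (CountingIsEven and both helper functions are hypothetical names). std::count_if is used because it applies the predicate exactly once per element, which makes the effect easy to observe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stateful predicate: remembers how often it was called.
struct CountingIsEven {
    std::size_t calls = 0;
    bool operator()(int x) { ++calls; return x % 2 == 0; }
};

// Returns {number of even elements, calls recorded in the caller's object}.
// The algorithm copies only the reference_wrapper; every call reaches pred.
std::pair<std::ptrdiff_t, std::size_t> count_with_ref(const std::vector<int>& v) {
    CountingIsEven pred;
    auto evens = std::count_if(v.begin(), v.end(), std::ref(pred));
    return {evens, pred.calls};
}

// Same call without std::ref: the algorithm works on its own copy,
// so the caller's object never sees any calls.
std::pair<std::ptrdiff_t, std::size_t> count_by_value(const std::vector<int>& v) {
    CountingIsEven pred;
    auto evens = std::count_if(v.begin(), v.end(), pred);
    return {evens, pred.calls};
}
```

The wrapper refers to the caller's object, so that object has to outlive the algorithm call.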
The definition is mostly historic, but nowadays it's really a non-issue to do it with pass by value.
To elaborate on why referential semantics would suck, let's try to take it through the years. A simple lvalue reference won't do, since now we don't support binding to an rvalue. A const lvalue reference won't do either, since now we require the predicate to not modify any internal state, and what for?
So up to C++11, we don't really have an alternative. A pass by value would be better than a reference. With the new standard, we may revise our approach. To support rvalues, we may add an rvalue reference overload. But that is an exercise in redundancy, since it doesn't need to do anything different.
By passing a value, the caller has the choice in how to create it, and for prvalues, in C++17, it's practically free. If the caller so desires, they can provide referential semantics explicitly. So nothing is lost, and I think much is gained in terms of simplicity of usage and API design.
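The point about const lvalue references can be checked at compile time. This is only a sketch: the three predicate types and the callable_through_const_ref helper are hypothetical, and std::is_invocable_v requires C++17.

```cpp
#include <type_traits>

// Predicate with a const call operator.
struct IsPositive {
    bool operator()(int x) const { return x > 0; }
};

// Predicate that updates internal state, so its call operator is non-const.
struct CountingIsPositive {
    int calls = 0;
    bool operator()(int x) { ++calls; return x > 0; }
};

// A closure type whose call operator is non-const because of `mutable`.
inline auto make_mutable_lambda() {
    return [seen = 0](int x) mutable { ++seen; return x > 0; };
}
using MutableLambda = decltype(make_mutable_lambda());

// True when an F can be called with an int through a const reference,
// which is what a `const UnaryPredicate&` parameter would require.
template <class F>
constexpr bool callable_through_const_ref = std::is_invocable_v<const F&, int>;

static_assert(callable_through_const_ref<IsPositive>);
static_assert(!callable_through_const_ref<CountingIsPositive>);
static_assert(!callable_through_const_ref<MutableLambda>);

// A by-value parameter is an ordinary non-const object inside the algorithm,
// so the stateful predicates are callable there.
static_assert(std::is_invocable_v<CountingIsPositive&, int>);
static_assert(std::is_invocable_v<MutableLambda&, int>);
```

So a const reference interface would reject exactly the predicates that keep state, while the by-value interface accepts all three.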

There are actually multiple reasons:
You can always turn deduced value arguments into using reference semantics but not vice versa: just pass std::ref(x) instead of x. std::reference_wrapper<T> isn't entirely equivalent to passing a reference, but especially for function objects it does the Right Thing. That is, passing generic arguments by value is the more general approach.
Pass by reference (T&) doesn't work for temporary or const objects, and T const& doesn't work for function objects whose call operator is non-const. The only remaining choice would be T&& (a forwarding reference), which didn't exist before C++11, and the algorithm interfaces haven't changed since they were introduced with C++98.
Value parameters can be copy elided unlike any sort of reference parameters, including forwarding references.
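For comparison, here is roughly what a forwarding-reference interface could look like. find_if_fwd and FirstAbove are hypothetical names; nothing like this exists in the standard library, and the sketch only illustrates the trade-off described in the list above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical variant of find_if that takes the predicate by forwarding reference.
template <class InputIt, class UnaryPredicate>
InputIt find_if_fwd(InputIt first, InputIt last, UnaryPredicate&& p) {
    for (; first != last; ++first)
        if (p(*first)) return first;
    return last;
}

// Hypothetical stateful predicate with a non-const call operator.
struct FirstAbove {
    int limit;
    int calls = 0;
    bool operator()(int x) { ++calls; return x > limit; }
};

// An lvalue argument binds as FirstAbove&, so the caller's object is used directly.
int calls_visible_to_caller() {
    std::vector<int> v{1, 2, 3, 4};
    FirstAbove pred{2};
    find_if_fwd(v.begin(), v.end(), pred);
    return pred.calls;
}

// A temporary also binds, but it has to be materialized for the reference.
int index_found_with_temporary() {
    std::vector<int> v{1, 2, 3, 4};
    auto it = find_if_fwd(v.begin(), v.end(), FirstAbove{2});
    return static_cast<int>(it - v.begin());
}
```

Note that this variant silently switches to reference semantics for lvalues, which is a behaviour change compared with the by-value interface; with by-value parameters the caller opts in explicitly through std::ref.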

Related

Passing functor object by value vs by reference (C++)

Compare generic integration functions:
template <class F> double integrate(F integrand);
with
template <class F> double integrate(F& integrand);
or
template <class F> double integrate(const F& integrand);
What are the pros and cons of each? STL uses the first approach (pass by value), does it mean it's the most universal one?
Function objects usually should be small, so I don't think that passing them by value will hurt performance noticeably (compare it to the work the function does in its body). If you pass by value, you can also gain from code analysis, because a by-value parameter is local to the function and the optimizer may tell when a load from a data member of the functor can be omitted and when it cannot.
If the functor is stateless, passing it as an argument implies no cost at all: the padding byte that the functor takes doesn't have to have any particular value (in the Itanium ABI used by GCC at least). When using references, you always have to pass an address.
The last one (const T&) has the drawback that in C++03 that doesn't work for raw functions, because in C++03 the program is ill-formed if you try to apply const to a function type (and is an SFINAE case). More recent implementations instead ignore const when applied on function types.
The second one (T&) has the obvious drawback that you cannot pass temporary functors.
Long story short, I would generally pass them by value, unless I see a clear benefit in concrete cases.
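Here is a rough sketch of the three signatures side by side. The question's integrate takes only the integrand; the bounds a and b, the step count n, the midpoint rule and all names below are assumptions added so that the example does something.

```cpp
#include <cassert>
#include <cmath>

namespace detail {
// Midpoint rule over [a, b] with n equal steps (hypothetical helper).
template <class F>
double midpoint(F& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += f(a + (i + 0.5) * h);
    return sum * h;
}
}  // namespace detail

// 1) By value: works on its own copy of the integrand.
template <class F>
double integrate_by_value(F integrand, double a, double b, int n) {
    return detail::midpoint(integrand, a, b, n);
}

// 2) By non-const reference: no temporaries, but state changes are visible.
template <class F>
double integrate_by_ref(F& integrand, double a, double b, int n) {
    return detail::midpoint(integrand, a, b, n);
}

// 3) By const reference: needs a const call operator on class types.
template <class F>
double integrate_by_cref(const F& integrand, double a, double b, int n) {
    return detail::midpoint(integrand, a, b, n);
}

double square(double x) { return x * x; }

// Stateful functor with a non-const call operator.
struct CallCounter {
    int calls = 0;
    double operator()(double x) { ++calls; return x; }
};
```

With a C++11 or later compiler, integrate_by_cref(square, 0.0, 1.0, 1000) compiles because the const is ignored on the function type. integrate_by_cref with a CallCounter does not compile, and integrate_by_ref(CallCounter{}, ...) is rejected because the argument is a temporary.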
STL uses the first approach (pass by value)
Sure, the standard libraries pass iterators and functors by value. They are assumed (rightly or wrongly) to be cheap to copy, and this means that if you write an iterator or a functor that is expensive to copy, you might have to find a way to optimize that later.
But that is just for the purposes for which the standard libraries use functors - mostly they're predicates, although there are also things like std::transform. If you're integrating a function, that suggests some kind of mathematics libraries, in which case I suppose you might be much more likely to deal with functions that carry a lot of state. You could for example have a class representing nth order polynomials, with n+1 coefficients as non-static data members.
In that case, a const reference might be better. When using such a functor in standard algorithms like transform, you might wrap it in a little class that performs indirection through a pointer, to ensure that it remains cheap to copy.
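A minimal sketch of such a wrapper, assuming a hypothetical Polynomial class as described above. ByPointer and evaluate_all are made-up names; in practice std::cref from <functional> gives a similar cheap handle.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical heavy functor: a polynomial that owns all its coefficients.
class Polynomial {
public:
    explicit Polynomial(std::vector<double> coeffs) : coeffs_(std::move(coeffs)) {}

    // Evaluates the polynomial with Horner's rule.
    double operator()(double x) const {
        double result = 0.0;
        for (auto it = coeffs_.rbegin(); it != coeffs_.rend(); ++it)
            result = result * x + *it;
        return result;
    }

private:
    std::vector<double> coeffs_;  // coeffs_[i] multiplies x^i
};

// Little wrapper that forwards calls through a pointer; copying it copies
// one pointer, no matter how large the wrapped functor is.
template <class F>
class ByPointer {
public:
    explicit ByPointer(const F& f) : f_(&f) {}

    template <class... Args>
    decltype(auto) operator()(Args&&... args) const {
        return (*f_)(std::forward<Args>(args)...);
    }

private:
    const F* f_;  // non-owning; the functor must outlive the wrapper
};

// Applies p to every element of xs without copying the coefficients.
std::vector<double> evaluate_all(const Polynomial& p, const std::vector<double>& xs) {
    std::vector<double> ys(xs.size());
    std::transform(xs.begin(), xs.end(), ys.begin(), ByPointer<Polynomial>(p));
    return ys;
}
```

The wrapper does not own the functor, so constructing it from a temporary would leave a dangling pointer.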
Taking a non-const reference is potentially annoying to users, since it stops them passing in temporaries.
Given the context, F is expected to be a "callable object" (something like a free function or a class having an operator() defined).
The second version binds only to lvalues. A free function name is an lvalue, so it is accepted (F deduces to the function type), but temporaries such as a lambda expression or a functor constructed in place are rejected.
The third assumes F::operator() to be const, which may not be the case if the call needs to alter the state of F.
The first operates on its own copy, but requires F to be copyable.
None of the three is "universal", but the first is most likely working in the most common cases.

Use of rvalue reference members?

I was wondering what use an rvalue reference member has
class A {
// ...
// Is this one useful?
Foo &&f;
};
Does it have any benefits or drawbacks compared to an lvalue reference member? What is a prime usecase of it?
I've seen one very motivating use case for rvalue reference data members, and it is in the C++0x draft:
template<class... Types>
tuple<Types&&...>
forward_as_tuple(Types&&... t) noexcept;
Effects: Constructs a tuple of references to the arguments in t suitable for forwarding as arguments to a function. Because the result may contain references to temporary variables, a program shall ensure that the return value of this function does not outlive any of its arguments. (e.g., the program should typically not store the result in a named variable).
Returns: tuple<Types&&...>(std::forward<Types>(t)...)
The tuple has rvalue reference data members when rvalues are used as arguments to forward_as_tuple, and otherwise has lvalue reference data members.
I've found forward_as_tuple subsequently helpful when needing to catch variadic arguments, perfectly forward them packed as a tuple, and re-expand them later at the point of forwarding to a functor. I used forward_as_tuple in this style when implementing an enhanced version of tuple_cat proposed in LWG 1385:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1385
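As a rough illustration of that pack-then-forward pattern (not the tuple_cat implementation referred to above), the sketch below packs arguments with std::forward_as_tuple and re-expands them with std::apply, which requires C++17. Category and pack_then_call are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical callable that reports the value category it receives.
struct Category {
    std::string operator()(std::string&) const { return "lvalue"; }
    std::string operator()(std::string&&) const { return "rvalue"; }
};

// An lvalue argument becomes an lvalue reference member,
// an rvalue argument becomes an rvalue reference member.
static_assert(std::is_same_v<
    decltype(std::forward_as_tuple(std::declval<std::string&>(),
                                   std::declval<std::string>())),
    std::tuple<std::string&, std::string&&>>);

// Packs the arguments as a tuple of references, then re-expands them
// at the point of the call with their original value categories.
template <class F, class... Args>
decltype(auto) pack_then_call(F&& f, Args&&... args) {
    auto packed = std::forward_as_tuple(std::forward<Args>(args)...);
    // Moving the tuple makes std::get yield T&& for the rvalue reference
    // members while the lvalue reference members stay lvalue references.
    return std::apply(std::forward<F>(f), std::move(packed));
}
```

The tuple only lives inside pack_then_call here, so it cannot outlive the arguments it refers to, which is the lifetime rule quoted from the draft.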
According to Stephan T. Lavavej, rvalue reference data members have no use.
[at 31:00] The thing I've seen programmers do when they get hold of rvalue references is that, they start to go a little crazy, because they're so powerful. They start saying "Oh, I'm gonna have rvalue reference data members, I'm gonna have rvalue reference local variables, I'm gonna have rvalue reference return values!" And then they write code like this: [...]
class A {
// ...
// Is this one useful?
Foo &&f;
};
In this specific case, there is no reason to use an rvalue reference. It doesn't buy you anything you couldn't have done before.
But you may want to define data members with parameterized types. std::tuple is going to support lvalue and rvalue reference data members, for example. This way it allows you to codify an expression's value category which might come in handy for "delayed perfect forwarding". The standard draft even includes a function template of the form
template<class... Args>
tuple<Args&&...> pack_arguments(Args&&...args);
But I'm honestly not sure about its usefulness.
Just thinking out loud here, but wouldn't it have a use in functors? The constructor is often used for "currying", binding some parameters in advance, before the actual function call.
So in this context, the class member is just a staging ground (or a manually implemented closure) for the upcoming function call, and I see no reason why an rvalue reference wouldn't be meaningful there.
But in "regular" non-functor classes, I don't see much point.

Some clarification on rvalue references

First: where are std::move and std::forward defined? I know what they do, but I can't find proof that any standard header is required to include them. In gcc44 sometimes std::move is available, and sometimes it's not, so a definitive include directive would be useful.
When implementing move semantics, the source is presumably left in an undefined state. Should this state necessarily be a valid state for the object? Obviously, you need to be able to call the object's destructor, and be able to assign to it by whatever means the class exposes. But should other operations be valid? I suppose what I'm asking is, if your class guarantees certain invariants, should you strive to enforce those invariants when the user has said they don't care about them anymore?
Next: when you don't care about move semantics, are there any limitations that would cause a non-const reference to be preferred over an rvalue reference when dealing with function parameters? void function(T&); over void function(T&&); From a caller's perspective, being able to pass functions temporary values is occasionally useful, so it seems as though one should grant that option whenever it is feasible to do so. And rvalue references are themselves lvalues, so you can't inadvertently call a move-constructor instead of a copy-constructor, or something like that. I don't see a downside, but I'm sure there is one.
Which brings me to my final question. You still can not bind temporaries to non-const references. But you can bind them to non-const rvalue references. And you can then pass along that reference as a non-const reference in another function.
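void function1(int& r) { r++; }
void function2(int&& r) { function1(r); }
int main() {
    function1(5); // bad: a temporary cannot bind to a non-const lvalue reference
    function2(5); // good: the temporary binds to the rvalue reference
}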
Besides the fact that it doesn't do anything, is there anything wrong with that code? My gut says of course not, since changing rvalue references is kind of the whole point to their existence. And if the passed value is legitimately const, the compiler will catch it and yell at you. But by all appearances, this is a runaround of a mechanism that was presumably put in place for a reason, so I'd just like confirmation that I'm not doing anything foolish.
First: where are std::move and std::forward defined?
See 20.3 Utility components, <utility>.
When implementing move semantics, the source is presumably left in an undefined state. Should this state necessarily be a valid state for the object?
Obviously, the object should still be destructible. But beyond that, I think it's a good idea for it to still be assignable. The Standard says for objects that satisfy "MoveConstructible" and "MoveAssignable":
[ Note: rv remains a valid object. Its state is unspecified. — end note ]
This would mean, I think, that the object can still participate in any operation that doesn't state any precondition. This includes CopyConstructible, CopyAssignable, Destructible and other things. Notice that this won't require anything for your own objects from a core language perspective. The requirements only take place once you touch Standard library components that state these requirements.
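A small sketch of what that means for standard library types, using only operations without preconditions on the moved-from objects. The function names are hypothetical, and the example deliberately never reads the unspecified state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Moves a string away, then gives the source a known value again by assignment.
std::vector<std::string> reuse_after_move() {
    std::vector<std::string> out;
    std::string s = "first";
    out.push_back(std::move(s));  // s is now valid, but its value is unspecified
    s = "second";                 // assignment has no precondition
    out.push_back(s);
    return out;
}

// Moves a vector away, then resets the source before using it again.
std::size_t refill_after_move() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b = std::move(a);
    a.clear();        // no precondition; afterwards a is known to be empty
    a.push_back(42);
    return a.size() + b.size();
}
```

Calls such as front() or operator[] would not be safe right after the move, because they have preconditions that the unspecified state might not satisfy.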
Next: when you don't care about move semantics, are there any limitations that would cause a non-const reference to be preferred over an rvalue reference when dealing with function parameters?
This, unfortunately, crucially depends on whether the parameter is in a function template and uses a template parameter:
void f(int const&); // takes all lvalues and const rvalues
void f(int&&); // can only accept nonconst rvalues
However for a function template
template<typename T> void f(T const&);
template<typename T> void f(T&&);
You can't say that, because the second template will, after being called with an lvalue, have as parameter of the synthesized declaration the type U& for nonconst lvalues (and be a better match), and U const& for const lvalues (and be ambiguous). To my knowledge, there is no partial ordering rule to disambiguate that second ambiguity. However, this is already known.
-- Edit --
Despite that issue report, I don't think that the two templates are ambiguous. Partial ordering will make the first template more specialized, because after taking away the reference modifiers and the const, we will find that both types are the same, and then notice that the first template had a reference to const. The Standard says (14.9.2.4)
If, for a given type, deduction succeeds in both directions (i.e., the types are identical after the transformations above) and if the type from the argument template is more cv-qualified than the type from the parameter template (as described above) that type is considered to be more specialized than the other.
If for each type being considered a given template is at least as specialized for all types and more specialized for some set of types and the other template is not more specialized for any types or is not at least as specialized for any types, then the given template is more specialized than the other template.
This makes the T const& template the winner of partial ordering (and GCC is indeed correct to choose it).
-- Edit End --
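The overload choices discussed in this answer can be observed with a small sketch. The string return values are added purely for illustration, and g is a hypothetical name for the template pair so that it does not clash with the non-template f.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Non-template overloads: each reports which one was selected.
inline std::string f(int const&) { return "int const&"; }
inline std::string f(int&&) { return "int&&"; }

// Template overloads: T&& is a forwarding reference here.
template <typename T>
std::string g(T const&) { return "T const&"; }

template <typename T>
std::string g(T&&) { return "T&&"; }

// Calls g with a non-const lvalue; T&& deduces int& and is the exact match.
inline std::string g_with_nonconst_lvalue() {
    int x = 0;
    return g(x);
}

// Calls g with a const lvalue; both candidates take int const&,
// and partial ordering selects the T const& template.
inline std::string g_with_const_lvalue() {
    const int c = 0;
    return g(c);
}
```

For the non-template pair, lvalues always go to f(int const&) and non-const rvalues go to f(int&&).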
Which brings me to my final question. You still can not bind temporaries to non-const references. But you can bind them to non-const rvalue references.
This is nicely explained in this article. The second call using function2 only takes nonconst rvalues. The rest of the program won't notice if they are modified, because they won't be able to access those rvalues afterwards anymore! And the 5 you pass is not a class type, so a hidden temporary is created and then passed to the int&& rvalue reference. The code calling function2 won't be able to access that hidden object here, so it won't notice any change.
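A runnable variation of the question's code may help here. function1 and function2 are taken from the question; bumped and after_move are hypothetical helpers added so that the effect can be observed.

```cpp
#include <cassert>
#include <utility>

inline void function1(int& r) { r++; }

// Inside function2 the named parameter r is an lvalue, so it binds to int&.
inline void function2(int&& r) { function1(r); }

// Hypothetical helper: increments the referenced int and returns the result.
inline int bumped(int&& r) {
    function1(r);
    return r;
}

// Passing a literal: a hidden temporary int is created and modified,
// and the caller has no way to look at it afterwards.
inline int after_literal() { return bumped(5); }

// Passing std::move(x): the reference binds to x itself,
// so the caller does observe the modification.
inline int after_move() {
    int x = 1;
    function2(std::move(x));
    return x;
}
```

So the change is invisible only as long as the argument really is a temporary; once the caller casts a named object with std::move, it has explicitly agreed to the object being modified.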
A different situation is if you do this one:
SomeComplexObject o;
function2(move(o));
You have explicitly requested that o is moved, so it will be modified according to its move specification. However moving is a logically non-modifying operation (see the article). This means whether you move or not shouldn't be observable from the calling code:
SomeComplexObject o;
moveit(o); // #1
o = foo;
If you erase the line that moves, behavior will still be the same, because it's overwritten anyway. This however means that code that uses the value of o after it has been moved from is bad, because it breaks this implicit contract between moveit and the calling code. Thus, the Standard makes no specification about the concrete value of a moved from container.
where are std::move and std::forward defined?
std::move and std::forward are declared in <utility>. See the synopsis at the beginning of section 20.3[utility].
When implementing move semantics, the source is presumably left in an undefined state.
It of course depends on how you implement the move constructor and move assignment operator. If you want to use your objects in standard containers, however, you have to follow the MoveConstructible and MoveAssignable concepts, which say that the object remains valid but is left in an unspecified state, i.e. you definitely can destroy it.
They are provided by the <utility> header.
Here is the article I read about rvalues.
I can't help you with rest, sorry.