Should I use functions or stateless functors? - c++

These two pieces of code do the same thing, and both will be used with the sort function, as you can see.
Which is better? I usually write the latter, but I have seen some coders use the former.
struct val_lessthan : binary_function<pair<string,int>, pair<string, int>, bool>
{
bool operator() (const pair<string,int>& x, const pair<string,int>& y) const
{
return x.second < y.second;
}
} val_lt;
and
bool val_lt(const pair<string,int>& x, const pair<string,int>& y)
{
return x.second < y.second;
}
Either one will be used like this:
std::sort(wordvector.begin(), wordvector.end(), val_lt);

The reason you see some people prefer the first version is that functors can be trivially inlined.
When you pass a functor to std::sort, the functor type is known to the function, and so the exact function to call is also known at compile-time, and can be trivially inlined.
With a plain function, what std::sort really sees is just a function pointer, and at compile-time that says nothing about which function it points to. So that can't be inlined unless the compiler performs some fairly extensive flow analysis to see where the pointer came from in this specific call. And it will certainly do that optimization in a small example like yours, but if the functor/function pointer was passed in as a function argument from somewhere else, for example, or it was read from an intermediate data structure before being passed to std::sort, then, the compiler might not be able to inline the function pointer version, and so it would end up slower.
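As a rough illustration of the two call forms discussed above, the sketch below sorts the same data once with a functor and once with a plain function. The names Entry, ValLessThan, val_lt_fn and sort_both_ways are made up for this example; the point is only that std::sort is instantiated on a distinct class type in the first call and on a generic function pointer type in the second. Whether either call is actually inlined depends on the compiler and its settings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for the element type used in the question.
typedef std::pair<std::string, int> Entry;

// Functor: the type alone tells std::sort which operator() will be called.
struct ValLessThan {
    bool operator()(const Entry& x, const Entry& y) const {
        return x.second < y.second;
    }
};

// Plain function: std::sort only sees bool (*)(const Entry&, const Entry&).
bool val_lt_fn(const Entry& x, const Entry& y) {
    return x.second < y.second;
}

// Sorts two copies, one per comparator, and returns the functor-sorted copy.
std::vector<Entry> sort_both_ways(const std::vector<Entry>& words) {
    std::vector<Entry> by_functor = words;
    std::vector<Entry> by_function = words;
    std::sort(by_functor.begin(), by_functor.end(), ValLessThan());
    std::sort(by_function.begin(), by_function.end(), val_lt_fn);
    assert(by_functor == by_function);  // same ordering either way
    return by_functor;
}
```

Both calls produce the same ordering; any difference is in the generated code, not in the result.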

The first one is called a function object and is useful if you need to pass any context information to the comparison function. The standalone function only gets x and y and doesn't have the opportunity to carry along any context.
In the specific instance above, the two ways of writing the comparison function are roughly equivalent.
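To make the point about context concrete, here is a minimal sketch of a function object that carries state. CloserTo and sort_by_distance are hypothetical names invented for this example: the comparator orders integers by their distance from a pivot chosen at run time, which a standalone two-argument function could not receive without a global variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical comparator: orders ints by distance from a stored pivot.
struct CloserTo {
    int pivot;  // the context carried along with the comparison
    explicit CloserTo(int p) : pivot(p) {}
    bool operator()(int x, int y) const {
        return std::abs(x - pivot) < std::abs(y - pivot);
    }
};

// Returns the values ordered from nearest to farthest from pivot.
std::vector<int> sort_by_distance(std::vector<int> values, int pivot) {
    std::sort(values.begin(), values.end(), CloserTo(pivot));
    return values;
}

void example_closer_to() {
    // Distances from 5 are 4, 5, 2 and 1 respectively.
    std::vector<int> r = sort_by_distance({1, 10, 7, 4}, 5);
    assert(r[0] == 4 && r[1] == 7 && r[2] == 1 && r[3] == 10);
}
```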

I'd probably prefer the first as a rule, but would generally prefer to use a template:
template <class T>
struct val_lessthan : binary_function<T, T, bool> {
bool operator()(T const &x, T const &y) const {
return x.second < y.second;
}
};
Use of .second limits the degree of genericity, but you still get a little (e.g., if memory serves, boost::tuple provides a .first and .second for tuples of two elements). As a rule, being a template gives a little better assurance that the compiler will be able to generate the code inline, so if you care about efficiency, it might help a little (or it might not, but is unlikely to ever cause any harm).
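A related variant, shown here only as a sketch, makes operator() the template instead of the class, so the caller does not have to spell out the element type. This is a different technique from the class template above, and SecondLess, Score and example_second_less are hypothetical names. It deliberately avoids binary_function, which newer standards deprecate.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// The class is not a template; operator() is.
// It accepts any type that exposes a comparable .second member.
struct SecondLess {
    template <class T>
    bool operator()(const T& x, const T& y) const {
        return x.second < y.second;
    }
};

// Hypothetical non-pair type that also has .first and .second.
struct Score {
    char first;
    double second;
};

void example_second_less() {
    std::vector<std::pair<std::string, int> > words;
    words.push_back(std::make_pair(std::string("b"), 2));
    words.push_back(std::make_pair(std::string("a"), 1));
    std::sort(words.begin(), words.end(), SecondLess());
    assert(words.front().first == "a");

    Score low = {'x', 0.5};
    Score high = {'y', 1.5};
    assert(SecondLess()(low, high));  // same functor, unrelated type
}
```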

If you also want to call the comparison directly in other parts of your code, rather than only passing it as a functor, prefer the function form. For example, you would prefer:
if (val_lt(a,b))
{
//...
}
to
if(val_lessthan()(a,b))
{
// ...
}
Otherwise, when choosing the functor form, you'd better pass an unnamed functor object. That is:
std::sort(wordvector.begin(), wordvector.end(), val_lessthan());
instead of:
val_lessthan named;
std::sort(wordvector.begin(), wordvector.end(), named);
Passing an unnamed temporary makes it easier for the compiler to optimize. This falls under copy elision, of which RVO (Return Value Optimization) is the best-known case. Here it will probably save your code one copy construction.

I'd say, choose the simplest thing that works for your particular case. In this case, choose the second over the first.

Both will be equally fast. Almost negligible difference.
When you use a functor, operator() effectively has three parameters in the code generated by the compiler: the first parameter is a pointer to the val_lt object itself, and the second and third are the parameters you wrote in the signature. Something like this:
//the possible code generated by the compiler!
bool operator() (val_lessthan *_this, const pair<string,int>& x, const pair<string,int>& y) const
//^^^^^^^^^^^^^^^^^^^ note this!
{
return x.second < y.second;
}

Related

comparator for sorting a vector containing pointers to objects of custom class

With this question I am also trying to understand the fundamentals of C++, as I am very new to it. There are many good answers to the problem of sorting a vector/list of custom classes. In all of the examples, the signature of the comparator function passed to sort looks like this:
(const ClassType& obj1, const ClassType& obj2)
Is this signature mandatory for comparator functions? Or can we also use something like this:
(ClassType obj1, ClassType obj2)
Assume I will modify the body of the comparator accordingly.
If the first signature is mandatory, then why?
I want to understand the reasons behind using const and the reference '&'.
What I can think of is that const is there because you don't want the comparator function to be able to modify the elements, and the reference is there so that no extra copies are created.
What should my signature be if I want to sort a vector that contains pointers to objects of a custom class? Like (1) or (2) (see below), or will both work?
The vector to be sorted is of type vector<ClassType*>.
(1)
(const ClassType*& ptr1, const ClassType*& ptr2)
(2)
(ClassType* ptr1, ClassType* ptr2)
I recommend looking through the std::sort documentation.
It explains that the signature of the compare function must be equivalent to:
bool cmp(const Type1& a, const Type2& b);
To be more precise, it then goes on to explain that each parameter needs to be of a type that is implicitly convertible from the object obtained by dereferencing an iterator passed to the sort function.
So if your iterator is std::vector<ClassType*>::iterator, then your parameter types need to be implicitly convertible from ClassType*.
If you are using something relatively small like an int or a pointer then I would accept them by value:
bool cmp(const ClassType* ptr1, const ClassType* ptr2) // this is more efficient
NOTE: I made them pointers to const because a sort function should not modify the values it is sorting.
(ClassType obj1, ClassType obj2)
In most situations this signature will also work for comparators. It is not used because it passes the objects by value, which requires the objects to be copied.
This will be a complete waste. The comparator function does not need to have its own copies of its parameters. All it needs are references to two objects it needs to compare, that's it. Additionally, a comparator function does not need to modify the objects it is comparing. It should not do that. Hence, explicitly using a const reference forces the compiler to issue a compilation error, if the comparator function is coded, in error, to modify the object.
And one situation where this will definitely not work is for classes that have deleted copy constructors. Instances of those classes cannot be copied, at all. You can still emplace them into the containers, but they cannot be copied. But they still can be compared.
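The non-copyable case can be sketched as follows. Token, by_id and make_sorted_tokens are hypothetical names; the type is made movable so that std::sort can rearrange the elements, but it can never be copied. The const-reference comparator works, while a by-value signature would fail to compile as soon as the algorithm called it with an existing element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical move-only type: copying is deleted, moving is allowed.
struct Token {
    int id;
    explicit Token(int i) : id(i) {}
    Token(const Token&) = delete;
    Token& operator=(const Token&) = delete;
    Token(Token&&) = default;
    Token& operator=(Token&&) = default;
};

// Taking the arguments by const reference never copies a Token.
// A by-value version, by_id(Token a, Token b), could not be called
// with existing elements because that would need the deleted copy.
bool by_id(const Token& a, const Token& b) {
    return a.id < b.id;
}

std::vector<Token> make_sorted_tokens() {
    std::vector<Token> tokens;
    tokens.emplace_back(3);  // constructed in place, no copy involved
    tokens.emplace_back(1);
    tokens.emplace_back(2);
    std::sort(tokens.begin(), tokens.end(), by_id);
    assert(std::is_sorted(tokens.begin(), tokens.end(), by_id));
    return tokens;
}
```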
const is so you know not to change the values while you're comparing them. Reference is because you don't want to make a copy of the value while you're trying to compare them -- they may not even be copyable.
It should look like your first example -- it's always a reference to the const type of the elements of the vector.
If you have vector<T>, it's always:
T const & left, T const & right
So, if T is a pointer, then the signature for the comparison includes the pointer.
There's nothing really special about the STL. I use it for two main reasons, as a slightly more convenient array (std::vector) and because a balanced binary search tree is a hassle to implement. STL has a standard signature for comparators, so all the algorithms are written to operate on the '<' operation (so they test for equality with if(!( a < b || b < a)) ). They could just as easily have chosen the '>' operation or the C qsort() convention, and you can write your own templated sort routines to do that if you want. However it's easier to use C++ if everything uses the same conventions.
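The equality test mentioned above, !(a < b || b < a), can be written out as a small helper. This is a sketch with hypothetical names (LastDigitLess, equivalent, example_equivalence); it assumes non-negative inputs and shows that two different values can still count as "the same" for a container that only knows the comparator.

```cpp
#include <cassert>
#include <set>

// Hypothetical comparator: orders non-negative ints by last decimal digit only.
struct LastDigitLess {
    bool operator()(int a, int b) const { return a % 10 < b % 10; }
};

// Equivalence as the ordered containers see it:
// neither element orders before the other.
template <class T, class Compare>
bool equivalent(const T& a, const T& b, Compare less) {
    return !(less(a, b) || less(b, a));
}

void example_equivalence() {
    assert(equivalent(12, 42, LastDigitLess()));   // both end in 2
    assert(!equivalent(12, 13, LastDigitLess()));

    std::set<int, LastDigitLess> digits;
    digits.insert(12);
    digits.insert(42);  // rejected: equivalent to 12 under this comparator
    assert(digits.size() == 1);
}
```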
The comparators take const references because a comparator shouldn't modify what it is comparing, and because references are more efficient for objects than passing by value. If you just want to sort integers (rarely you need to sort just raw integers in a real program, though it's often done as an exercise) you can quite possibly write your own sort that passes by value and is a tiny bit faster than the STL sort as a consequence.
You can define the comparator with the following signature:
bool com(ClassType* const & lhs, ClassType* const & rhs);
Note the difference from your first option. (What is needed is a const reference to a ClassType* instead of a reference to a const ClassType*)
The second option should also be good.
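Putting the pointer case together, a minimal sketch might look like the following. ClassType here is a made-up stand-in with a single key field, and by_key and sort_by_key are hypothetical names. The comparator takes the pointers by value, as in option (2), with pointer-to-const added to show that it does not modify the objects.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the question's ClassType.
struct ClassType {
    int key;
};

// Pointers are cheap to copy, so they are taken by value.
// Null pointers are not supported in this sketch.
bool by_key(const ClassType* lhs, const ClassType* rhs) {
    assert(lhs != 0 && rhs != 0);
    return lhs->key < rhs->key;
}

// Reorders the pointers by the key of the pointed-to object.
void sort_by_key(std::vector<ClassType*>& items) {
    std::sort(items.begin(), items.end(), by_key);
}
```

The objects themselves never move; only the pointers stored in the vector are rearranged.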

C++11 std::set lambda comparison function

I want to create a std::set with a custom comparison function. I could define it as a class with operator(), but I wanted to enjoy the ability to define a lambda where it is used, so I decided to define the lambda function in the initialization list of the constructor of the class which has the std::set as a member. But I can't get the type of the lambda. Before I proceed, here's an example:
class Foo
{
private:
std::set<int, /*???*/> numbers;
public:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
};
I found two solutions after searching: one, using std::function. Just have the set comparison function type be std::function<bool (int, int)> and pass the lambda exactly like I did. The second solution is to write a make_set function, like std::make_pair.
SOLUTION 1:
class Foo
{
private:
std::set<int, std::function<bool (int, int)>> numbers;
public:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
};
SOLUTION 2:
template <class Key, class Compare>
std::set<Key, Compare> make_set (Compare compare)
{
return std::set<Key, Compare> (compare);
}
The question is: do I have a good reason to prefer one solution over the other? I prefer the first one because it makes use of standard features (make_set is not a standard function), but I wonder: does using std::function make the code (potentially) slower? That is, does it lower the chance that the compiler inlines the comparison function, or should the compiler be smart enough to behave exactly as it would if the comparator were a lambda type and not a std::function (I know, in this case it can't be a lambda type, but I'm asking in general)?
(I use GCC, but I'd like to know what popular compilers do in general)
SUMMARY, AFTER I GOT LOTS OF GREAT ANSWERS:
If speed is critical, the best solution is to use a class with operator(), aka a functor. It's easiest for the compiler to optimize and avoids any indirection.
For easy maintenance and a better general-purpose solution, using C++11 features, use std::function. It's still fast (just a little bit slower than the functor, but it may be negligible) and you can use any function - std::function, lambda, any callable object.
There's also an option to use a function pointer, but if there's no speed issue I think std::function is better (if you use C++11).
There's an option to define the lambda function somewhere else, but then you gain nothing from the comparison function being a lambda expression, since you could as well make it a class with operator() and the location of definition wouldn't be the set construction anyway.
There are more ideas, such as using delegation. If you want a more thorough explanation of all solutions, read the answers :)
It's unlikely that the compiler will be able to inline a std::function call, whereas any compiler that supports lambdas would almost certainly inline the functor version, including if that functor is a lambda not hidden by a std::function.
You could use decltype to get the lambda's comparator type:
#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>
int main()
{
auto comp = [](int x, int y){ return x < y; };
auto set = std::set<int,decltype(comp)>( comp );
set.insert(1);
set.insert(10);
set.insert(1); // Dupe!
set.insert(2);
std::copy( set.begin(), set.end(), std::ostream_iterator<int>(std::cout, "\n") );
}
Which prints:
1
2
10
Yes, a std::function introduces nearly unavoidable indirection to your set. While the compiler can always, in theory, figure out that all use of your set's std::function involves calling it on a lambda that is always the exact same lambda, that is both hard and extremely fragile.
Fragile, because before the compiler can prove to itself that all calls to that std::function are actually calls to your lambda, it must prove that no access to your std::set ever sets the std::function to anything but your lambda. Which means it has to track down all possible routes to reach your std::set in all compilation units and prove none of them do it.
This might be possible in some cases, but relatively innocuous changes could break it even if your compiler managed to prove it.
On the other hand, a functor with a stateless operator() has easy to prove behavior, and optimizations involving that are everyday things.
So yes, in practice I'd suspect std::function could be slower. On the other hand, std::function solution is easier to maintain than the make_set one, and exchanging programmer time for program performance is pretty fungible.
make_set has the serious disadvantage that any such set's type must be inferred from the call to make_set. Often a set stores persistent state, and not something you create on the stack then let fall out of scope.
If you created a static or global stateless lambda auto MyComp = [](A const&, A const&)->bool { ... }, you can use the std::set<A, decltype(MyComp)> syntax to create a set that can persist, yet is easy for the compiler to optimize (because all instances of decltype(MyComp) are stateless functors) and inline. I point this out, because you are sticking the set in a struct. (Or does your compiler support
struct Foo {
auto mySet = make_set<int>([](int l, int r){ return l<r; });
};
which I would find surprising!)
Finally, if you are worried about performance, consider that std::unordered_set is much faster (at the cost of being unable to iterate over the contents in order, and having to write/find a good hash), and that a sorted std::vector is better if you have a 2-phase "insert everything" then "query contents repeatedly". Simply stuff everything into the vector first, then sort, unique, and erase, then use the free equal_range algorithm.
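The two-phase sorted-vector approach can be sketched like this. The helper names finalize and contains are invented for this example, and int is used only to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Phase 1 ends here: sort once and drop duplicates.
void finalize(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
}

// Phase 2: logarithmic membership test on the sorted vector.
bool contains(const std::vector<int>& sorted_values, int key) {
    assert(std::is_sorted(sorted_values.begin(), sorted_values.end()));
    auto range = std::equal_range(sorted_values.begin(), sorted_values.end(), key);
    return range.first != range.second;  // empty range means "not present"
}
```

The precondition check in contains is linear and is there only to document the assumption; it disappears when NDEBUG is defined.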
A stateless lambda (i.e. one with no captures) can decay to a function pointer, so your type could be:
std::set<int, bool (*)(int, int)> numbers;
Otherwise I'd go for the make_set solution. If you won't use a one-line creation function because it's non-standard you're not going to get much code written!
From my experience playing around with the profiler, the best compromise between performance and beauty is to use a custom delegate implementation, such as:
https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11
As the std::function is usually a bit too heavy. I can't comment on your specific circumstances, as I don't know them, though.
If you're determined to have the set as a class member, initializing its comparator at constructor time, then at least one level of indirection is unavoidable. Consider that as far as the compiler knows, you could add another constructor:
Foo () : numbers ([](int x, int y)
{
return x < y;
})
{
}
Foo (char) : numbers ([](int x, int y)
{
return x > y;
})
{
}
Once you have an object of type Foo, the type of the set doesn't carry information on which constructor initialized its comparator, so calling the correct lambda requires an indirection to the run-time selected lambda operator().
Since you're using captureless lambdas, you could use the function pointer type bool (*)(int, int) as your comparator type, as captureless lambdas have the appropriate conversion function. This would of course involve an indirection through the function pointer.
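A sketch of the function-pointer variant, applied to the class from the question, could look like this. The add and contents members and the example function are additions made up for illustration, and the lambda compares with > rather than < only so that the effect of the comparator is visible in the output order.

```cpp
#include <cassert>
#include <set>
#include <vector>

class Foo {
public:
    // The captureless lambda converts implicitly to bool (*)(int, int).
    Foo() : numbers([](int x, int y) { return x > y; }) {}

    void add(int value) { numbers.insert(value); }

    // Returns the contents in the set's iteration order.
    std::vector<int> contents() const {
        return std::vector<int>(numbers.begin(), numbers.end());
    }

private:
    std::set<int, bool (*)(int, int)> numbers;
};

void example_foo() {
    Foo foo;
    foo.add(1);
    foo.add(10);
    foo.add(1);  // duplicate, ignored by the set
    foo.add(2);
    std::vector<int> c = foo.contents();
    assert(c.size() == 3);
    assert(c[0] == 10 && c[1] == 2 && c[2] == 1);  // descending order
}
```

Each comparison goes through the stored pointer, which is the indirection described above.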
The difference highly depends on your compiler's optimizations. If it optimizes lambda in a std::function those are equivalent, if not you introduce an indirection in the former that you won't have in the latter.

Passing temporaries as LValues

I'd like to use the following idiom, that I think is non-standard. I have functions which return vectors taking advantage of Return Value Optimization:
vector<T> some_func()
{
...
return vector<T>( /* something */ );
}
Then, I would like to use
vector<T>& some_reference;
std::swap(some_reference, some_func());
but some_func doesn't return an lvalue. The above code makes sense, and I find this idiom very useful. However, it is non-standard. VC8 only emits a warning at the highest warning level, but I suspect other compilers may reject it.
My question is: Is there some way to achieve the very same thing I want to do (ie. construct a vector, assign to another, and destroy the old one) which is compliant (and does not use the assignment operator, see below) ?
For classes I write, I usually implement assignment as
class T
{
T(T const&);
void swap(T&);
T& operator=(T x) { this->swap(x); return *this; }
};
which takes advantage of copy elision, and solves my problem. For standard types, however, I really would like to use swap since I don't want a useless copy of the temporary.
And since I must use VC8 and produce standard C++, I don't want to hear about C++0x and its rvalue references.
EDIT: Finally, I came up with
template <typename T>
void assign(T &x, T y)
{
std::swap(x, y);
}
when I use lvalues, since the compiler is free to optimize the call to the copy constructor if y is temporary, and go with std::swap when I have lvalues. All the classes I use are "required" to implement a non-stupid version of std::swap.
Since std::vector is a class type and member functions can be called on rvalues:
some_func().swap(some_reference);
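Spelled out as a small sketch, with some_func given a made-up body and refill as a hypothetical wrapper, the member-swap call looks like this. The temporary ends up holding the old contents and is destroyed at the end of the statement.

```cpp
#include <cassert>
#include <vector>

// Hypothetical producer that returns a vector by value.
std::vector<int> some_func() {
    return std::vector<int>(3, 7);  // three elements, each equal to 7
}

// Replaces the contents of target with the result of some_func().
// swap is a member function, so calling it on the temporary is allowed.
void refill(std::vector<int>& target) {
    some_func().swap(target);
}

void example_refill() {
    std::vector<int> v(2, 1);
    refill(v);
    assert(v.size() == 3);
    assert(v[0] == 7 && v[2] == 7);
}
```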
If you don't want useless copies of temporaries, don't return by value.
Use (shared) pointers, pass function arguments by reference to be filled in, insert iterators, ....
Is there a specific reason why you want to return by value?
The only way I know, within the constraints of the standard, to achieve what you want is to apply the expression templates metaprogramming technique: http://en.wikipedia.org/wiki/Expression_templates which might or might not be easy in your case.

How to std::find using a Compare object?

I am confused about the interface of std::find. Why doesn't it take a Compare object that tells it how to compare two objects?
If I could pass a Compare object I could make the following code work, where I would like to compare by value, instead of just comparing the pointer values directly:
typedef std::vector<std::string*> Vec;
Vec vec;
std::string* s1 = new std::string("foo");
std::string* s2 = new std::string("foo");
vec.push_back(s1);
Vec::const_iterator found = std::find(vec.begin(), vec.end(), s2);
// not found, obviously, because I can't tell it to compare by value
delete s1;
delete s2;
Is the following the recommended way to do it?
template<class T>
struct MyEqualsByVal {
const T& x_;
MyEqualsByVal(const T& x) : x_(x) {}
bool operator()(const T& y) const {
return *x_ == *y;
}
};
// ...
vec.push_back(s1);
Vec::const_iterator found =
std::find_if(vec.begin(), vec.end(),
MyEqualsByVal<std::string*>(s2)); // OK, will find "foo"
find can't be overloaded to take a unary predicate instead of a value, because it's an unconstrained template parameter. So if you called find(first, last, my_predicate), there would be a potential ambiguity whether you want the predicate to be evaluated on each member of the range, or whether you want to find a member of the range that's equal to the predicate itself (it could be a range of predicates, for all the designers of the standard libraries know or care, or the value_type of the iterator could be convertible both to the predicate type, and to its argument_type). Hence the need for find_if to go under a separate name.
find could have been overloaded to take an optional binary predicate, in addition to the value searched for. But capturing values in functors, as you've done, is such a standard technique that I don't think it would be a massive gain: it's certainly never necessary since you can always achieve the same result with find_if.
If you got the find you wanted, you'd still have to write a functor (or use boost), since <functional> doesn't contain anything to dereference a pointer. Your functor would be a little simpler as a binary predicate, though, or you could use a function pointer, so it'd be a modest gain. So I don't know why this isn't provided. Given the copy_if fiasco I'm not sure there's much value in assuming there are always good reasons for algorithms that aren't available :-)
Since your T is a pointer, you may as well store a copy of the pointer in the function object.
Other than that, that is how it is done and there's not a whole lot more to it.
As an aside, it's not a good idea to store bare pointers in a container, unless you are extremely careful with ensuring exception safety, which is almost always more hassle than it's worth.
That's exactly what find_if is for - it takes a predicate that is called to compare elements.
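For reference, here is a compact sketch of the find_if approach with the pointer stored by value in the predicate, as suggested above. EqualsByValue and index_of are hypothetical names, and the strings in the usage are expected to live on the stack so that no manual delete is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Predicate that keeps a copy of the pointer and compares the pointed-to strings.
struct EqualsByValue {
    const std::string* target;
    explicit EqualsByValue(const std::string* t) : target(t) {}
    bool operator()(const std::string* candidate) const {
        return *target == *candidate;
    }
};

// Returns the index of the first element whose string equals *needle,
// or haystack.size() if there is none. Null pointers are not supported.
std::size_t index_of(const std::vector<std::string*>& haystack,
                     const std::string* needle) {
    assert(needle != 0);
    std::vector<std::string*>::const_iterator it =
        std::find_if(haystack.begin(), haystack.end(), EqualsByValue(needle));
    return static_cast<std::size_t>(it - haystack.begin());
}
```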

binary_search, find_if and <functional>

std::find_if takes a predicate in one of its overloads. Binders make it possible to write EqualityComparators for user-defined types and use them for either dynamic or static comparison.
In contrast, the binary search functions of the standard library take a comparator and a const T& to the value that should be used for comparison. This feels inconsistent to me and could possibly be less efficient, as the comparator has to be called with both arguments every time instead of having the constant argument bound to it. While it could be possible to implement std::binary_search in a way that uses std::bind, this would require all comparators to inherit from std::binary_function. Most code I've seen doesn't do that.
Is there a possible benefit from letting comparators inherit from std::binary_function when using it with algorithms that take a const T& as a value instead of letting me use the binders? Is there a reason for not providing predicate overloads in those functions?
A single-argument predicate version of std::binary_search wouldn't be able to complete in O(log n) time.
Consider the old game "guess the letter I'm thinking of". You could ask: "Is it A?" "Is it B?".. and so on until you reached the letter. That's a linear, or O(n), algorithm. But smarter would be to ask "Is it before M?" "Is it before G?" "Is it before I?" and so on until you get to the letter in question. That's a logarithmic, or O(log n), algorithm.
This is what std::binary_search does, and to do this it needs to be able to distinguish three conditions:
Candidate C is the searched-for item X
Candidate C is greater than X
Candidate C is less than X
A one-argument predicate P(x) says only "x has property P" or "x doesn't have property P". You can't get three results from this boolean function.
A comparator (say, <) lets you get three results by calculating C < X and also X < C. Then you have three possibilities:
!(C < X) && !(X < C) C is equal to X
C < X && !(X < C) C is less than X
!(C < X) && X < C C is greater than X
Note that both X and C get bound to both parameters of < at different times, which is why you can't just bind X to one argument of < and use that.
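The three cases above can be captured in a small helper. This is only a sketch of the idea, with three_way as a hypothetical name; it is not a claim about how any particular standard library implements binary_search.

```cpp
#include <cassert>
#include <functional>

// Derives a three-way result from a less-than style comparator:
// -1 if c orders before x, +1 if x orders before c, 0 if neither.
template <class T, class Compare>
int three_way(const T& c, const T& x, Compare less) {
    if (less(c, x)) return -1;  // C < X
    if (less(x, c)) return 1;   // X < C, so C is greater
    return 0;                   // neither holds: C is equivalent to X
}

void example_three_way() {
    assert(three_way(1, 2, std::less<int>()) == -1);
    assert(three_way(2, 1, std::less<int>()) == 1);
    assert(three_way(2, 2, std::less<int>()) == 0);
}
```

Note how x appears as the second argument in one call and as the first in the other, which is why binding it to a single parameter position is not enough.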
Edit: thanks to jpalecek for reminding me binary_search uses <, not <=.
Edit edit: thanks to Rob Kennedy for clarification.
They are completely different algorithms: find_if looks linearly for the first item for which the predicate is true, binary_search takes advantage that the range is sorted to test in logarithmic time if a given value is in it.
The predicate for binary_search specifies the function according to which the range is ordered (you'd most likely want to use the same predicate you used for sorting it).
You can't take advantage of the sortedness to search for a value satisfying some completely unrelated predicate (you'd have to use find_if anyway). Note however, that with a sorted range you can do more than just test for existence with lower_bound, upper_bound and equal_range.
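As a small sketch of going beyond a yes/no answer, the function below counts occurrences in a range sorted in descending order, passing the same comparator to the sort and to the searches. count_in_descending is a hypothetical name, and the function sorts its own copy only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Counts the elements equal to key. The range is sorted with std::greater,
// so the searches must be given std::greater as well.
std::size_t count_in_descending(std::vector<int> values, int key) {
    std::sort(values.begin(), values.end(), std::greater<int>());
    std::vector<int>::iterator first =
        std::lower_bound(values.begin(), values.end(), key, std::greater<int>());
    std::vector<int>::iterator last =
        std::upper_bound(values.begin(), values.end(), key, std::greater<int>());
    return static_cast<std::size_t>(last - first);
}

void example_count() {
    assert(count_in_descending({5, 3, 5, 1, 5}, 5) == 3);
    assert(count_in_descending({5, 3, 5, 1, 5}, 2) == 0);
}
```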
The question of what the purpose of std::binary_function is, is an interesting one.
All it does is provide typedefs for result_type, first_argument_type and second_argument_type. These would allow the users, given a functor as a template argument, to find out and use these types, e.g
template <class T, class BinaryFunction>
void foo(const T& a, const T& b, BinaryFunction f)
{
//declare a variable to store the result of the function call
typename BinaryFunction::result_type result = f(a, b);
//...
}
However, I think the only place where they are used in the standard library is creating other functor wrappers like bind1st, bind2nd, not1, not2. (If they were used for other purposes, people would yell at you any time you used a function as a functor since it would be an unportable thing to do.)
For example, binary_negate might be implemented as (GCC):
template<typename _Predicate>
class binary_negate
: public binary_function<typename _Predicate::first_argument_type,
typename _Predicate::second_argument_type, bool>
{
protected:
_Predicate _M_pred;
public:
explicit
binary_negate(const _Predicate& __x) : _M_pred(__x) { }
bool
operator()(const typename _Predicate::first_argument_type& __x,
const typename _Predicate::second_argument_type& __y) const
{ return !_M_pred(__x, __y); }
};
Of course, operator() could perhaps just be a template, in which case those typedefs would be unnecessary (any downsides?). There are probably also metaprogramming techniques to find out what the argument types are without requiring the user to typedef them explicitly. I suppose it would somewhat get into the way with the power that C++0x gives - e.g when I'd like to implement a negator for a function of any arity with variadic templates...
(IMO the C++98 functors are a bit too inflexible and primitive compared for example to std::tr1::bind and std::tr1::mem_fn, but probably at the time compiler support for metaprogramming techniques required to make those work was not that good, and perhaps the techniques were still being discovered.)
This is a misunderstanding of the Functor concept in C++.
It has nothing to do with inheritance. The property that makes an object a functor (eligible for passing to any of the algorithms) is validity of the expression object(x) or object(x, y), respectively, regardless whether it is a function pointer or an object with overloaded function call operator. Definitely not inheritance from anything. The same applies for std::bind.
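A short sketch of that point: the template below accepts anything for which f(a, b) is a valid expression, and none of the callables passed to it inherit from anything. call_binary, add, Multiply and the example function are hypothetical names.

```cpp
#include <cassert>

// Accepts any callable for which f(a, b) is valid and yields an int.
template <class BinaryOp>
int call_binary(BinaryOp f, int a, int b) {
    return f(a, b);
}

// Plain function, passed as a function pointer.
int add(int a, int b) { return a + b; }

// Function object with no base class at all.
struct Multiply {
    int operator()(int a, int b) const { return a * b; }
};

void example_call_binary() {
    assert(call_binary(add, 2, 3) == 5);
    assert(call_binary(Multiply(), 2, 3) == 6);
    assert(call_binary([](int a, int b) { return a - b; }, 2, 3) == -1);
}
```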
The use of binary functors as comparators comes from the fact that comparators (eg. std::less) are binary functors and it's good to be able to use them directly.
IMHO there would be no gain in providing or using the predicate version you propose (after all, it takes just passing one reference). There would be no (performance) gain in using binders, because it does the same thing as the algorithm (bind would pass the extra argument in lieu of the algorithm).