Why use functors over functions? - c++

Compare
double average = CalculateAverage(values.begin(), values.end());
with
double average = std::for_each(values.begin(), values.end(), CalculateAverage());
What are the benefits of using a functor over a function? Isn't the first a lot easier to read (even before the implementation is added)?
Assume the functor is defined like this:
class CalculateAverage
{
private:
std::size_t num;
double sum;
public:
CalculateAverage() : num (0) , sum (0)
{
}
void operator () (double elem)
{
num++;
sum += elem;
}
operator double() const
{
return sum / num;
}
};

At least four good reasons:
Separation of concerns
In your particular example, the functor-based approach has the advantage of separating the iteration logic from the average-calculation logic. So you can use your functor in other situations (think about all the other algorithms in the STL), and you can use other functors with for_each.
Parameterisation
You can parameterise a functor more easily. So for instance, you could have a CalculateAverageOfPowers functor that takes the average of the squares, or cubes, etc. of your data, which would be written thus:
class CalculateAverageOfPowers
{
public:
CalculateAverageOfPowers(float p) : acc(0), n(0), p(p) {}
void operator() (float x) { acc += pow(x, p); n++; }
float getAverage() const { return acc / n; }
private:
float acc;
int n;
float p;
};
You could of course do the same thing with a traditional function, but that makes it difficult to use with function pointers, because it has a different prototype to CalculateAverage.
Statefulness
And as functors can be stateful, you could do something like this:
CalculateAverage avg;
avg = std::for_each(dataA.begin(), dataA.end(), avg);
avg = std::for_each(dataB.begin(), dataB.end(), avg);
avg = std::for_each(dataC.begin(), dataC.end(), avg);
to average across a number of different data-sets.
Note that almost all STL algorithms/containers that accept functors require them to be "pure" predicates, i.e. have no observable change in state over time. for_each is a special case in this regard (see e.g. Effective Standard C++ Library - for_each vs. transform).
Performance
Functors can often be inlined by the compiler (the STL is a bunch of templates, after all). Whilst the same is theoretically true of functions, compilers typically won't inline through a function pointer. The canonical example is to compare std::sort vs qsort; the STL version is often 5-10x faster, assuming the comparison predicate itself is simple.
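To make the std::sort vs qsort comparison concrete, here is a minimal sketch of the two call styles side by side. The names compare_ints, LessInt, sorted_with_qsort and sorted_with_std_sort are made up for illustration, and no timing is implied; any speed difference depends on your compiler and data.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator (hypothetical name): qsort reaches it through a
// function pointer, which compilers typically do not inline.
inline int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Functor comparator (hypothetical name): its type is part of the
// std::sort instantiation, so the call to operator() is visible to the
// compiler and can be inlined.
struct LessInt {
    bool operator()(int x, int y) const { return x < y; }
};

inline std::vector<int> sorted_with_qsort(std::vector<int> v) {
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), compare_ints);
    return v;
}

inline std::vector<int> sorted_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end(), LessInt());
    return v;
}
```

Both functions return the same sorted data; the difference is only in how the comparison is reached.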
Summary
Of course, it's possible to emulate the first three with traditional functions and pointers, but it becomes a great deal simpler with functors.

Advantages of Functors:
Unlike functions, a functor can have state.
A functor fits into the OOP paradigm better than a function does.
A functor can often be inlined, unlike a call through a function pointer.
A functor doesn't require a vtable or runtime dispatch, and is hence more efficient in most cases.

std::for_each is easily the most capricious and least useful of the standard algorithms. It's just a nice wrapper for a loop. However, even it has advantages.
Consider what your first version of CalculateAverage must look like. It will have a loop over the iterators, and then do stuff with each element. What happens if you write that loop incorrectly? Oops; there's a compiler or runtime error. The second version can never have such errors. Yes, it's not a lot of code, but why do we have to write loops so often? Why not just once?
Now, consider real algorithms; the ones that actually do work. Do you want to write std::sort? Or std::find? Or std::nth_element? Do you even know how to implement it in the most efficient way possible? How many times do you want to implement these complex algorithms?
As for ease of reading, that's in the eyes of the beholder. As I said, std::for_each is hardly the first choice for algorithms (especially with C++0x's range-based for syntax). But if you're talking about real algorithms, they're very readable; std::sort sorts a list. Some of the more obscure ones like std::nth_element won't be as familiar, but you can always look it up in your handy C++ reference.
And even std::for_each is perfectly readable once you use lambdas in C++0x.
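As a rough illustration of that last point, the averaging example from the question could be written with std::for_each and a lambda along these lines. This is only a sketch: the function name average is an assumption, and it requires a compiler with lambda support.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Averages a non-empty range with std::for_each and a capturing lambda.
inline double average(const std::vector<double>& values) {
    assert(!values.empty());  // an empty range has no average
    double sum = 0.0;
    std::for_each(values.begin(), values.end(),
                  [&sum](double x) { sum += x; });
    return sum / static_cast<double>(values.size());
}
```

The loop body sits right at the call site, so there is no separate functor class to look up.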

•Unlike Functions Functor can have state.
This is very interesting because function objects derived from std::binary_function, such as std::less and std::equal_to, declare an operator() that is const. But what if you wanted to print a debug message with the current call count for that object? How would you do it?
Here is the template for std::equal_to:
template<typename _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
bool
operator()(const _Tp& __x, const _Tp& __y) const
{ return __x == __y; }
};
I can think of 3 ways to allow the operator() to be const, and yet change a member variable. But what is the best way? Take this example:
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cassert> // assert() MACRO
// functor for comparing two integers by their quotient under integer division by 10.
// So 50..59 are the same, and 60..69 are the same.
// Used by std::sort()
struct lessThanByTen: public std::less<int>
{
private:
// data members
int count; // nr of times operator() was called
public:
// default CTOR sets count to 0
lessThanByTen() :
count(0)
{
}
// hides (operator() is not virtual, so this is not an override) the bool operator() in std::less<int>, which simply compares two integers
bool operator() ( const int& arg1, const int& arg2) const
{
// this won't compile, because a const method cannot change a member variable (count)
// ++count;
// Solution 1. this trick allows the const method to change a member variable
++(*(int*)&count);
// Solution 2. this trick also fools the compilers, but is a lot uglier to decipher
++(*(const_cast<int*>(&count)));
// Solution 3. a third way to do same thing:
{
// first, stack copy gets bumped count member variable
int incCount = count+1;
const int *iptr = &count;
// this is now the same as ++count
*(const_cast<int*>(iptr)) = incCount;
}
std::cout << "DEBUG: operator() called " << count << " times.\n";
return (arg1/10) < (arg2/10);
}
};
void test1();
void printArray( const std::string msg, const int nums[], const size_t ASIZE);
int main()
{
test1();
return 0;
}
void test1()
{
// unsorted numbers
int inums[] = {33, 20, 10, 21, 30, 31, 32, 22, };
printArray( "BEFORE SORT", inums, 8 );
// sort by quotient of integer division by 10
std::sort( inums, inums+8, lessThanByTen() );
printArray( "AFTER SORT", inums, 8 );
}
//! @param msg can be "this is a const string" or a std::string because of implicit string(const char *) conversion.
//! print "msg: 1,2,3,...N", where 1..N are the numbers in the nums[] array
void printArray( const std::string msg, const int nums[], const size_t ASIZE)
{
std::cout << msg << ": ";
for (size_t inx = 0; inx < ASIZE; ++inx)
{
if (inx > 0)
std::cout << ",";
std::cout << nums[inx];
}
std::cout << "\n";
}
Because all 3 solutions are compiled in, it increments count by 3. Here's the output:
gcc -g -c Main9.cpp
gcc -g Main9.o -o Main9 -lstdc++
./Main9
BEFORE SORT: 33,20,10,21,30,31,32,22
DEBUG: operator() called 3 times.
DEBUG: operator() called 6 times.
DEBUG: operator() called 9 times.
DEBUG: operator() called 12 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 12 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 18 times.
DEBUG: operator() called 18 times.
DEBUG: operator() called 21 times.
DEBUG: operator() called 21 times.
DEBUG: operator() called 24 times.
DEBUG: operator() called 27 times.
DEBUG: operator() called 30 times.
DEBUG: operator() called 33 times.
DEBUG: operator() called 36 times.
AFTER SORT: 10,20,21,22,33,30,31,32
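A fourth option, not among the three above, is to declare the counter mutable, which is the language's own mechanism for members that may change inside a const member function. The sketch below is an illustration under assumptions, not the poster's code: LessByTens and sortAndCount are made-up names, and std::ref is used so that std::sort calls the caller's functor rather than private copies of it (such copies are the likely reason the counts in the output above appear to jump backwards).

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Compares integers by their quotient under integer division by 10.
struct LessByTens {
    mutable int calls = 0;  // may be modified even in const member functions

    bool operator()(int a, int b) const {
        ++calls;  // fine: calls is mutable, no casts needed
        return a / 10 < b / 10;
    }
};

// Sorts v and returns how many comparisons were made.
inline int sortAndCount(std::vector<int>& v) {
    LessByTens cmp;
    // std::ref makes std::sort use this cmp object, not copies of it.
    std::sort(v.begin(), v.end(), std::ref(cmp));
    return cmp.calls;
}
```

Compared with the cast-based solutions, this keeps the const interface honest and avoids relying on casting away const.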

In the first approach the iteration code has to be duplicated in every function that wants to do something with the collection. The second approach hides the details of iteration.

OOP is the keyword here.
http://www.newty.de/fpt/functor.html:
4.1 What are Functors ?
Functors are functions with a state. In C++ you can realize them as a class with one or more private members to store the state and with an overloaded operator () to execute the function. Functors can encapsulate C and C++ function pointers employing the concepts templates and polymorphism. You can build up a list of pointers to member functions of arbitrary classes and call them all through the same interface without bothering about their class or the need of a pointer to an instance. All the functions just have got to have the same return-type and calling parameters. Sometimes functors are also known as closures. You can also use functors to implement callbacks.

You are comparing functions at different levels of abstraction.
You can implement CalculateAverage(begin, end) either as:
template<typename Iter>
double CalculateAverage(Iter begin, Iter end)
{
return std::accumulate(begin, end, 0.0, std::plus<double>()) / std::distance(begin, end);
}
or you can do it with a for loop
template<typename Iter>
double CalculateAverage(Iter begin, Iter end)
{
double sum = 0;
int count = 0;
for(; begin != end; ++begin) {
sum += *begin;
++count;
}
return sum / count;
}
The former requires you to know more things, but once you know them, is simpler and leaves fewer possibilities for error.
It also only uses two generic components (std::accumulate and std::plus), which is often the case in more complex cases too. You can often have a simple, universal functor (or function; a plain old function can act as a functor) and simply combine it with whatever algorithm you need.

Related

Sort or filter object arrays based on data members

Is there a standard method of sorting or filtering an array of objects based on their data members or member functions?
I'm looking for a standard function like getLowestValue in the code below:
class Grade
{
public:
Grade() : _grade(0) {}
void setGrade(int i) { _grade = i; }
int getGrade() const { return _grade; }
private:
int _grade;
};
int main()
{
Grade grades[10];
for(int i = 0; i < 10; i++)
grades[i].setGrade(generateRandomNumber());
Grade *lowestGrade = getLowestValue(grades, Grade::getGrade); //???
std::cout << "lowest grade: " << lowestGrade->getGrade() << std::endl;
return 0;
}
To sort you can use std::sort() and to find the minimum std::min_element().
In both cases you will have either to implement the operator< or to create a comparison function.
Example of operator<
inline bool operator< (const Grade& left, const Grade& right){
return left.getGrade() < right.getGrade();
}
Usage of std::min_element():
Grade result = *std::min_element(std::begin(grades), std::end(grades));
Usage of std::sort():
std::sort(std::begin(grades), std::end(grades));
You will have to include: #include <algorithm>
http://en.cppreference.com/w/cpp/algorithm/sort
The standard way to sort anything is to use std::sort() with a comparison function passed to it.
std::sort(grades, grades + 10, [](Grade a, Grade b) {return a.getGrade() < b.getGrade();} );
If comparing between objects is something you'll be doing often, it might be a good idea to implement the operator< in your class. That way, you don't need a comparison function.
"Is there a standard method of sorting or filtering an array of objects based on their data members or member functions?"
There are a number of functions available in the C++ standard algorithm library, namely
std::min_element
std::sort
std::remove_if
std::find_if
to realize the functionality you mentioned (Examples of usage are given in the reference pages).
You'll have to provide appropriate comparator functions/classes that operate on your structure members. These can even be lambda functions, if you want to write them on the fly.
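As a minimal sketch of the lambda-comparator approach, the following finds the lowest grade and counts grades below a threshold without defining operator<. The Grade class is trimmed down from the question, and lowestGrade and countBelow are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class Grade {
public:
    explicit Grade(int g = 0) : grade_(g) {}
    int getGrade() const { return grade_; }
private:
    int grade_;
};

// Returns the lowest grade value in a non-empty collection.
inline int lowestGrade(const std::vector<Grade>& grades) {
    assert(!grades.empty());
    auto it = std::min_element(grades.begin(), grades.end(),
        [](const Grade& a, const Grade& b) { return a.getGrade() < b.getGrade(); });
    return it->getGrade();
}

// Counts the grades strictly below a threshold.
inline int countBelow(const std::vector<Grade>& grades, int threshold) {
    return static_cast<int>(std::count_if(grades.begin(), grades.end(),
        [threshold](const Grade& g) { return g.getGrade() < threshold; }));
}
```

The same lambda shape works with std::sort, std::find_if and std::remove_if; only the algorithm changes.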

Is it good practice to fake an inserter?

We're taught to create function objects to use algorithms.
There are algorithms that call the operator(), like:
for_each
find_if
remove_if
max_element
count_if
These function objects should typically inherit from unary_function or binary_function, to behave like a function, a predicate, etc.
But books don't generally demonstrate examples for creating OutputIterators:
e.g. to traverse the output of functions like
std::set_intersection(), I have to provide a destination container,
and then traverse the result:
std::vector<int> tmp_dest;
std::set_difference (
src1.begin(), src1.end(),
src2.begin(), src2.end(),
std::back_inserter(tmp_dest));
std::for_each( tmp_dest.begin(), tmp_dest.end(), do_something );
int res = std::accumulate( tmp_dest.begin(), tmp_dest.end(), 0 );
but I think that it would sometimes be more efficient to use the values produced by each algorithm without storing them first, like:
std::set_difference (
src1.begin(), src1.end(),
src2.begin(), src2.end(),
do_something );
Accumulator accumulate(0); // inherits from std::insert_iterator ?
std::set_difference (
src1.begin(), src1.end(),
src2.begin(), src2.end(),
accumulate );
Should we generally create classes like this Accumulator ?
What should its design look like?
What should it inherit from ?
Accumulator could inherit from insert_iterator, but it is not really an iterator (eg it does not implement operator++() )
what are the widely accepted practices?
If you want an output iterator that calls your own function for every value received, use Boost.Iterator's function_output_iterator.
I don't see a fundamental problem with this as long as it's clear to future maintainers how the code works and what it's doing.
I would probably not inherit such an operation from any standard class (Other than giving it output_iterator_tag). Since we're dealing with templates we don't need to have a parent interface to deal with.
But keep in mind that your statement (eg it does not implement operator++() ) doesn't seem to be correct: Whatever you pass in as the "output iterator" needs to meet the requirements of output iterators which include being copyable, dereference-to-assign, and incrementable. Whatever object type you pass in needs to meet these requirements.
My take on this would be using Boost (also showing Boost Range algorithm versions of set_difference, although off-topic):
#include <set>
#include <boost/range/algorithm.hpp>
#include <boost/function_output_iterator.hpp>
#include <cassert>
void do_something(int) {}
int main()
{
const std::set<int>
src1 { 1,2,3 },
src2 { 1,9 };
unsigned total = 0;
boost::set_difference(src1, src2,
boost::make_function_output_iterator([&](int i)
{
total += i*i;
}));
assert(total == 13); // 2*2 + 3*3
}
The target of algorithms taking an output iterator is a sequence of values represented by an output iterator. They use iterators for two reasons:
It is quite likely that the result is stored somewhere else, i.e., an iterator is useful.
The protocol mandates that each position is written just once. This is more restrictive than a function call interface, i.e., there is an additional guarantee.
For some algorithms both versions, one with a function call interface and one with an iterator interface, are provided. For example, that is the difference between std::for_each() and std::copy().
In any case, if all you need is having a function called where an output iterator is needed, just have the other iterator operations be no-ops and call the function upon assignment to the result of *it: this creates a perfectly valid output iterator.
The following works:
#include <cassert>
#include <algorithm>
#include <iterator> // std::begin, std::end
class AccumulatorIterator
{
public:
explicit AccumulatorIterator(int initial) : value(initial) {}
AccumulatorIterator& operator = (int rhs) { value += rhs; return *this; }
AccumulatorIterator& operator *() { return *this; }
AccumulatorIterator& operator ++() { return *this; }
operator int() const { return value; }
private:
int value;
};
int main() {
int first[] = {5,10,15,20,25};
int second[] = {50,40,30,20,10};
std::sort(std::begin(first), std::end(first)); // 5 10 15 20 25
std::sort(std::begin(second), std::end(second)); // 10 20 30 40 50
const int res = std::set_intersection (std::begin(first), std::end(first),
std::begin(second), std::end(second), AccumulatorIterator(0));
assert(res == 10 + 20);
return 0;
}
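For a reusable variant of the same idea without Boost, one possible sketch is a small template that forwards every written value to a callable. CallingOutputIterator and sumOfDifference are hypothetical names, and storing a pointer to the callable (so that copies made by the algorithm share it) is a design assumption of this sketch, not something the answers above prescribe.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Output iterator that calls a user-supplied callable for every value written.
// It stores a pointer, so copies made by an algorithm share the same callable.
template <typename F>
class CallingOutputIterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;

    explicit CallingOutputIterator(F& f) : f_(&f) {}

    // "*it = value" ends up here because operator* returns the iterator itself.
    template <typename T>
    CallingOutputIterator& operator=(const T& value) {
        (*f_)(value);
        return *this;
    }

    CallingOutputIterator& operator*() { return *this; }
    CallingOutputIterator& operator++() { return *this; }     // no-op
    CallingOutputIterator& operator++(int) { return *this; }  // no-op

private:
    F* f_;
};

// Sums the elements of (a minus b) without storing them; a and b must be sorted.
inline int sumOfDifference(const std::vector<int>& a, const std::vector<int>& b) {
    int total = 0;
    auto add = [&total](int x) { total += x; };
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        CallingOutputIterator<decltype(add)>(add));
    return total;
}
```

Unlike AccumulatorIterator above, this version also supplies post-increment and the iterator typedefs, which some standard library implementations may expect.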

Sort an array of std::pair vs. struct: which one is faster?

I was wondering whether sorting an array of std::pair is faster, or an array of struct?
Here are my code segments:
Code #1: sorting std::pair array (by first element):
#include <algorithm>
pair <int,int> client[100000];
sort(client,client+100000);
Code #2: sort struct (by A):
#include <algorithm>
struct cl{
int A,B;
};
bool cmp(cl x,cl y){
return x.A < y.A;
}
cl clients[100000];
sort(clients,clients+100000,cmp);
code #3: sort struct (by A and internal operator <):
#include <algorithm>
struct cl{
int A,B;
bool operator<(cl x){
return A < x.A;
}
};
cl clients[100000];
sort(clients,clients+100000);
Update: I used this code to solve a problem on an online judge. Code #1 exceeded the 2-second time limit, while code #2 and #3 were accepted (ran in 62 milliseconds). Why does code #1 take so much longer than the others? Where is the difference?
You know what std::pair is? It's a struct (or class, which is the same thing in C++ for our purposes). So if you want to know what's faster, the usual advice applies: you have to test it and find out for yourself on your platform. But the best bet is that if you implement the equivalent sorting logic to std::pair, you will have equivalent performance, because the compiler does not care whether your data type's name is std::pair or something else.
But note that the code you posted is not equivalent in functionality to the operator < provided for std::pair. Specifically, you only compare the first member, not both. Obviously this may result in some speed gain (but probably not enough to notice in any real program).
I would estimate that there isn't much difference at all between these two solutions.
But like ALL performance related queries, rather than rely on someone on the internet telling you they are the same, or one is better than the other, make your own measurements. Sometimes, subtle differences in implementation will make a lot of difference to the actual results.
Having said that, the implementation of std::pair is a struct (or class) with two members, first and second, so I have a hard time imagining that there is any real difference here - you are just implementing your own pair with your own compare function that does exactly the same things that the already existing pair does... Whether it's in an internal function in the class or as a standalone function is unlikely to make much of a difference.
Edit: I made the following "mash the code together":
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime> // clock(), CLOCKS_PER_SEC
using namespace std;
const int size=100000000;
pair <int,int> clients1[size];
struct cl1{
int first,second;
};
cl1 clients2[size];
struct cl2{
int first,second;
bool operator<(const cl2 x) const {
return first < x.first;
}
};
cl2 clients3[size];
template<typename T>
void fill(T& t)
{
srand(471117); // Use the same random sequence each time.
for(size_t i = 0; i < sizeof(t) / sizeof(t[0]); i++)
{
t[i].first = rand();
t[i].second = -t[i].first;
}
}
void func1()
{
sort(clients1,clients1+size);
}
bool cmp(cl1 x, cl1 y){
return x.first < y.first;
}
void func2()
{
sort(clients2,clients2+size,cmp);
}
void func3()
{
sort(clients3,clients3+size);
}
void benchmark(void (*f)(), const char *name)
{
cout << "running " << name << endl;
clock_t time = clock();
f();
time = clock() - time;
cout << "Time taken = " << (double)time / CLOCKS_PER_SEC << endl;
}
#define bm(x) benchmark(x, #x)
int main()
{
fill(clients1);
fill(clients2);
fill(clients3);
bm(func1);
bm(func2);
bm(func3);
}
The results are as follows:
running func1
Time taken = 10.39
running func2
Time taken = 14.09
running func3
Time taken = 10.06
I ran the benchmark three times, and they are all within ~0.1s of the above results.
Edit2:
And looking at the code generated, it's quite clear that the "middle" function takes quite a bit longer, since the comparison is made inline for pair and struct cl2, but can't be made inline for struct cl1 - so every compare literally makes a function call, rather than a few instructions inside the functions. This is a large overhead.
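If you cannot add operator< to the struct, one way to keep the comparison inlinable is to pass a function object instead of a function pointer. The following is a sketch only: ByFirst and sortByFirst are made-up names, a std::vector stands in for the global array, and no timings are claimed for it.

```cpp
#include <algorithm>
#include <vector>

struct cl1 {
    int first, second;
};

// Comparison as a function object: std::sort is instantiated on this type,
// so the compiler sees operator() directly instead of an opaque pointer.
struct ByFirst {
    bool operator()(const cl1& x, const cl1& y) const {
        return x.first < y.first;
    }
};

inline void sortByFirst(std::vector<cl1>& clients) {
    std::sort(clients.begin(), clients.end(), ByFirst());
}
```

Taking the arguments by const reference also avoids copying each struct on every comparison, which the cmp function above does.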

Lambda Expression vs Functor in C++

I wonder when we should use a lambda expression over a functor in C++. To me, these two techniques are basically the same; if anything, a functor is more elegant and cleaner than a lambda. For example, if I want to reuse my predicate, I have to copy the lambda over and over. So when does a lambda really come into its own?
A lambda expression creates a nameless functor; it's syntactic sugar.
So you mainly use it if it makes your code look better. That generally would occur if either (a) you aren't going to reuse the functor, or (b) you are going to reuse it, but from code so totally unrelated to the current code that in order to share it you'd basically end up creating my_favourite_two_line_functors.h, and have disparate files depend on it.
Pretty much the same conditions under which you would type any line(s) of code, and not abstract that code block into a function.
That said, with range-for statements in C++0x, there are some places where you would have used a functor before where it might well make your code look better now to write the code as a loop body, not a functor or a lambda.
1) It's trivial and trying to share it is more work than benefit.
2) Defining a functor simply adds complexity (due to having to make a bunch of member variables and crap).
If neither of those things is true then maybe you should think about defining a functor.
Edit: it seems to me that you need an example of when it would be nice to use a lambda over a functor. Here you go:
typedef std::vector< std::pair<int,std::string> > whatsit_t;
int find_it(std::string value, whatsit_t const& stuff)
{
auto fit = std::find_if(stuff.begin(), stuff.end(), [value](whatsit_t::value_type const& vt) -> bool { return vt.second == value; });
if (fit == stuff.end()) throw std::wtf_error();
return fit->first;
}
Without lambdas you'd have to use something that similarly constructs a functor on the spot or write an externally linkable functor object for something that's annoyingly trivial.
BTW, I think maybe wtf_error is an extension.
Lambdas are basically just syntactic sugar that implement functors (NB: closures are not simple.) In C++0x, you can use the auto keyword to store lambdas locally, and std::function will enable you to store lambdas, or pass them around in a type-safe manner.
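A small sketch of both storage options, with made-up function names (isEvenViaAuto, makeDivisibilityTests, countMatches): auto keeps the exact closure type, while std::function erases it so that different lambdas can share one container.

```cpp
#include <functional>
#include <vector>

// auto deduces the closure's own (unnamed) type; no type erasure involved.
inline bool isEvenViaAuto(int n) {
    auto isEven = [](int x) { return x % 2 == 0; };
    return isEven(n);
}

// Each lambda has a distinct type; std::function gives them a common one.
inline std::vector<std::function<bool(int)>> makeDivisibilityTests() {
    std::vector<std::function<bool(int)>> tests;
    for (int d = 2; d <= 3; ++d)
        tests.push_back([d](int n) { return n % d == 0; });
    return tests;
}

// Counts how many of the stored predicates accept n.
inline int countMatches(const std::vector<std::function<bool(int)>>& tests, int n) {
    int matches = 0;
    for (const auto& test : tests)
        if (test(n)) ++matches;
    return matches;
}
```

std::function usually costs an indirect call, so auto (or a template parameter) tends to be the better fit when the lambda never leaves the local scope.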
Check out the Wikipedia article on C++0x.
Small functions that are not repeated.
The main complaint about functors is that they are not defined in the same place where they are used. So you had to find and read the functor out of context, away from the place it was being used (even if it was only being used in one place).
The other problem was that a functor required some wiring to get parameters into the functor object. Not complex, but all basic boilerplate code. And boilerplate is susceptible to cut-and-paste problems.
Lambdas try to fix both of these. But I would use functors if the function is repeated in multiple places or is larger than (can't think up an appropriate term as it will be context sensitive) small.
Lambdas and functors have context. A functor is a class and therefore can be more complex than a lambda. A function has no context.
#include <iostream>
#include <list>
#include <vector>
using namespace std;
//Functions have no context, mod is always 3
bool myFunc(int n) { return n % 3 == 0; }
//Functors have context, e.g. _v
//Functors can be more complex, e.g. additional addNum(...) method
class FunctorV
{
public:
FunctorV(int num ) : _v{num} {}
void addNum(int num) { _v.push_back(num); }
bool operator() (int num)
{
for(int i : _v) {
if( num % i == 0)
return true;
}
return false;
}
private:
vector<int> _v;
};
void print(string prefix,list<int>& l)
{
cout << prefix << "l={ ";
for(int i : l)
cout << i << " ";
cout << "}" << endl;
}
int main()
{
list<int> l={1,2,3,4,5,6,7,8,9};
print("initial for each test: ",l);
cout << endl;
//function, so no context.
l.remove_if(myFunc);
print("function mod 3: ",l);
cout << endl;
//nameless lambda, context is x
l={1,2,3,4,5,6,7,8,9};
int x = 3;
l.remove_if([x](int n){ return n % x == 0; });
print("lambda mod x=3: ",l);
x = 4;
l.remove_if([x](int n){ return n % x == 0; });
print("lambda mod x=4: ",l);
cout << endl;
//functor has context and can be more complex
l={1,2,3,4,5,6,7,8,9};
FunctorV myFunctor(3);
myFunctor.addNum(4);
l.remove_if(myFunctor);
print("functor mod v={3,4}: ",l);
return 0;
}
Output:
initial for each test: l={ 1 2 3 4 5 6 7 8 9 }
function mod 3: l={ 1 2 4 5 7 8 }
lambda mod x=3: l={ 1 2 4 5 7 8 }
lambda mod x=4: l={ 1 2 5 7 }
functor mod v={3,4}: l={ 1 2 5 7 }
First, I would like to clear up some clutter here.
There are two different things
Lambda function
Lambda expression/functor.
A lambda expression, i.e. [] () -> return-type {}, always evaluates to a closure object (a kind of functor), although the compiler is free to optimize that object away. A lambda that captures nothing can additionally be converted to a plain function pointer, and you can force that conversion by putting a + sign before the [], as in +[] () -> return-type {}. This yields a function pointer.
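Here is a minimal sketch of that conversion. IntOp, applyTwice and demoFunctionPointer are names invented for the example; the static_assert only checks what type auto deduces after the unary +.

```cpp
#include <type_traits>

typedef int (*IntOp)(int);  // plain function-pointer type

inline int applyTwice(IntOp op, int x) { return op(op(x)); }

inline int demoFunctionPointer() {
    // A captureless lambda converts implicitly where a function pointer is expected.
    IntOp increment = [](int n) { return n + 1; };

    // Unary + forces the same conversion, so auto deduces int (*)(int)
    // rather than the closure type.
    auto doubler = +[](int n) { return n * 2; };
    static_assert(std::is_same<decltype(doubler), IntOp>::value,
                  "unary + yields a function pointer");

    return applyTwice(increment, 1) + applyTwice(doubler, 1);  // 3 + 4
}
```

A lambda with a non-empty capture list has no such conversion, so the same lines would not compile with [x] in place of [].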
Now, coming to your question. You can use lambda repeatedly as follows:
int main()
{
auto print = [i=0] () mutable {return i++;};
cout<<print()<<endl;
cout<<print()<<endl;
cout<<print()<<endl;
// Call as many time as you want
return 0;
}
You should use a lambda wherever it improves code expressiveness and maintainability, for example in custom deleters for smart pointers and with most of the STL algorithms.
If you combine lambdas with other features such as constexpr, variadic template parameter packs or generic lambdas, you can achieve many things.
As you pointed out, it works best when you need a one-off and the coding overhead of writing it out as a function isn't worth it.
Conceptually, the decision of which to use is driven by the same criterion as using a named variable versus an in-place expression or constant...
size_t length = strlen(x) + sizeof(y) + z++ + strlen('\0');
...
allocate(length);
std::cout << length;
...here, creating a length variable encourages the programmer to consider its correctness and meaning in isolation from its later use. The name hopefully conveys enough that it can be understood intuitively and independently of its initial value. It then allows the value to be used several times without repeating the expression (while handling z being different). While here...
allocate(strlen(x) + sizeof(y) + z++ + strlen('\0'));
...the total code is reduced and the value is localised at the point it's needed. The only thing to "carry forwards" from a reading of this line is the side effects of allocation and increment (z), but there's no extra local variable with scope or later use to consider. The programmer has to mentally juggle less state while continuing their analysis of the code.
The same distinction applies to functions versus inline statements. For the purposes of answering your question, functors versus lambdas can be seen as just a particular case of this function versus inlining decision.
I tend to prefer Functors over Lambdas these days. Although they require more code, Functors yield cleaner algorithms. The comparison below between find_id and find_id2 showcases that result. While both yield sufficiently clean code, find_id2 is slightly easier to read as the MatchName(name) definition is extracted from (and secondary to) the primary algorithm.
I would argue, however, that the Functor code should be placed inside implementation files right above the function definition where it is used to provide direct access to the function definition. Otherwise a Lambda would be better for code-locality/organization.
#include <algorithm> // std::find_if
#include <iostream>
#include <vector>
#include <string>
using namespace std;
struct Person {
int id;
string name;
};
typedef vector<Person> People;
int find_id(string const& name, People const& people) {
auto MatchName = [name](Person const& p) -> bool
{
return p.name == name;
};
auto found = find_if(people.begin(), people.end(), MatchName);
if (found == people.end()) return -1;
return found->id;
}
struct MatchName {
string const& name;
MatchName(string const& name) : name(name) {}
bool operator() (Person const& person) const
{
return person.name == name;
}
};
int find_id2(string const& name, People const& people) {
auto found = find_if(people.begin(), people.end(), MatchName(name));
if (found == people.end()) return -1;
return found->id;
}
int main() {
People people { {0, "Jim"}, {1, "Pam"}, {2, "Dwight"} };
cout << "Pam's ID is " << find_id("Pam", people) << endl;
cout << "Dwight's ID is " << find_id2("Dwight", people) << endl;
}
The Functor is self-documenting by default, but lambdas need to be stored in named variables (to be self-documenting) inside more complex algorithm definitions. Hence, for code readability, it is preferable not to use lambdas inline as many people do, but to name them, as the MatchName lambda above does, in order to gain the self-documenting benefit.
When a lambda is stored in a variable at the call site (or used inline), primary algorithms are slightly more difficult to read. Since lambdas are secondary in nature to the algorithms where they are used, it is preferable to clean up the primary algorithms by using self-documenting subroutines (e.g. Functors). This might not matter as much in this example, but with more complex algorithms it can significantly reduce the burden of interpreting the code.
Functors can be as simple (as in the example above) or as complex as they need to be. Sometimes complexity is desirable, as in cases that call for dynamic polymorphism (e.g. the strategy/decorator design patterns, or their template equivalent, policy types). This is a use case lambdas cannot satisfy.
Functors require explicit declaration of capture variables without polluting primary algorithms. As a lambda needs more and more capture variables, the tendency is to use a blanket capture like [=]. But this greatly reduces readability, as one must mentally jump between the lambda definition and all surrounding local variables, possibly member variables, and more.

C++ STL - iterate through everything in a sequence

I have a sequence, e.g.
std::vector< Foo > someVariable;
and I want a loop which iterates through everything in it.
I could do this:
for (int i=0;i<someVariable.size();i++) {
blah(someVariable[i].x,someVariable[i].y);
woop(someVariable[i].z);
}
or I could do this:
for (std::vector< Foo >::iterator i=someVariable.begin(); i!=someVariable.end(); i++) {
blah(i->x,i->y);
woop(i->z);
}
Both these seem to involve quite a bit of repetition / excessive typing. In an ideal language I'd like to be able to do something like this:
for (i in someVariable) {
blah(i->x,i->y);
woop(i->z);
}
It seems like iterating through everything in a sequence would be an incredibly common operation. Is there a way to do it in which the code isn't twice as long as it should have to be?
You could use for_each from the standard library. You could pass a functor or a function to it. The solution I like is BOOST_FOREACH, which is just like foreach in other languages. C++0x is gonna have one btw.
For example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
void print(int v)
{
std::cout << v << std::endl;
}
int main()
{
std::vector<int> array;
for(int i = 0; i < 100; ++i)
{
array.push_back(i);
}
std::for_each(array.begin(), array.end(), print); // using STL
foreach(int v, array) // using Boost
{
std::cout << v << std::endl;
}
}
Not counting BOOST_FOREACH which AraK already suggested, you have the following two options in C++ today:
void function(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
std::for_each(someVariable.begin(), someVariable.end(), function);
struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());
Both require you to specify the "body" of the loop elsewhere, either as a function or as a functor (a class which overloads operator()). That might be a good thing (if you need to do the same thing in multiple loops, you only have to define the function once), but it can be a bit tedious too. The function version may be a bit less efficient, because the compiler is generally unable to inline the function call. (A function pointer is passed as the third argument, and the compiler has to do some more detailed analysis to determine which function it points to)
The functor version is basically zero overhead. Because an object of type functor is passed to for_each, the compiler knows exactly which function to call: functor::operator(), and so it can be trivially inlined and will be just as efficient as your original loop.
C++0x will introduce lambda expressions which make a third form possible.
std::for_each(someVariable.begin(), someVariable.end(), [](Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
});
Finally, it will also introduce a range-based for loop:
for(Foo& arg : someVariable)
{
blah(arg.x, arg.y);
woop(arg.z);
}
So if you've got access to a compiler which supports subsets of C++0x, you might be able to use one or both of the last forms. Otherwise, the idiomatic solution (without using Boost) is to use for_each like in one of the first two examples.
By the way, MSVS 2008 has a non-standard "for each" C++ keyword. Look at How to: Iterate Over STL Collection with for each.
int main() {
int retval = 0;
vector<int> col(3);
col[0] = 10;
col[1] = 20;
col[2] = 30;
for each( const int& c in col )
retval += c;
cout << "retval: " << retval << endl;
}
Prefer algorithm calls to hand-written loops
There are three reasons:
1) Efficiency: Algorithms are often more efficient than the loops programmers produce
2) Correctness: Writing loops is more subject to errors than is calling algorithms.
3) Maintainability: Algorithm calls often yield code that is clearer and more straightforward than the corresponding explicit loops.
Prefer almost every other algorithm to for_each()
There are two reasons:
for_each is extremely general, telling you nothing about what's really being done, just that you're doing something to all the items in a sequence.
A more specialized algorithm will often be simpler and more direct
Consider, an example from an earlier reply:
void print(int v)
{
std::cout << v << std::endl;
}
// ...
std::for_each(array.begin(), array.end(), print); // using STL
Using std::copy instead, that whole thing turns into:
std::copy(array.begin(), array.end(), std::ostream_iterator<int>(std::cout, "\n"));
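For completeness, here is a self-contained sketch of that std::copy form with the headers it needs. printLines and toLines are hypothetical helpers; writing to a std::ostream parameter rather than std::cout directly is just an assumption that makes the result easy to inspect.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Writes each element followed by a newline to any output stream.
inline void printLines(const std::vector<int>& values, std::ostream& out) {
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<int>(out, "\n"));
}

// Same thing, collected into a string instead of printed.
inline std::string toLines(const std::vector<int>& values) {
    std::ostringstream buffer;
    printLines(values, buffer);
    return buffer.str();
}
```

Calling printLines(array, std::cout) gives the same output as the for_each version with print. Note that the iterator's element type has to be spelled out, as in std::ostream_iterator<int>.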
"struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());"
I think approaches like these are often needlessly baroque for a simple problem.
do i=1,N
call blah( X(i),Y(i) )
call woop( Z(i) )
end do
is perfectly clear, even if it's 40 years old (and not C++, obviously).
If the container is always a vector (STL name), I see nothing wrong with an index and nothing wrong with calling that index an integer.
In practice, often one needs to iterate over multiple containers of the same size simultaneously and peel off a datum from each, and do something with the lot of them. In that situation, especially, why not use the index?
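As an illustration of that situation, here is a sketch of an index loop over two parallel vectors, next to the standard-algorithm spelling of the same computation for comparison. dotByIndex and dotByAlgorithm are made-up names, and a dot product is merely a convenient stand-in for "do something with the lot of them".

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Index-based loop over two parallel containers.
inline double dotByIndex(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        total += x[i] * y[i];
    return total;
}

// The same computation expressed with a standard algorithm.
inline double dotByAlgorithm(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

Which one reads better is a matter of taste; the index version generalizes more directly to three or more containers.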
As for SSS's points #2 and #3 above, I'd say that could be so for complex cases, but iterating 1...N is often as simple and clear as anything else.
If you had to explain the algorithm on the whiteboard, could you do it faster with, or without, using 'i'? I think if your meatspace explanation is clearer with the index, use it in codespace.
Save the heavy C++ firepower for the hard targets.