Discussion:
Let's say I have a struct/class with an arbitrary number of attributes that I want to use as a key to a std::unordered_map, e.g.:
struct Foo {
int i;
double d;
char c;
bool b;
};
I know that I have to define a hasher functor for it, e.g.:
struct FooHasher {
std::size_t operator()(Foo const &foo) const;
};
And then define my std::unordered_map as:
std::unordered_map<Foo, MyValueType, FooHasher> myMap;
What bothers me, though, is how to define the call operator for FooHasher. One way to do it, which I also tend to prefer, is with std::hash. However, there are numerous variations, e.g.:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
std::hash<double>()(foo.d) ^
std::hash<char>()(foo.c) ^
std::hash<bool>()(foo.b);
}
I've also seen the following scheme:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
(std::hash<double>()(foo.d) << 1) ^
(std::hash<char>()(foo.c) >> 1) ^
(std::hash<bool>()(foo.b) << 1);
}
I've also seen some people add the golden ratio:
std::size_t operator()(Foo const &foo) const {
return (std::hash<int>()(foo.i) + 0x9e3779b9) ^
(std::hash<double>()(foo.d) + 0x9e3779b9) ^
(std::hash<char>()(foo.c) + 0x9e3779b9) ^
(std::hash<bool>()(foo.b) + 0x9e3779b9);
}
Questions:
What are they trying to achieve by adding the golden ratio or shifting bits in the result of std::hash?
Is there an "official scheme" to std::hash an object with arbitrary number of attributes of fundamental type?
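A simple xor is symmetric (hash(a) ^ hash(b) equals hash(b) ^ hash(a), so swapping two fields of the same type doesn't change the result) and behaves badly when fed the "same" value multiple times (hash(a) ^ hash(a) is zero). See here for more details.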
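Both weaknesses follow from the algebra of xor alone, not from any particular std::hash implementation. A small sketch (Pair and xor_hash are hypothetical names introduced only for this illustration):

```cpp
#include <cstddef>
#include <functional>

// Hypothetical two-field key, used only to illustrate plain xor combining.
struct Pair {
    int a;
    int b;
};

// Naive combiner: xor of the per-field std::hash values.
std::size_t xor_hash(Pair const& p) {
    return std::hash<int>{}(p.a) ^ std::hash<int>{}(p.b);
}

// x ^ y == y ^ x, so xor_hash({1, 2}) == xor_hash({2, 1}) always holds,
// and x ^ x == 0, so every Pair with a == b hashes to 0.
```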
This is the question of combining hashes. Boost has a hash_combine that is pretty decent. Write a hash combiner, and use it.
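A minimal sketch of such a combiner, modeled on the formula used by the classic boost::hash_combine (the exact formula has varied between Boost versions, and hash_combine and FooHasher here are our own definitions, not Boost's API). The shifts of the running seed make the result depend on the order of the fields, which plain xor cannot do. The 0x9e3779b9 constant is 2^32 divided by the golden ratio; it is commonly explained as injecting a well-spread bit pattern so that zero or small hashes still perturb the seed. Note that adding the same constant to every term and then xoring, as in the question's third variant, is still symmetric: (h(a) + k) ^ (h(b) + k) equals (h(b) + k) ^ (h(a) + k).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Foo {
    int i;
    double d;
    char c;
    bool b;
};

// std::unordered_map also needs to compare keys for equality.
bool operator==(Foo const& l, Foo const& r) {
    return l.i == r.i && l.d == r.d && l.c == r.c && l.b == r.b;
}

// Mixes std::hash of v into seed, using the classic boost::hash_combine formula.
// The shifts make the result depend on the order in which values are combined.
template <class T>
void hash_combine(std::size_t& seed, T const& v) {
    seed ^= std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct FooHasher {
    std::size_t operator()(Foo const& foo) const {
        std::size_t seed = 0;
        hash_combine(seed, foo.i);
        hash_combine(seed, foo.d);
        hash_combine(seed, foo.c);
        hash_combine(seed, foo.b);
        return seed;
    }
};
```

It plugs in exactly as in the question: std::unordered_map<Foo, MyValueType, FooHasher> myMap;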
There is no "official scheme" to solve this problem.
Myself, I typically write a super-hasher that can take anything and hash it. It hash-combines tuples, pairs and collections automatically; for a collection, it first hashes the count of elements, then the elements.
It finds hash(t) via ADL first; if that fails, it checks whether there is a manually written hash in a helper namespace (used for std containers and types), and if that also fails, it does a std::hash<T>{}(t).
Then my hash support for Foo looks like this:
struct Foo {
int i;
double d;
char c;
bool b;
friend auto mytie(Foo const& f) {
return std::tie(f.i, f.d, f.c, f.b);
}
friend std::size_t hash(Foo const& f) {
return hasher::hash(mytie(f));
}
};
where I use mytie to move Foo into a tuple, then use the std::tuple overload of hasher::hash to get the result.
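The hasher namespace itself isn't shown. A heavily simplified sketch of it, assuming C++17: it keeps only a std::tuple overload and a std::hash fallback, omits the ADL-first lookup and the collection handling described above, and borrows the classic boost::hash_combine mixing step as the combiner (an assumption, not necessarily what the author uses). FooHash is a hypothetical adapter so Foo can be used with std::unordered_map.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>

namespace hasher {

// Assumed combiner: the classic boost::hash_combine mixing step.
inline void combine(std::size_t& seed, std::size_t h) {
    seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Fallback for a single value: defer to std::hash.
template <class T>
std::size_t hash(T const& t) {
    return std::hash<T>{}(t);
}

// Tuples, including the tuple of references returned by std::tie:
// combine the element hashes in order.
template <class... Ts>
std::size_t hash(std::tuple<Ts...> const& t) {
    std::size_t seed = 0;
    std::apply([&seed](auto const&... xs) { (combine(seed, hasher::hash(xs)), ...); }, t);
    return seed;
}

}  // namespace hasher

struct Foo {
    int i;
    double d;
    char c;
    bool b;
    friend auto mytie(Foo const& f) {
        return std::tie(f.i, f.d, f.c, f.b);
    }
    friend std::size_t hash(Foo const& f) {
        return hasher::hash(mytie(f));
    }
    friend bool operator==(Foo const& l, Foo const& r) {
        return mytie(l) == mytie(r);
    }
};

// Adapter for std::unordered_map: the unqualified call finds Foo's friend hash via ADL.
struct FooHash {
    std::size_t operator()(Foo const& f) const { return hash(f); }
};
```

Because the tuple from mytie and a tuple of the same values hash their elements identically, a Foo hashes the same as std::make_tuple(f.i, f.d, f.c, f.b).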
I like the idea of hashes of structurally similar types having the same hash. This lets me act as if my hash is transparent in some cases.
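Note that hashing unordered meows in this manner is a bad idea: two equal unordered containers can iterate their elements in different orders, so an asymmetric (order-dependent) hash can give them different values and generate spurious misses.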
(Meow is the generic name for map and set. Do not ask me why: Ask the STL.)
The standard hash framework is lacking with respect to combining hashes. Combining hashes using xor is sub-optimal.
A better solution is proposed in N3980 "Types Don't Know #".
The main idea is using the same hash function and its state to hash more than one value/element/member.
With that framework, your hash function would look like this:
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept
{
using std::hash_append;
hash_append(h, x.i);
hash_append(h, x.d);
hash_append(h, x.c);
hash_append(h, x.b);
}
And the container:
std::unordered_map<Foo, MyValueType, std::uhash<>> myMap;
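std::hash_append and std::uhash exist only in the N3980 proposal, not in the standard library. To show the shape of the idea, here is a self-contained sketch under that assumption: everything lives in a hypothetical namespace sketch, FNV-1a plays the role of the HashAlgorithm, and only integral types and double get hash_append overloads. Foo's hash_append mirrors the one above, with sketch::hash_append in place of std::hash_append.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_map>

namespace sketch {

// FNV-1a (64-bit parameters) as an example HashAlgorithm:
// it consumes raw bytes and yields a result on explicit conversion.
class fnv1a {
    std::uint64_t state_ = 14695981039346656037ULL;

public:
    using result_type = std::size_t;

    void operator()(void const* key, std::size_t len) noexcept {
        auto const* p = static_cast<unsigned char const*>(key);
        for (std::size_t k = 0; k < len; ++k) {
            state_ = (state_ ^ p[k]) * 1099511628211ULL;
        }
    }

    explicit operator result_type() noexcept {
        return static_cast<result_type>(state_);
    }
};

// Integral types (int, char, bool, ...): append their bytes.
template <class H, class T>
std::enable_if_t<std::is_integral_v<T>> hash_append(H& h, T const& x) noexcept {
    h(&x, sizeof x);
}

// double: map -0.0 to 0.0 first, since they compare equal but differ in bytes.
template <class H>
void hash_append(H& h, double x) noexcept {
    if (x == 0) x = 0;
    h(&x, sizeof x);
}

// Functor usable as the Hash argument of std::unordered_map.
template <class H = fnv1a>
struct uhash {
    template <class T>
    std::size_t operator()(T const& t) const noexcept {
        H h;
        hash_append(h, t);  // user types such as Foo are found via ADL
        return static_cast<std::size_t>(h);
    }
};

}  // namespace sketch

struct Foo {
    int i;
    double d;
    char c;
    bool b;
};

bool operator==(Foo const& l, Foo const& r) {
    return l.i == r.i && l.d == r.d && l.c == r.c && l.b == r.b;
}

// One algorithm object accumulates every member; no separate combining step.
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept {
    using sketch::hash_append;
    hash_append(h, x.i);
    hash_append(h, x.d);
    hash_append(h, x.c);
    hash_append(h, x.b);
}

using FooMap = std::unordered_map<Foo, std::string, sketch::uhash<>>;
```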
In this answer I wrote the C++17 code:
cout << accumulate(cbegin(numbers), cend(numbers), decay_t<decltype(numbers[0])>{});
This received some negative commentary about the nature of C++'s type association, which, I'm sad to say, I agree with :(
decay_t<decltype(numbers[0])>{} is a very complex way to get:
A zero-initialized value of the element type of numbers
Is it possible to maintain the association with the type of numbers' elements without typing some 30 characters to get it?
EDIT:
I've got a lot of answers involving a wrapper for either accumulate or for extracting the type from numbers[0]. The problem is that they require the reader to navigate to a secondary location to read a solution that is no less complex than the initialization code decay_t<decltype(numbers[0])>{}.
The only reason we have to do more than decltype(numbers[0]) is that the array subscript operator returns a reference:
error: invalid cast of an rvalue expression of type 'int' to type 'int&'
It's interesting that with respect to decltype's argument:
If the name of an object is parenthesized, it is treated as an ordinary lvalue expression
However, decltype((numbers[0])) is still just a reference to an element of numbers. So in the end these answers may be as close as we can come to simplifying this initialization :(
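The reference is easy to confirm with a few compile-time checks. A sketch assuming C++17 (decltype_demo is a hypothetical name); it also shows why decay_t rather than just remove_reference_t is used: for a const container, removing the reference still leaves const int.

```cpp
#include <type_traits>
#include <vector>

void decltype_demo() {
    std::vector<int> numbers{1, 2, 3};
    std::vector<int> const cnumbers{1, 2, 3};
    int raw[] = {1, 2, 3};

    // Subscripting yields an lvalue, so decltype reports a reference type.
    static_assert(std::is_same_v<decltype(numbers[0]), int&>);
    static_assert(std::is_same_v<decltype(raw[0]), int&>);

    // Parenthesizing only matters for plain names; numbers[0] is already an lvalue expression.
    static_assert(std::is_same_v<decltype((numbers[0])), int&>);

    // remove_reference_t keeps const; decay_t strips both reference and const.
    static_assert(std::is_same_v<std::remove_reference_t<decltype(cnumbers[0])>, int const>);
    static_assert(std::is_same_v<std::decay_t<decltype(cnumbers[0])>, int>);

    // For standard containers the element type is also available directly.
    static_assert(std::is_same_v<decltype(numbers)::value_type, int>);

    (void)numbers; (void)cnumbers; (void)raw;  // only used in unevaluated contexts
}
```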
While I would always choose to write a helper function as per @Barry, if numbers is a standard container, it exports the type value_type, so you can save a little complexity:
cout << accumulate(cbegin(numbers), cend(numbers), decltype(numbers)::value_type());
Going further, we could define this function template (value_type is taken from the container; an argument such as 0 is converted to it rather than deducing its own type):
template<class Container>
constexpr auto element_of(const Container&, typename Container::value_type v = {})
{
return v;
}
which gives us this:
cout << accumulate(cbegin(numbers), cend(numbers), element_of(numbers, 0));
Personal preference: I find the decay_t, decltype and declval dance pretty annoying and hard to read.
Instead, I would use an extra level of indirection through a type trait value_t<It>, and zero-initialization through init = R{}:
#include <iterator>
#include <numeric>
template<class It>
using value_t = typename std::iterator_traits<It>::value_type;
template<class It, class R = value_t<It>>
auto accumulate(It first, It last, R init = R{}) { return std::accumulate(first, last, init); }
I think the best you can do is just factor this out somewhere:
template <class It, class R = std::decay_t<decltype(*std::declval<It>())>>
R accumulate(It first, It last, R init = 0) {
return std::accumulate(first, last, init);
}
std::cout << accumulate(cbegin(numbers), cend(numbers));
Or more generally:
template <class Range, class T =
std::decay_t<decltype(*adl_begin(std::declval<Range&&>()))>>
T accumulate(Range&& range, T init = 0) {
return std::accumulate(adl_begin(range), adl_end(range), init);
}
cout << accumulate(numbers);
where adl_begin is a version of begin() that accounts for ADL.
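adl_begin isn't a standard function. A minimal sketch of it, assuming C++17 and placing everything in a hypothetical namespace sketch: the usual trick is to make std::begin visible through a using-declaration and then call begin unqualified, so ADL can still pick up a user-defined begin. This version also uses T{} instead of 0 as the default initial value, a small deviation that lets it work for element types such as std::string.

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {
namespace detail {
// Bring std::begin/std::end into scope so unqualified calls consider them
// alongside any begin/end found by argument-dependent lookup.
using std::begin;
using std::end;

template <class R>
auto adl_begin(R&& r) -> decltype(begin(std::forward<R>(r))) {
    return begin(std::forward<R>(r));
}

template <class R>
auto adl_end(R&& r) -> decltype(end(std::forward<R>(r))) {
    return end(std::forward<R>(r));
}
}  // namespace detail

using detail::adl_begin;
using detail::adl_end;

// Range version of the helper: the element type is deduced from the range.
template <class Range,
          class T = std::decay_t<decltype(*adl_begin(std::declval<Range&&>()))>>
T accumulate(Range&& range, T init = T{}) {
    return std::accumulate(adl_begin(range), adl_end(range), init);
}
}  // namespace sketch
```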
Sure, we technically still have all the cruft that you were trying to avoid earlier... but at least now you never have to look at it again?
Below is a sample of Scheme code (correct me if I'm wrong):
(define (translate points delta)
  (map (lambda (x) (+ x delta))
       points))
Basically, it defines a lambda function that adds delta to its input x, then applies it to each item of points.
I find this feature quite interesting because it omits all the iterators and so on.
Is it possible to do such "map" in C++, in an elegant way?
Update in response to the replies:
To be more specific, is there a way to implement such a "map" function of Scheme in C++, so that it can be used elegantly? Maybe a template function named "map" that accepts a function pointer or functor, and a container?
The closest translation of your code in idiomatic C++ would be using std::transform with a std::back_inserter:
std::vector<point> points{…};
std::vector<point> output;
// optional, may improve performance:
output.reserve(points.size());
auto lambda = [=](point x) { return x + delta; };
std::transform(begin(points), end(points), std::back_inserter(output), lambda);
Here, lambda captures its surrounding scope by value – this is indicated by the [=] prefix. This makes it possible to use delta inside it.
However, for T -> T transformations you would usually use an in-place variant instead of pushing values into a new container:
std::vector<point> points{…};
auto lambda = [=](point x) { return x + delta; };
std::transform(begin(points), end(points), begin(points), lambda);
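A self-contained version of both variants, assuming int in place of the point type (which isn't defined in the answer) and wrapping each in a hypothetical helper function:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Copying variant: results are appended to a new vector via back_inserter.
std::vector<int> translate_copy(std::vector<int> const& points, int delta) {
    std::vector<int> output;
    output.reserve(points.size());  // optional, avoids repeated reallocation
    std::transform(points.begin(), points.end(), std::back_inserter(output),
                   [=](int x) { return x + delta; });
    return output;
}

// In-place variant: the output range is the input range itself.
void translate_in_place(std::vector<int>& points, int delta) {
    std::transform(points.begin(), points.end(), points.begin(),
                   [=](int x) { return x + delta; });
}
```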
The C++ version is called std::transform.
There is no predefined exact equivalent... but it's not difficult to write:
template<typename T, typename F>
T mymap(const T& container, F f) {
T result;
for (auto const & x : container) {
result.push_back(f(x));
}
return result;
}
std::vector<int> translate(const std::vector<int>& x, int delta) {
return mymap(x, [=](int v){ return v + delta; });
}
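mymap above requires the output to have the same container type as the input. A sketch of a slightly more general variant (map_to_vector is a hypothetical name), assuming C++17: it deduces the result element type from f with std::invoke_result_t and always returns a std::vector, so the function may change the element type.

```cpp
#include <functional>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Applies f to every element of any range usable with std::begin and range-for,
// collecting the results in a std::vector of whatever type f returns.
template <class Container, class F>
auto map_to_vector(Container const& container, F f) {
    using Result = std::decay_t<std::invoke_result_t<F&, decltype(*std::begin(container))>>;
    std::vector<Result> result;
    for (auto const& x : container) {
        result.push_back(std::invoke(f, x));
    }
    return result;
}
```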
Something similar to Scheme's map is std::transform, but it requires you to provide an output iterator indicating where to store the transformed elements.
The C++ standard library is built around the concept of iterator pairs (for example even for sort you don't pass a container but a pair of iterators). I personally don't think this is such a great idea, but it's the way the language was designed.
From time to time I feel the need for a certain kind of iterator (for which I can't come up with a good name except the one prefixed to the title of this question).
Suppose we have a function (or function object) that maps an integer to type T. That is, we have a definition of a mathematical sequence, but we don't actually have it stored in memory. I want to make an iterator out of it. The iterator class would look something like this:
template <class F, class T>
class sequence_iterator : public std::iterator<...>
{
int i;
F f;
public:
sequence_iterator (F f, int i = 0):f(f), i(i){}
//operators ==, ++, +, -, etc. will compare, increment, etc. the value of i.
T operator*() const
{
return f(i);
}
};
template <class T, class F>
sequence_iterator<F, T> make_sequence_iterator(F f, int i)
{
return sequence_iterator<F, T>(f, i);
}
Maybe I am being naive, but I personally feel that this iterator would be very useful. For example, suppose I have a function that checks whether a number is prime, and I want to count the number of primes in the interval [a,b]. I'd do this:
int identity(int i)
{
return i;
}
count_if(make_sequence_iterator<int>(identity, a), make_sequence_iterator<int>(identity, b), isPrime);
Since I have discovered something that would be useful (at least IMHO), I am definitely positive that it exists in Boost or the standard library. I just can't find it. So, is there anything like this in Boost? In the very unlikely event that there actually isn't, then I am going to write one, and in this case I'd like to know your opinion on whether or not I should make the iterator_category random_access_iterator_tag. My concern is that this isn't a real RAI, because operator* doesn't return a reference.
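For comparison, a minimal standard-library-only sketch of the iterator described in the question (sequence_iterator and make_sequence_iterator follow the question's names; is_prime and count_primes are hypothetical helpers). It assumes C++17 and tags the iterator as an input iterator, because operator* returns by value and pre-C++20 forward and random-access iterators are required to return a reference. As in the question's count_if call, the end position is excluded, so the counted range is really [a, b).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Dereferencing position i computes f(i); nothing is stored.
template <class F>
class sequence_iterator {
    int i_;
    F f_;

public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::decay_t<std::invoke_result_t<F const&, int>>;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = value_type;  // returned by value, hence not a real RAI

    sequence_iterator(F f, int i) : i_(i), f_(std::move(f)) {}

    reference operator*() const { return f_(i_); }
    sequence_iterator& operator++() { ++i_; return *this; }
    sequence_iterator operator++(int) { sequence_iterator tmp = *this; ++i_; return tmp; }

    // Two iterators are equal when they sit at the same integer position.
    friend bool operator==(sequence_iterator const& a, sequence_iterator const& b) { return a.i_ == b.i_; }
    friend bool operator!=(sequence_iterator const& a, sequence_iterator const& b) { return !(a == b); }
};

template <class F>
sequence_iterator<F> make_sequence_iterator(F f, int i) {
    return sequence_iterator<F>(std::move(f), i);
}

int identity(int i) { return i; }

bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Counts primes in the half-open range [a, b).
std::ptrdiff_t count_primes(int a, int b) {
    return std::count_if(make_sequence_iterator(identity, a),
                         make_sequence_iterator(identity, b), is_prime);
}
```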
Thanks in advance for any help.
boost::counting_iterator and boost::transform_iterator should do the trick:
template <typename I, typename F>
boost::transform_iterator<
F,
boost::counting_iterator<I>>
make_sequence_iterator(I i, F f)
{
return boost::make_transform_iterator(
boost::counting_iterator<I>(i), f);
}
Usage:
std::copy(make_sequence_iterator(0, f), make_sequence_iterator(n, f), out);
I would call this an integer mapping iterator, since it maps a function over a subsequence of the integers. And no, I've never encountered this in Boost or in the STL. I'm not sure why that is, since your idea is very similar to the concept of stream iterators, which also generate elements by calling functions.
Whether you want random access iteration is up to you. I'd try building a forward or bidirectional iterator first, since (e.g.) repeated binary searches over a sequence of integers may be faster if they're generated and stored in one go.
Does boost::transform_iterator fill your needs? There are several useful iterator adaptors in Boost; the documentation is here.
I think boost::counting_iterator is what you are looking for, or at least comes the closest. Is there something you are looking for that it doesn't provide? One could do, for example:
std::count_if(boost::counting_iterator<int>(0),
boost::counting_iterator<int>(10),
is_prime); // or whatever ...
In short, it is an iterator over a lazy sequence of consecutive values.
Boost.Utility contains a generator iterator adaptor. An example from the documentation:
#include <iostream>
#include <boost/generator_iterator.hpp>
class my_generator
{
public:
typedef int result_type;
my_generator() : state(0) { }
int operator()() { return ++state; }
private:
int state;
};
int main()
{
my_generator gen;
boost::generator_iterator_generator<my_generator>::type it =
boost::make_generator_iterator(gen);
for (int i = 0; i < 10; ++i, ++it)
std::cout << *it << std::endl;
}
Compare
double average = CalculateAverage(values.begin(), values.end());
with
double average = std::for_each(values.begin(), values.end(), CalculateAverage());
What are the benefits of using a functor over a function? Isn't the first a lot easier to read (even before the implementation is added)?
Assume the functor is defined like this:
class CalculateAverage
{
private:
std::size_t num;
double sum;
public:
CalculateAverage() : num (0) , sum (0)
{
}
void operator () (double elem)
{
num++;
sum += elem;
}
operator double() const
{
return sum / num;
}
};
At least four good reasons:
Separation of concerns
In your particular example, the functor-based approach has the advantage of separating the iteration logic from the average-calculation logic. So you can use your functor in other situations (think about all the other algorithms in the STL), and you can use other functors with for_each.
Parameterisation
You can parameterise a functor more easily. So for instance, you could have a CalculateAverageOfPowers functor that takes the average of the squares, or cubes, etc. of your data, which would be written thus:
#include <cmath>
class CalculateAverageOfPowers
{
public:
CalculateAverageOfPowers(float p) : acc(0), n(0), p(p) {}
void operator() (float x) { acc += std::pow(x, p); n++; }
float getAverage() const { return acc / n; }
private:
float acc;
int n;
float p;
};
You could of course do the same thing with a traditional function, but then it becomes difficult to use with function pointers, because it has a different prototype from CalculateAverage.
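As a usage sketch (mean_square and mean_cube are hypothetical helpers, and the class is repeated so the block stands alone): because std::for_each returns its functor, the accumulated state can be read back afterwards, and the exponent travels with the object instead of appearing in the call signature.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

class CalculateAverageOfPowers {
public:
    CalculateAverageOfPowers(float p) : acc(0), n(0), p(p) {}
    void operator()(float x) { acc += std::pow(x, p); n++; }
    float getAverage() const { return acc / n; }
private:
    float acc;
    int n;
    float p;
};

// The exponent is fixed when the functor is constructed.
float mean_square(std::vector<float> const& v) {
    return std::for_each(v.begin(), v.end(), CalculateAverageOfPowers(2)).getAverage();
}

float mean_cube(std::vector<float> const& v) {
    return std::for_each(v.begin(), v.end(), CalculateAverageOfPowers(3)).getAverage();
}
```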
Statefulness
And as functors can be stateful, you could do something like this:
CalculateAverage avg;
avg = std::for_each(dataA.begin(), dataA.end(), avg);
avg = std::for_each(dataB.begin(), dataB.end(), avg);
avg = std::for_each(dataC.begin(), dataC.end(), avg);
to average across a number of different data-sets.
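A self-contained sketch of this pattern, reusing the CalculateAverage functor from the question (average_of_all is a hypothetical wrapper). Each call to std::for_each receives a copy of avg and returns the updated copy, which is why the result is assigned back every time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The functor from the question, reformatted.
class CalculateAverage {
    std::size_t num;
    double sum;
public:
    CalculateAverage() : num(0), sum(0) {}
    void operator()(double elem) { num++; sum += elem; }
    operator double() const { return sum / num; }
};

// One running average across three data sets.
double average_of_all(std::vector<double> const& a,
                      std::vector<double> const& b,
                      std::vector<double> const& c) {
    CalculateAverage avg;
    avg = std::for_each(a.begin(), a.end(), avg);
    avg = std::for_each(b.begin(), b.end(), avg);
    avg = std::for_each(c.begin(), c.end(), avg);
    return avg;  // converts through operator double()
}
```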
Note that almost all STL algorithms/containers that accept functors require them to be "pure" predicates, i.e. have no observable change in state over time. for_each is a special case in this regard (see e.g. Effective Standard C++ Library - for_each vs. transform).
Performance
Functors can often be inlined by the compiler (the STL is a bunch of templates, after all). Whilst the same is theoretically true of functions, compilers typically won't inline through a function pointer. The canonical example is to compare std::sort vs qsort; the STL version is often 5-10x faster, assuming the comparison predicate itself is simple.
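To show what the compiler gets to see in each case, a sketch of the same sort written both ways (compare_ints, IntLess and the two wrapper functions are hypothetical names). No timings are claimed here; the actual difference depends on the compiler, optimization flags and data.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// qsort receives an opaque function pointer and calls it through a void* interface.
int compare_ints(void const* a, void const* b) {
    int x = *static_cast<int const*>(a);
    int y = *static_cast<int const*>(b);
    return (x > y) - (x < y);  // avoids the overflow risk of returning x - y
}

// std::sort is instantiated for IntLess itself, so the comparison body is visible to the optimizer.
struct IntLess {
    bool operator()(int x, int y) const { return x < y; }
};

void sort_with_qsort(std::vector<int>& v) {
    if (!v.empty()) std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), IntLess{});
}
```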
Summary
Of course, it's possible to emulate the first three with traditional functions and pointers, but it becomes a great deal simpler with functors.
Advantages of functors:
Unlike functions, functors can have state.
Functors fit into the OOP paradigm better than plain functions.
Functors can often be inlined, unlike calls through function pointers.
Functors don't require a vtable or runtime dispatch, and hence are more efficient in most cases.
std::for_each is easily the most capricious and least useful of the standard algorithms. It's just a nice wrapper for a loop. However, even it has advantages.
Consider what your first version of CalculateAverage must look like. It will have a loop over the iterators, and then do stuff with each element. What happens if you write that loop incorrectly? Oops; there's a compiler or runtime error. The second version can never have such errors. Yes, it's not a lot of code, but why do we have to write loops so often? Why not just once?
Now, consider real algorithms; the ones that actually do work. Do you want to write std::sort? Or std::find? Or std::nth_element? Do you even know how to implement it in the most efficient way possible? How many times do you want to implement these complex algorithms?
As for ease of reading, that's in the eyes of the beholder. As I said, std::for_each is hardly the first choice for algorithms (especially with C++0x's range-based for syntax). But if you're talking about real algorithms, they're very readable; std::sort sorts a list. Some of the more obscure ones like std::nth_element won't be as familiar, but you can always look it up in your handy C++ reference.
And even std::for_each is perfectly readable once you use lambdas in C++0x.
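For instance, the averaging example from the question can be written with a lambda instead of a named functor. A sketch (calculate_average is a hypothetical name), assuming that an empty input should yield 0, whereas the original functor would divide by zero in that case:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

double calculate_average(std::vector<double> const& values) {
    double sum = 0.0;
    std::size_t count = 0;
    // The lambda captures sum and count by reference and updates them for each element.
    std::for_each(values.begin(), values.end(), [&](double x) {
        sum += x;
        ++count;
    });
    return count == 0 ? 0.0 : sum / count;  // assumption: empty input yields 0
}
```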
"Unlike functions, functors can have state."
This is very interesting because std::less and std::equal_to (which derive from std::binary_function in the library version shown below) declare their operator() as const. But what if you wanted to print a debug message with the current call count for that object? How would you do it?
Here is the template for std::equal_to (as found in libstdc++):
template<typename _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
bool
operator()(const _Tp& __x, const _Tp& __y) const
{ return __x == __y; }
};
I can think of 3 ways to allow the operator() to be const, and yet change a member variable. But what is the best way? Take this example:
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cassert> // assert() MACRO
// functor for comparing two integers by their quotient when divided by 10.
// So 50..59 are same, and 60..69 are same.
// Used by std::sort()
struct lessThanByTen: public std::less<int>
{
private:
// data members
int count; // nr of times operator() was called
public:
// default CTOR sets count to 0
lessThanByTen() :
count(0)
{
}
// hides (rather than overrides, since it isn't virtual) operator() of std::less<int>, which simply compares two integers
bool operator() ( const int& arg1, const int& arg2) const
{
// this won't compile, because a const method cannot change a member variable (count)
// ++count;
// Solution 1. this trick allows the const method to change a member variable
++(*(int*)&count);
// Solution 2. this trick also fools the compilers, but is a lot uglier to decipher
++(*(const_cast<int*>(&count)));
// Solution 3. a third way to do same thing:
{
// first, stack copy gets bumped count member variable
int incCount = count+1;
const int *iptr = &count;
// this is now the same as ++count
*(const_cast<int*>(iptr)) = incCount;
}
std::cout << "DEBUG: operator() called " << count << " times.\n";
return (arg1/10) < (arg2/10);
}
};
void test1();
void printArray( const std::string msg, const int nums[], const size_t ASIZE);
int main()
{
test1();
return 0;
}
void test1()
{
// unsorted numbers
int inums[] = {33, 20, 10, 21, 30, 31, 32, 22, };
printArray( "BEFORE SORT", inums, 8 );
// sort by quotient of integer division by 10
std::sort( inums, inums+8, lessThanByTen() );
printArray( "AFTER SORT", inums, 8 );
}
//! @param msg can be "this is a const string" or a std::string because of implicit string(const char *) conversion.
//! print "msg: 1,2,3,...N", where 1..8 are numbers in nums[] array
void printArray( const std::string msg, const int nums[], const size_t ASIZE)
{
std::cout << msg << ": ";
for (size_t inx = 0; inx < ASIZE; ++inx)
{
if (inx > 0)
std::cout << ",";
std::cout << nums[inx];
}
std::cout << "\n";
}
Because all 3 solutions are compiled in, it increments count by 3. Here's the output:
gcc -g -c Main9.cpp
gcc -g Main9.o -o Main9 -lstdc++
./Main9
BEFORE SORT: 33,20,10,21,30,31,32,22
DEBUG: operator() called 3 times.
DEBUG: operator() called 6 times.
DEBUG: operator() called 9 times.
DEBUG: operator() called 12 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 12 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 15 times.
DEBUG: operator() called 18 times.
DEBUG: operator() called 18 times.
DEBUG: operator() called 21 times.
DEBUG: operator() called 21 times.
DEBUG: operator() called 24 times.
DEBUG: operator() called 27 times.
DEBUG: operator() called 30 times.
DEBUG: operator() called 33 times.
DEBUG: operator() called 36 times.
AFTER SORT: 10,20,21,22,33,30,31,32
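Two further options, offered as a sketch rather than a definitive answer (CountingLess, SharedCountingLess and sort_by_tens are hypothetical names). The idiomatic way to let a const operator() change a member is to declare that member mutable. Separately, std::sort takes its comparator by value and implementations may copy it internally, which plausibly explains why the count in the output above goes backwards (15 to 12): each copy has its own counter. Sharing the counter through a pointer avoids that.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Option 1: a mutable member may be modified inside a const member function.
struct CountingLess {
    mutable std::size_t calls = 0;
    bool operator()(int a, int b) const {
        ++calls;
        return a / 10 < b / 10;
    }
};

// Option 2: all copies made by std::sort share one counter through a shared_ptr.
// operator() modifies the pointee, not the pointer member, so no mutable is needed.
struct SharedCountingLess {
    std::shared_ptr<std::size_t> calls = std::make_shared<std::size_t>(0);
    bool operator()(int a, int b) const {
        ++*calls;
        return a / 10 < b / 10;
    }
};

// Mirrors test1() from the question; returns the total number of comparisons across all copies.
std::size_t sort_by_tens(std::vector<int>& v) {
    SharedCountingLess cmp;
    std::sort(v.begin(), v.end(), cmp);
    return *cmp.calls;
}
```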
In the first approach, the iteration code has to be duplicated in every function that wants to do something with the collection. The second approach hides the details of iteration.
OOP is the keyword here.
http://www.newty.de/fpt/functor.html:
4.1 What are Functors ?
Functors are functions with a state. In C++ you can realize them as a class with one or more private members to store the state and with an overloaded operator () to execute the function. Functors can encapsulate C and C++ function pointers employing the concepts templates and polymorphism. You can build up a list of pointers to member functions of arbitrary classes and call them all through the same interface without bothering about their class or the need of a pointer to an instance. All the functions just have got to have the same return-type and calling parameters. Sometimes functors are also known as closures. You can also use functors to implement callbacks.
You are comparing functions on different level of abstraction.
You can implement CalculateAverage(begin, end) either as:
#include <functional>
#include <iterator>
#include <numeric>
template<typename Iter>
double CalculateAverage(Iter begin, Iter end)
{
return std::accumulate(begin, end, 0.0, std::plus<double>()) / std::distance(begin, end);
}
or you can do it with a for loop
template<typename Iter>
double CalculateAverage(Iter begin, Iter end)
{
double sum = 0;
int count = 0;
for(; begin != end; ++begin) {
sum += *begin;
++count;
}
return sum / count;
}
The former requires you to know more things, but once you know them, is simpler and leaves fewer possibilities for error.
It also only uses two generic components (std::accumulate and std::plus), which is often true in more complex cases too. You can often have a simple, universal functor (or function; a plain old function can act as a functor) and simply combine it with whatever algorithm you need.