From time to time I am feeling the need for a certain kind of iterator (for which I can't make up a good name except the one prefixed to the title of this question).
Suppose we have a function (or function object) that maps an integer to type T. That is, we have a definition of a mathematical sequence, but we don't actually have it stored in memory. I want to make an iterator out of it. The iterator class would look something like this:
template <class F, class T>
class sequence_iterator : public std::iterator<...>
{
int i;
F f;
public:
sequence_iterator (F f, int i = 0):f(f), i(i){}
//operators ==, ++, +, -, etc. will compare, increment, etc. the value of i.
T operator*() const
{
return f(i);
}
};
template <class T, class F>
sequence_iterator<F, T> make_sequence_iterator(F f, int i)
{
return sequence_iterator<F, T>(f, i);
}
Maybe I am being naive, but I personally feel that this iterator would be very useful. For example, suppose I have a function that checks whether a number is prime or not. And I want to count the number of primes in the interval [a,b]. I'd do this;
int identity(int i)
{
return i;
}
count_if(make_sequence_iterator<int>(identity, a), make_sequence_iterator<int>(identity, b), isPrime);
Since I have discovered something that would be useful (at least IMHO) I am definitely positive that it exists in boost or the standard library. I just can't find it. So, is there anything like this in boost?. In the very unlikely event that there actually isn't, then I am going to write one - and in this case I'd like to know your opinion whether or not should I make the iterator_category random_access_iterator_tag. My concern is that this isn't a real RAI, because operator* doesn't return a reference.
Thanks in advance for any help.
boost::counting_iterator and boost::transform_iterator should do the trick:
template <typename I, typename F>
boost::transform_iterator<
F,
boost::counting_iterator<I>>
make_sequence_iterator(I i, F f)
{
return boost::make_transform_iterator(
boost::counting_iterator<I>(i), f);
}
Usage:
std::copy(make_sequence_iterator(0, f), make_sequence_iterator(n, f), out);
I would call this an integer mapping iterator, since it maps a function over a subsequence of the integers. And no, I've never encountered this in Boost or in the STL. I'm not sure why that is, since your idea is very similar to the concept of stream iterators, which also generate elements by calling functions.
Whether you want random access iteration is up to you. I'd try building a forward or bidirectional iterator first, since (e.g.) repeated binary searches over a sequence of integers may be faster if they're generated and stored in one go.
Does the boost::transform_iterator fills your needs? there are several useful iterator adaptors in boost, the doc is here.
I think boost::counting_iterator is what you are looking for, or atleast comes the closest. Is there something you are looking for it doesn't provide? One could do, for example:
std::count_if(boost::counting_iterator<int>(0),
boost::counting_iterator<int>(10),
is_prime); // or whatever ...
In short, it is an iterator over a lazy sequence of consecutive values.
Boost.Utility contains a generator iterator adaptor. An example from the documentation:
#include <iostream>
#include <boost/generator_iterator.hpp>
class my_generator
{
public:
typedef int result_type;
my_generator() : state(0) { }
int operator()() { return ++state; }
private:
int state;
};
int main()
{
my_generator gen;
boost::generator_iterator_generator<my_generator>::type it =
boost::make_generator_iterator(gen);
for (int i = 0; i < 10; ++i, ++it)
std::cout << *it << std::endl;
}
Related
Discussion:
Let's say I have a struct/class with an arbitrary number of attributes that I want to use as key to a std::unordered_map e.g.,:
struct Foo {
int i;
double d;
char c;
bool b;
};
I know that I have to define a hasher-functor for it e.g.,:
struct FooHasher {
std::size_t operator()(Foo const &foo) const;
};
And then define my std::unordered_map as:
std::unordered_map<Foo, MyValueType, FooHasher> myMap;
What bothers me though, is how to define the call operator for FooHasher. One way to do it, that I also tend to prefer, is with std::hash. However, there are numerous variations e.g.,:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
std::hash<double>()(foo.d) ^
std::hash<char>()(foo.c) ^
std::hash<bool>()(foo.b);
}
I've also seen the following scheme:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
(std::hash<double>()(foo.d) << 1) ^
(std::hash<char>()(foo.c) >> 1) ^
(std::hash<bool>()(foo.b) << 1);
}
I've seen also some people adding the golden ratio:
std::size_t operator()(Foo const &foo) const {
return (std::hash<int>()(foo.i) + 0x9e3779b9) ^
(std::hash<double>()(foo.d) + 0x9e3779b9) ^
(std::hash<char>()(foo.c) + 0x9e3779b9) ^
(std::hash<bool>()(foo.b) + 0x9e3779b9);
}
Questions:
What are they trying to achieve by adding the golden ration or shifting bits in the result of std::hash.
Is there an "official scheme" to std::hash an object with arbitrary number of attributes of fundamental type?
A simple xor is symmetric and behaves badly when fed the "same" value multiple times (hash(a) ^ hash(a) is zero). See here for more details.
This is the question of combining hashes. boost has a hash_combine that is pretty decent. Write a hash combiner, and use it.
There is no "official scheme" to solve this problem.
Myself, I typically write a super-hasher that can take anything and hash it. It hash combines tuples and pairs and collections automatically, where it first hashes the count of elements in the collection, then the elements.
It finds hash(t) via ADL first, and if that fails checks if it has a manually written hash in a helper namespace (used for std containers and types), and if that fails does a std::hash<T>{}(t).
Then my hash for Foo support looks like:
struct Foo {
int i;
double d;
char c;
bool b;
friend auto mytie(Foo const& f) {
return std::tie(f.i, f.d, f.c, f.b);
}
friend std::size_t hash(Foo const& f) {
return hasher::hash(mytie(f));
}
};
where I use mytie to move Foo into a tuple, then use the std::tuple overload of hasher::hash to get the result.
I like the idea of hashes of structurally similar types having the same hash. This lets me act as if my hash is transparent in some cases.
Note that hashing unordered meows in this manner is a bad idea, as an asymmetric hash of an unordered meow may generate spurious misses.
(Meow is the generic name for map and set. Do not ask me why: Ask the STL.)
The standard hash framework is lacking in respect of combining hashes. Combining hashes using xor is sub-optimal.
A better solution is proposed in N3980 "Types Don't Know #".
The main idea is using the same hash function and its state to hash more than one value/element/member.
With that framework your hash function would look:
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept
{
using std::hash_append;
hash_append(h, x.i);
hash_append(h, x.d);
hash_append(h, x.c);
hash_append(h, x.b);
}
And the container:
std::unordered_map<Foo, MyValueType, std::uhash<>> myMap;
Not entirely a question, although just something I have been pondering on how to write such code more elegantly by style and at the same time fully making use of the new c++ standard etc. Here is the example
Returning Fibonacci sequence to a container upto N values (for those not mathematically inclined, this is just adding the previous two values with the first two values equal to 1. i.e. 1,1,2,3,5,8,13, ...)
example run from main:
std::vector<double> vec;
running_fibonacci_seq(vec,30000000);
1)
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T& coll, const INT_TYPE& N)
{
coll.resize(N);
coll[0] = 1;
if (N>1) {
coll[1] = 1;
for (auto pos = coll.begin()+2;
pos != coll.end();
++pos)
{
*pos = *(pos-1) + *(pos-2);
}
}
}
2) the same but using rvalue && instead of & 1.e.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
EDIT: as noticed by the users who commented below, the rvalue and lvalue play no role in timing - the speeds were actually the same for reasons discussed in the comments
results for N = 30,000,000
Time taken for &:919.053ms
Time taken for &&: 800.046ms
Firstly I know this really isn't a question as such, but which of these or which is best modern c++ code? with the rvalue reference (&&) it appears that move semantics are in place and no unnecessary copies are being made which makes a small improvement on time (important for me due to future real-time application development). some specific ''questions'' are
a) passing a container (which was vector in my example) to a function as a parameter is NOT an elegant solution on how rvalue should really be used. is this fact true? if so how would rvalue really show it's light in the above example?
b) coll.resize(N); call and the N=1 case, is there a way to avoid these calls so the user is given a simple interface to only use the function without creating size of vector dynamically. Can template metaprogramming be of use here so the vector is allocated with a particular size at compile time? (i.e. running_fibonacci_seq<30000000>) since the numbers can be large is there any need to use template metaprogramming if so can we use this (link) also
c) Is there an even more elegant method? I have a feeling std::transform function could be used by using lambdas e.g.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
{
coll.resize(N);
coll[0] = 1;
coll[1] = 1;
std::transform (coll.begin()+2,
coll.end(), // source
coll.begin(), // destination
[????](????) { // lambda as function object
return ????????;
});
}
[1] http://cpptruths.blogspot.co.uk/2011/07/want-speed-use-constexpr-meta.html
Due to "reference collapsing" this code does NOT use an rvalue reference, or move anything:
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T&& coll, const INT_TYPE& N);
running_fibonacci_seq(vec,30000000);
All of your questions (and the existing comments) become quite meaningless when you recognize this.
Obvious answer:
std::vector<double> running_fibonacci_seq(uint32_t N);
Why ?
Because of const-ness:
std::vector<double> const result = running_fibonacci_seq(....);
Because of easier invariants:
void running_fibonacci_seq(std::vector<double>& t, uint32_t N) {
// Oh, forgot to clear "t"!
t.push_back(1);
...
}
But what of speed ?
There is an optimization called Return Value Optimization that allows the compiler to omit the copy (and build the result directly in the caller's variable) in a number of cases. It is specifically allowed by the C++ Standard even when the copy/move constructors have side effects.
So, why passing "out" parameters ?
you can only have one return value (sigh)
you may wish the reuse the allocated resources (here the memory buffer of t)
Profile this:
#include <vector>
#include <cstddef>
#include <type_traits>
template <typename Container>
Container generate_fibbonacci_sequence(std::size_t N)
{
Container coll;
coll.resize(N);
coll[0] = 1;
if (N>1) {
coll[1] = 1;
for (auto pos = coll.begin()+2;
pos != coll.end();
++pos)
{
*pos = *(pos-1) + *(pos-2);
}
}
return coll;
}
struct fibbo_maker {
std::size_t N;
fibbo_maker(std::size_t n):N(n) {}
template<typename Container>
operator Container() const {
typedef typename std::remove_reference<Container>::type NRContainer;
typedef typename std::decay<NRContainer>::type VContainer;
return generate_fibbonacci_sequence<VContainer>(N);
}
};
fibbo_maker make_fibbonacci_sequence( std::size_t N ) {
return fibbo_maker(N);
}
int main() {
std::vector<double> tmp = make_fibbonacci_sequence(30000000);
}
the fibbo_maker stuff is just me being clever. But it lets me deduce the type of fibbo sequence you want without you having to repeat it.
I'm writing a class which will be used to perform some calculations on a set of values, with scaling based on a per-value weight. The values and weights are supplied to the class' constructor. The class will be part of an internal library, and so I want to put as few restrictions as possible on the clients data structures - some clients will use vectors of structs or std::pairs, another separate OpenCV matrixes. During development I've taken start/end iterators and relied on the pair mechanism (val = it->first, weight = it->second).
How could this be done better, without too much hassle for the programmer on the other end? Generally, what is considered best practise when having this sort of multi-dimensional input?
Iterators are fine. However, relying on the types having public members called first and second is a pretty big restriction.
In C++0x, access to std::pair members will be unified with the access patterns of std::tuple, via a get function. This would allow you to overload and specialize the get function for arbitrary types:
#include <iostream>
#include <utility>
template <class T>
void print(const T& data)
{
using std::get;
std::cout << get<0>(data) << ' ' << get<1>(data) << '\n';
}
struct Coord
{
int x, y;
};
template <unsigned>
int get(const Coord&);
template <>
int get<0>(const Coord& c) { return c.x; }
template <>
int get<1>(const Coord& c) { return c.y; }
int main()
{
print(std::make_pair(1, 2));
Coord coord = {4, 5};
print(coord);
}
In case your standard library doesn't have get for pair, then boost's tuple library seems to have it.
That situation is pretty much what templates are in the language for.
How to do intersection and union for sets of the type tr1::unordered_set in c++? I can't find much reference about it.
Any reference and code will be highly appreciated. Thank you very much.
Update: I just guessed the tr1::unordered_set should provide the function for intersection, union, difference.. Since that's the basic operation of sets.
Of course I can write a function by myself, but I just wonder if there are built in function from tr1.
Thank you very much.
I see that set_intersection() et al. from the algorithm header won't work as they explicitly require their inputs to be sorted -- guess you ruled them out already.
It occurs to me that the "naive" approach of iterating through hash A and looking up every element in hash B should actually give you near-optimal performance, since successive lookups in hash B will be going to the same hash bucket (assuming that both hashes are using the same hash function). That should give you decent memory locality, even though these buckets are almost certainly implemented as linked lists.
Here's some code for unordered_set_difference(), you can tweak it to make the versions for set union and set difference:
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_intersection(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
while (!(b1 == e1)) {
if (!(std::find(b2, e2, *b1) == e2)) {
*out = *b1;
++out;
}
++b1;
}
return out;
}
Assuming you have two unordered_sets, x and y, you can put their intersection in z using:
unordered_set_intersection(
x.begin(), x.end(),
y.begin(), y.end(),
inserter(z, z.begin())
);
Unlike bdonlan's answer, this will actually work for any key types, and any combination of container types (although using set_intersection() will of course be faster if the source containers are sorted).
NOTE: If bucket occupancies are high, it's probably faster to copy each hash into a vector, sort them and set_intersection() them there, since searching within a bucket containing n elements is O(n).
There's nothing much to it - for intersect, just go through every element of one and ensure it's in the other. For union, add all items from both input sets.
For example:
void us_isect(std::tr1::unordered_set<int> &out,
const std::tr1::unordered_set<int> &in1,
const std::tr1::unordered_set<int> &in2)
{
out.clear();
if (in2.size() < in1.size()) {
us_isect(out, in2, in1);
return;
}
for (std::tr1::unordered_set<int>::const_iterator it = in1.begin(); it != in1.end(); it++)
{
if (in2.find(*it) != in2.end())
out.insert(*it);
}
}
void us_union(std::tr1::unordered_set<int> &out,
const std::tr1::unordered_set<int> &in1,
const std::tr1::unordered_set<int> &in2)
{
out.clear();
out.insert(in1.begin(), in1.end());
out.insert(in2.begin(), in2.end());
}
based on the previous answer:
C++11 version, if the set supports a fast look up function find()
(return values are moved efficiently)
/** Intersection and union function for unordered containers which support a fast lookup function find()
* Return values are moved by move-semantics, for c++11/c++14 this is efficient, otherwise it results in a copy
*/
namespace unorderedHelpers {
template<typename UnorderedIn1, typename UnorderedIn2,
typename UnorderedOut = UnorderedIn1>
UnorderedOut makeIntersection(const UnorderedIn1 &in1, const UnorderedIn2 &in2)
{
if (in2.size() < in1.size()) {
return makeIntersection<UnorderedIn2,UnorderedIn1,UnorderedOut>(in2, in1);
}
UnorderedOut out;
auto e = in2.end();
for(auto & v : in1)
{
if (in2.find(v) != e){
out.insert(v);
}
}
return out;
}
template<typename UnorderedIn1, typename UnorderedIn2,
typename UnorderedOut = UnorderedIn1>
UnorderedOut makeUnion(const UnorderedIn1 &in1, const UnorderedIn2 &in2)
{
UnorderedOut out;
out.insert(in1.begin(), in1.end());
out.insert(in2.begin(), in2.end());
return out;
}
}
I myself am convinced that in a project I'm working on signed integers are the best choice in the majority of cases, even though the value contained within can never be negative. (Simpler reverse for loops, less chance for bugs, etc., in particular for integers which can only hold values between 0 and, say, 20, anyway.)
The majority of the places where this goes wrong is a simple iteration of a std::vector, often this used to be an array in the past and has been changed to a std::vector later. So these loops generally look like this:
for (int i = 0; i < someVector.size(); ++i) { /* do stuff */ }
Because this pattern is used so often, the amount of compiler warning spam about this comparison between signed and unsigned type tends to hide more useful warnings. Note that we definitely do not have vectors with more then INT_MAX elements, and note that until now we used two ways to fix compiler warning:
for (unsigned i = 0; i < someVector.size(); ++i) { /*do stuff*/ }
This usually works but might silently break if the loop contains any code like 'if (i-1 >= 0) ...', etc.
for (int i = 0; i < static_cast<int>(someVector.size()); ++i) { /*do stuff*/ }
This change does not have any side effects, but it does make the loop a lot less readable. (And it's more typing.)
So I came up with the following idea:
template <typename T> struct vector : public std::vector<T>
{
typedef std::vector<T> base;
int size() const { return base::size(); }
int max_size() const { return base::max_size(); }
int capacity() const { return base::capacity(); }
vector() : base() {}
vector(int n) : base(n) {}
vector(int n, const T& t) : base(n, t) {}
vector(const base& other) : base(other) {}
};
template <typename Key, typename Data> struct map : public std::map<Key, Data>
{
typedef std::map<Key, Data> base;
typedef typename base::key_compare key_compare;
int size() const { return base::size(); }
int max_size() const { return base::max_size(); }
int erase(const Key& k) { return base::erase(k); }
int count(const Key& k) { return base::count(k); }
map() : base() {}
map(const key_compare& comp) : base(comp) {}
template <class InputIterator> map(InputIterator f, InputIterator l) : base(f, l) {}
template <class InputIterator> map(InputIterator f, InputIterator l, const key_compare& comp) : base(f, l, comp) {}
map(const base& other) : base(other) {}
};
// TODO: similar code for other container types
What you see is basically the STL classes with the methods which return size_type overridden to return just 'int'. The constructors are needed because these aren't inherited.
What would you think of this as a developer, if you'd see a solution like this in an existing codebase?
Would you think 'whaa, they're redefining the STL, what a huge WTF!', or would you think this is a nice simple solution to prevent bugs and increase readability. Or maybe you'd rather see we had spent (half) a day or so on changing all these loops to use std::vector<>::iterator?
(In particular if this solution was combined with banning the use of unsigned types for anything but raw data (e.g. unsigned char) and bit masks.)
Don't derive publicly from STL containers. They have nonvirtual destructors which invokes undefined behaviour if anyone deletes one of your objects through a pointer-to base. If you must derive e.g. from a vector, do it privately and expose the parts you need to expose with using declarations.
Here, I'd just use a size_t as the loop variable. It's simple and readable. The poster who commented that using an int index exposes you as a n00b is correct. However, using an iterator to loop over a vector exposes you as a slightly more experienced n00b - one who doesn't realize that the subscript operator for vector is constant time. (vector<T>::size_type is accurate, but needlessly verbose IMO).
While I don't think "use iterators, otherwise you look n00b" is a good solution to the problem, deriving from std::vector appears much worse than that.
First, developers do expect vector to be std:.vector, and map to be std::map. Second, your solution does not scale for other containers, or for other classes/libraries that interact with containers.
Yes, iterators are ugly, iterator loops are not very well readable, and typedefs only cover up the mess. But at least, they do scale, and they are the canonical solution.
My solution? an stl-for-each macro. That is not without problems (mainly, it is a macro, yuck), but it gets across the meaning. It is not as advanced as e.g. this one, but does the job.
I made this community wiki... Please edit it. I don't agree with the advice against "int" anymore. I now see it as not bad.
Yes, i agree with Richard. You should never use 'int' as the counting variable in a loop like those. The following is how you might want to do various loops using indices (althought there is little reason to, occasionally this can be useful).
Forward
for(std::vector<int>::size_type i = 0; i < someVector.size(); i++) {
/* ... */
}
Backward
You can do this, which is perfectly defined behaivor:
for(std::vector<int>::size_type i = someVector.size() - 1;
i != (std::vector<int>::size_type) -1; i--) {
/* ... */
}
Soon, with c++1x (next C++ version) coming along nicely, you can do it like this:
for(auto i = someVector.size() - 1; i != (decltype(i)) -1; i--) {
/* ... */
}
Decrementing below 0 will cause i to wrap around, because it is unsigned.
But unsigned will make bugs slurp in
That should never be an argument to make it the wrong way (using 'int').
Why not use std::size_t above?
The C++ Standard defines in 23.1 p5 Container Requirements, that T::size_type , for T being some Container, that this type is some implementation defined unsigned integral type. Now, using std::size_t for i above will let bugs slurp in silently. If T::size_type is less or greater than std::size_t, then it will overflow i, or not even get up to (std::size_t)-1 if someVector.size() == 0. Likewise, the condition of the loop would have been broken completely.
Definitely use an iterator. Soon you will be able to use the 'auto' type, for better readability (one of your concerns) like this:
for (auto i = someVector.begin();
i != someVector.end();
++i)
Skip the index
The easiest approach is to sidestep the problem by using iterators, range-based for loops, or algorithms:
for (auto it = begin(v); it != end(v); ++it) { ... }
for (const auto &x : v) { ... }
std::for_each(v.begin(), v.end(), ...);
This is a nice solution if you don't actually need the index value. It also handles reverse loops easily.
Use an appropriate unsigned type
Another approach is to use the container's size type.
for (std::vector<T>::size_type i = 0; i < v.size(); ++i) { ... }
You can also use std::size_t (from <cstddef>). There are those who (correctly) point out that std::size_t may not be the same type as std::vector<T>::size_type (though it usually is). You can, however, be assured that the container's size_type will fit in a std::size_t. So everything is fine, unless you use certain styles for reverse loops. My preferred style for a reverse loop is this:
for (std::size_t i = v.size(); i-- > 0; ) { ... }
With this style, you can safely use std::size_t, even if it's a larger type than std::vector<T>::size_type. The style of reverse loops shown in some of the other answers require casting a -1 to exactly the right type and thus cannot use the easier-to-type std::size_t.
Use a signed type (carefully!)
If you really want to use a signed type (or if your style guide practically demands one), like int, then you can use this tiny function template that checks the underlying assumption in debug builds and makes the conversion explicit so that you don't get the compiler warning message:
#include <cassert>
#include <cstddef>
#include <limits>
template <typename ContainerType>
constexpr int size_as_int(const ContainerType &c) {
const auto size = c.size(); // if no auto, use `typename ContainerType::size_type`
assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
return static_cast<int>(size);
}
Now you can write:
for (int i = 0; i < size_as_int(v); ++i) { ... }
Or reverse loops in the traditional manner:
for (int i = size_as_int(v) - 1; i >= 0; --i) { ... }
The size_as_int trick is only slightly more typing than the loops with the implicit conversions, you get the underlying assumption checked at runtime, you silence the compiler warning with the explicit cast, you get the same speed as non-debug builds because it will almost certainly be inlined, and the optimized object code shouldn't be any larger because the template doesn't do anything the compiler wasn't already doing implicitly.
You're overthinking the problem.
Using a size_t variable is preferable, but if you don't trust your programmers to use unsigned correctly, go with the cast and just deal with the ugliness. Get an intern to change them all and don't worry about it after that. Turn on warnings as errors and no new ones will creep in. Your loops may be "ugly" now, but you can understand that as the consequences of your religious stance on signed versus unsigned.
vector.size() returns a size_t var, so just change int to size_t and it should be fine.
Richard's answer is more correct, except that it's a lot of work for a simple loop.
I notice that people have very different opinions about this subject. I have also an opinion which does not convince others, so it makes sense to search for support by some guru’s, and I found the CPP core guidelines:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
maintained by Bjarne Stroustrup and Herb Sutter, and their last update, upon which I base the information below, is of April 10, 2022.
Please take a look at the following code rules:
ES.100: Don’t mix signed and unsigned arithmetic
ES.101: Use unsigned types for bit manipulation
ES.102: Use signed types for arithmetic
ES.107: Don’t use unsigned for subscripts, prefer gsl::index
So, supposing that we want to index in a for loop and for some reason the range based for loop is not the appropriate solution, then using an unsigned type is also not the preferred solution. The suggested solution is using gsl::index.
But in case you don’t have gsl around and you don’t want to introduce it, what then?
In that case I would suggest to have a utility template function as suggested by Adrian McCarthy: size_as_int