Skipping iterator - c++

I have a sequence of values that I'd like to pass to a function that takes a (iterator begin, iterator end) pair. However, I only want every second element in the original sequence to be processed.
Is there a nice way using Standard-Lib/Boost to create an iterator facade that will allow me to pass in the original sequence? I figured something simple like this would already be in the boost iterators or range libraries, but I didn't find anything.
Or am I missing another completely obvious way to do this? Of course, I know I always have the option of copying the values to another sequence, but that's not what I want to do.
Edit: I know about filter_iterator, but that filters on values - it doesn't change the way the iteration advances.

I think you want boost::adaptors::strided

struct TrueOnEven {
template< typename T >
bool operator()(const T&) { return mCount++ % 2 == 0; }
TrueOnEven() : mCount(0) {}
private:
int mCount;
};
int main() {
std::vector< int > tVec, tOtherVec;
...
typedef boost::filter_iterator< TrueOnEven, std::vector< int >::iterator > TakeEvenFilterType;
std::copy(
TakeEvenFilterType(tVec.begin(), tVec.end()),
TakeEvenFilterType(tVec.end(), tVec.end()),
std::back_inserter(tOtherVec));
}
To be honest, this is anything but nice and intuitive. I wrote a simple "Enumerator" library, including lazy integrated queries, to avoid a hotchpotch like the above. It allows you to write:
Query::From(tVec.begin(), tVec.end())
.Skip<2>()
.ToStlSequence(std::back_inserter(tOtherVec));
where Skip<2> basically instantiates a generalized "Filter" which skips every N-th (in this case every second) element.

Here's Boost's filter iterator. It is exactly what you want.
UPDATE: Sorry, read wrongly-ish. Here's a list of all iterator funkiness in Boost:
http://www.boost.org/doc/libs/1_46_1/libs/iterator/doc/#specialized-adaptors
I think a plain iterator_adaptor with an overloaded operator++ that increments the underlying iterator value twice is all you need.
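The "increment twice" idea can also be sketched without Boost. This is a minimal hand-rolled version, not the Boost adaptor itself: the names stride_iterator, collect and every_second are made up for illustration. It assumes the wrapper is given the end of the underlying range, so that a sequence with an odd number of elements does not get stepped past its end.

```cpp
#include <iterator>
#include <vector>

// Hypothetical minimal forward iterator that advances `step` positions per ++.
// It remembers the end of the underlying range so it never steps past it.
template <class It>
class stride_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = typename std::iterator_traits<It>::value_type;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = typename std::iterator_traits<It>::pointer;
    using reference = typename std::iterator_traits<It>::reference;

    stride_iterator() : pos_(), end_(), step_(1) {}
    stride_iterator(It pos, It end, difference_type step)
        : pos_(pos), end_(end), step_(step) {}

    reference operator*() const { return *pos_; }

    stride_iterator& operator++() {
        // Advance one element at a time so we stop exactly at end_.
        for (difference_type i = 0; i < step_ && pos_ != end_; ++i) ++pos_;
        return *this;
    }
    stride_iterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const stride_iterator& a, const stride_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const stride_iterator& a, const stride_iterator& b) {
        return !(a == b);
    }

private:
    It pos_;
    It end_;
    difference_type step_;
};

// Stand-in for "a function that takes a (begin, end) pair".
template <class It>
std::vector<int> collect(It first, It last) {
    return std::vector<int>(first, last);
}

// Passes every second element of v to collect() without copying v first.
std::vector<int> every_second(const std::vector<int>& v) {
    using Iter = stride_iterator<std::vector<int>::const_iterator>;
    return collect(Iter(v.begin(), v.end(), 2), Iter(v.end(), v.end(), 2));
}
```

Stepping one element at a time costs a little for random-access iterators, but it keeps the end check simple and works for any forward iterator.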

Related

How to implement something like std::copy_if but apply a function before inserting into a different container

Full disclosure, this may be a hammer and nail situation trying to use STL algorithms when none are needed. I have seen a reappearing pattern in some C++14 code I am working with. We have a container that we iterate through, and if the current element matches some condition, then we copy one of the element's fields to another container.
The pattern is something like:
for (auto it = std::begin(foo); it!=std::end(foo); ++it){
auto x = it->Some_member;
// Note: the check usually uses the field that would be added to the new container.
if(f(x) && g(x)){
bar.emplace_back(x);
}
}
The idea is almost an accumulate where the function being applied does not always return a value. I can only think of solutions that either:
Require a function for accessing the member you want to accumulate and another function for checking the condition, i.e. How to combine std::copy_if and std::transform?
Are worse than the thing I want to replace.
Is this even a good idea?
A quite general solution to your issue would be the following (working example):
#include <iostream>
#include <vector>
using namespace std;
template<typename It, typename MemberType, typename Cond, typename Do>
void process_filtered(It begin, It end, MemberType iterator_traits<It>::value_type::*ptr, Cond condition, Do process)
{
for(It it = begin; it != end; ++it)
{
if(condition((*it).*ptr))
{
process((*it).*ptr);
}
}
}
struct Data
{
int x;
int y;
};
int main()
{
// thanks to iterator_traits, vector could also be an array;
// kudos to @Yakk-AdamNevraumont
vector<Data> lines{{1,2},{4,3},{5,6}};
// filter even numbers from Data::x and output them
process_filtered(std::begin(lines), std::end(lines), &Data::x, [](int n){return n % 2 == 0;}, [](int n){cout << n;});
// output is 4, the only x value that is even
return 0;
}
It does not use STL, that is right, but you merely pass an iterator pair, the member to lookup and two lambdas/functions to it that will first filter and second use the filtered output, respectively.
I like your general solutions but here you do not need to have a lambda that extracts the corresponding attribute.
Clearly, the code can be refined to work with const_iterator but for a general idea, I think, it should be helpful. You could also extend it to have a member function that returns a member attribute instead of a direct member attribute pointer, if you'd like to use this method for encapsulated classes.
Sure. There are a bunch of approaches.
1. Find a library with transform_if, like Boost.
2. Find a library with transform_range, which takes a transformation and range or container and returns a range with the value transformed. Compose this with copy_if.
3. Find a library with filter_range like the above. Now, use std::transform with your filtered range.
4. Find one with both, and compose filtering and transforming in the appropriate order. Now your problem is just copying (std::copy or whatever).
5. Write your own back-inserter wrapper that transforms while inserting. Use that with std::copy_if.
6. Write your own range adapters, like 2, 3 and/or 4.
7. Write transform_if.
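The last option, writing transform_if yourself, might look like the sketch below. It is not a standard or Boost function. The signature and the argument order (transform first, then test the transformed value) are assumptions chosen to match the loop in the question, and Item and even_members are made-up stand-ins.

```cpp
#include <iterator>
#include <vector>

// Hypothetical helper: apply `transform` to each element and copy the result
// to `out` only if `pred` accepts the transformed value.
template <class InputIt, class OutputIt, class Transform, class Pred>
OutputIt transform_if(InputIt first, InputIt last, OutputIt out,
                      Transform transform, Pred pred) {
    for (; first != last; ++first) {
        auto&& value = transform(*first);
        if (pred(value)) {
            *out++ = value;
        }
    }
    return out;
}

struct Item { int some_member; };  // stand-in for the element type in the question

// Collects the even some_member values, mirroring the hand-written loop.
std::vector<int> even_members(const std::vector<Item>& foo) {
    std::vector<int> bar;
    transform_if(foo.begin(), foo.end(), std::back_inserter(bar),
                 [](const Item& item) { return item.some_member; },
                 [](int x) { return x % 2 == 0; });
    return bar;
}
```

Whether this reads better than the original loop is a judgment call; it mainly pays off when the same pattern appears in many places.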

Why does std::set not have a "contains" member function?

I'm heavily using std::set<int> and often I simply need to check if such a set contains a number or not.
I'd find it natural to write:
if (myset.contains(number))
...
But because of the lack of a contains member, I need to write the cumbersome:
if (myset.find(number) != myset.end())
..
or the not as obvious:
if (myset.count(element) > 0)
..
Is there a reason for this design decision?
I think it was probably because they were trying to make std::set and std::multiset as similar as possible. (And obviously count has a perfectly sensible meaning for std::multiset.)
Personally I think this was a mistake.
It doesn't look quite so bad if you pretend that count is just a misspelling of contains and write the test as:
if (myset.count(element))
...
It's still a shame though.
To be able to write if (s.contains()), contains() has to return a bool (or a type convertible to bool, which is another story), like binary_search does.
The fundamental reason behind the design decision not to do it this way is that contains() which returns a bool would lose valuable information about where the element is in the collection. find() preserves and returns that information in the form of an iterator, therefore is a better choice for a generic library like STL. This has always been the guiding principle for Alex Stepanov, as he has often explained (for example, here).
As to the count() approach in general, although it's often an okay workaround, the problem with it is that it does more work than a contains() would have to do.
That is not to say that a bool contains() isn't a very nice-to-have or even necessary. A while ago we had a long discussion about this very same issue in the ISO C++ Standard - Future Proposals group.
It lacks it because nobody added it. Nobody added it because the containers from the STL that the std library incorporated were designed to be minimal in interface. (Note that std::string did not come from the STL in the same way.)
If you don't mind some strange syntax, you can fake it:
#include <utility> // std::forward
template<class K>
struct contains_t {
K&& k;
template<class C>
friend bool operator->*( C&& c, contains_t&& self ) {
// a friend function has no implicit this, so reach k through the named parameter
auto range = std::forward<C>(c).equal_range(std::forward<K>(self.k));
return range.first != range.second;
// faster than:
// return std::forward<C>(c).count( std::forward<K>(self.k) ) != 0;
// for multi-meows with lots of duplicates
}
};
template<class K>
contains_t<K> contains( K&& k ) {
return {std::forward<K>(k)};
}
use:
if (some_set->*contains(some_element)) {
}
Basically, you can write extension methods for most C++ std types using this technique.
It makes a lot more sense to just do this:
if (some_set.count(some_element)) {
}
but I am amused by the extension method method.
The really sad thing is that writing an efficient contains could be faster on a multimap or multiset, as they just have to find one element, while count has to find each of them and count them.
A multiset containing 1 billion copies of 7 (you know, in case you run out) can have a really slow .count(7), but could have a very fast contains(7).
With the above extension method, we could make it faster for this case by using lower_bound, comparing to end, and then comparing to the element. Doing that for an unordered meow as well as an ordered meow would require fancy SFINAE or container-specific overloads however.
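The lower_bound variant mentioned above could be sketched as a free function. This is only an illustration for ordered set-like containers (std::set, std::multiset), where the element is the key; the name contains_ordered is made up, and maps or unordered containers would need the extra overloads mentioned above.

```cpp
#include <set>

// Hypothetical helper for std::set / std::multiset: one lower_bound lookup,
// no counting of duplicates.
template <class OrderedSet, class Key>
bool contains_ordered(const OrderedSet& c, const Key& key) {
    auto it = c.lower_bound(key);
    // `it` is the first element not less than key; it is a match only if
    // key is also not less than that element.
    return it != c.end() && !c.key_comp()(key, *it);
}
```

For a multiset with many copies of the same value this does a single O(log n) search, whereas count also has to walk the run of equal elements.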
You are looking at a particular case and not seeing the bigger picture. As stated in the documentation, std::set meets the requirements of the AssociativeContainer concept. For that concept it does not make much sense to have a contains method, as it is pretty much useless for std::multiset and std::multimap, whereas count works fine for all of them. A contains method could be added as an alias for count for std::set, std::map and their hashed versions (like length for size() in std::string), but it looks like the library creators did not see a real need for it.
Although I don't know why std::set has no contains but count which only ever returns 0 or 1,
you can write a templated contains helper function like this:
template<class Container, class T>
auto contains(const Container& v, const T& x)
-> decltype(v.find(x) != v.end())
{
return v.find(x) != v.end();
}
And use it like this:
if (contains(myset, element)) ...
The true reason for set is a mystery for me, but one possible explanation for this same design in map could be to prevent people from writing inefficient code by accident:
if (myMap.contains("Meaning of universe"))
{
myMap["Meaning of universe"] = 42;
}
Which would result in two map lookups.
Instead, you are forced to get an iterator. This gives you a mental hint that you should reuse the iterator:
auto position = myMap.find("Meaning of universe");
if (position != myMap.cend())
{
position->second = 42;
}
which consumes only one map lookup.
When we realize that set and map are made from the same flesh, we can apply this principle also to set. That is, if we want to act on an item in the set only if it is present in the set, this design can prevent us from writing code like this:
struct Dog
{
std::string name;
void bark() const; // const, because elements of a std::set cannot be modified
};
bool operator <(const Dog& left, const Dog& right)
{
return left.name < right.name;
}
std::set<Dog> dogs;
...
if (dogs.contains(Dog{"Husky"})) // first lookup
{
dogs.find(Dog{"Husky"})->bark(); // second lookup
}
Of course, all this is mere speculation.
Since C++20,
bool contains( const Key& key ) const
is available.
I'd like to point out, as mentioned by Andy, that C++20 added the contains member function to maps and sets:
bool contains( const Key& key ) const; (since C++20)
Now I'd like to focus my answer on performance versus readability.
In terms of performance, compare the two versions:
#include <unordered_map>
#include <string>
using hash_map = std::unordered_map<std::string,std::string>;
hash_map a;
std::string get_cpp20(hash_map& x,std::string str)
{
if(x.contains(str))
return x.at(str);
else
return "";
};
std::string get_cpp17(hash_map& x,std::string str)
{
if(const auto it = x.find(str); it !=x.end())
return it->second;
else
return "";
};
You will find that the cpp20 version takes two calls to std::_Hash_find_last_result while the cpp17 takes only one call.
Now, I find myself with many data structures that use nested unordered_map.
So you end up with something like this:
using my_nested_map = std::unordered_map<std::string,std::unordered_map<std::string,std::unordered_map<int,std::string>>>;
std::string get_cpp20_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(x.contains(level1) &&
x.at(level1).contains(level2) &&
x.at(level1).at(level2).contains(level3))
return x.at(level1).at(level2).at(level3);
else
return "";
};
std::string get_cpp17_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(const auto it_level1=x.find(level1); it_level1!=x.end())
if(const auto it_level2=it_level1->second.find(level2);it_level2!=it_level1->second.end())
if(const auto it_level3=it_level2->second.find(level3);it_level3!=it_level2->second.end())
return it_level3->second;
return "";
};
Now, if you have plenty of conditions in between these ifs, using the iterators is really painful, very error-prone and unclear. I often find myself looking back at the definition of the map to understand what kind of object is at level 1 or level 2, while with the cpp20 version you see at(level1).at(level2)... and understand immediately what you are dealing with.
So in terms of code maintenance and review, contains is a very nice addition.
What about binary_search?
set <int> set1;
set1.insert(10);
set1.insert(40);
set1.insert(30);
// a set iterates in sorted order, so binary_search's precondition holds
bool found = std::binary_search(set1.begin(), set1.end(), 30);
contains() has to return a bool. Using a C++20 compiler, I get the following output for this code:
#include<iostream>
#include<map>
using namespace std;
int main()
{
multimap<char,int>mulmap;
mulmap.insert(make_pair('a', 1)); //multiple similar key
mulmap.insert(make_pair('a', 2)); //multiple similar key
mulmap.insert(make_pair('a', 3)); //multiple similar key
mulmap.insert(make_pair('b', 3));
mulmap.insert({'a',4});
mulmap.insert(pair<char,int>('a', 4));
cout<<mulmap.contains('c')<<endl; //Output:0 as it doesn't exist
cout<<mulmap.contains('b')<<endl; //Output:1 as it exist
}
Another reason is that it would give a programmer the false impression that std::set is a set in the math set theory sense. If they implement that, then many other questions would follow: if an std::set has contains() for a value, why doesn't it have it for another set? Where are union(), intersection() and other set operations and predicates?
The answer is, of course, that some of the set operations are already implemented as functions in <algorithm> (std::set_union() etc.), and others are as trivially implemented as contains(). Functions and function objects work better with math abstractions than object members, and they are not limited to a particular container type.
If one needs to implement full math-set functionality, one has not only a choice of underlying container but also a choice of implementation details. For example, would the theory_union() function work with immutable objects, better suited for functional programming, or would it modify its operands and save memory? Would it be implemented as a function object from the start, or would it be better to implement it as a C function and use std::function<> if needed?
As it is now, std::set is just a container, well-suited for the implementation of set in math sense, but it is nearly as far from being a theoretical set as std::vector from being a theoretical vector.
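As a small illustration of the free-function style described above, here is a sketch that builds a union and a subset test on top of std::set_union and std::includes. The wrapper names set_union_of and is_subset are made up, and returning a new set instead of modifying an operand is just one of the design choices the answer mentions.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical wrapper: returns a new set and leaves both operands unchanged.
std::set<int> set_union_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.end()));
    return result;
}

// Hypothetical wrapper: true if every element of `sub` is also in `super`.
bool is_subset(const std::set<int>& sub, const std::set<int>& super) {
    return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}
```

Both algorithms only require sorted ranges, so the same calls would also work on a sorted std::vector.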

Iterating over Boost fusion::vector

I'm trying to iterate over a boost::fusion vector using:
typedef typename fusion::result_of::begin<T>::type t_iter;
std::cout << distance(begin(t), end(t)) << std::endl;
for(t_iter it = begin(t); it != end(t); next(it)){
std::cout<<deref(it)<<std::endl;
}
The distance cout statement gives me a finite length (2), however the loop seems to run indefinitely.
Any advice much appreciated!
You can't just iterate a Fusion vector like that, the type for each iterator may be different than the previous one (and usually is). I guess that's why you don't have it = next(it) in your code, it would give a compilation error.
You could use boost::fusion::for_each for this, together with a function object that prints each element to the standard output:
struct print
{
template< typename T >
void operator()( T& v ) const
{
std::cout << v;
}
};
...
boost::fusion::for_each( t, print() );
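For readers without Boost at hand, the same principle can be sketched with std::tuple. This is an analogy, not Fusion code: each position has its own type, so a generic callable is applied to every element at compile time instead of running a runtime loop. It assumes C++17 (std::apply and fold expressions), and the names for_each_element and join_elements are made up.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// Visit every element of a std::tuple with a generic callable. As with
// boost::fusion::for_each, the "loop" is expanded at compile time because
// each element may have a different type.
template <class Tuple, class F>
void for_each_element(const Tuple& t, F f) {
    std::apply([&f](const auto&... elems) { (f(elems), ...); }, t);
}

// Streams each element on its own line and returns the resulting text.
std::string join_elements(const std::tuple<int, std::string>& t) {
    std::ostringstream out;
    for_each_element(t, [&out](const auto& v) { out << v << '\n'; });
    return out.str();
}
```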
Fusion is a wonderful library, but you should know that it differs in multiple ways from what you use in everyday C++ programs: it merges the power of compile-time metaprogramming with runtime code. Because of that, there is no single type that can handle all items in a Fusion container. What does this mean? It means that result_of::begin<T>::type does not always match the type of next(it), so you can't use Fusion iterators in a for loop like that.
The obvious problem in your code is that you ignore the return value of next, which causes your code to run forever, but you can't write it = next(it) either, since the types may differ!
So what should you do? You should use boost::fusion::for_each for that purpose.
next doesn't actually advance the iterator, it just returns the next one.
This can be seen in the docs, as the function next takes a constant argument, meaning it can't possibly actually modify the iterator:
template<
typename I
>
typename result_of::next<I>::type next(I const& i);
^^^^^
The problem is that inside the loop you only dereference your iterator. Applying next to it has no effect on it, because the iterator that next returns is discarded, and that's why your loop runs forever.

Obtaining `std::priority_queue` elements in reverse order?

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the query point. This way I know if I should push the new element that is currently being examined (if at a lesser distance than the current farthest neighbor) and can pop() the farthest element when my priority-queue has more than K elements.
So far, all is well. However, when I output the elements, I would like to order them from the closest to the farthest. Currently, I simply pop all the elements from the priority-queue and put them on the output-container (through an iterator), which results in a sequence of points ordered from farthest to closest, so then, I call std::reverse on the output iterator range.
As a simple example, here is a linear-search that uses the priority-queue (obviously, the actual nearest-neighbor query methods I use are far more complicated):
template <typename DistanceValue,
typename ForwardIterator,
typename OutputIterator,
typename GetDistanceFunction,
typename CompareFunction>
inline
OutputIterator min_dist_linear_search(ForwardIterator first,
ForwardIterator last,
OutputIterator output_first,
GetDistanceFunction distance,
CompareFunction compare,
std::size_t max_neighbors = 1,
DistanceValue radius = std::numeric_limits<DistanceValue>::infinity()) {
if(first == last)
return output_first;
typedef std::priority_queue< std::pair<DistanceValue, ForwardIterator>,
std::vector< std::pair<DistanceValue, ForwardIterator> >,
detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction> > PriorityQueue;
PriorityQueue output_queue = PriorityQueue(detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction>(compare));
for(; first != last; ++first) {
DistanceValue d = distance(*first);
if(!compare(d, radius))
continue;
output_queue.push(std::pair<DistanceValue, ForwardIterator>(d, first));
while(output_queue.size() > max_neighbors)
output_queue.pop();
if(output_queue.size() == max_neighbors)
radius = output_queue.top().first;
};
OutputIterator it = output_first;
while( !output_queue.empty() ) {
*it = *(output_queue.top().second);
output_queue.pop(); ++it;
};
std::reverse(output_first, it);
return it;
};
The above is all dandy except for one thing: it requires the output-iterator type to be bidirectional and essentially be pointing to a pre-allocated container. Now, this practice of storing the output in a range prescribed by some output iterator is great and pretty standard too (e.g. std::copy and other STL algorithms are good examples of that). However, in this case I would like to be able to only require a forward output-iterator type, which would make it possible to use back-inserter iterators like those provided for STL containers and iostreams.
So, this boils down to reversing the priority-queue before dumping its content in the output iterator. So, these are the better options I've been able to come up with:
Create a std::vector, dump the priority-queue content in it, and dump the elements into the output-iterator using a reverse-iterator on the vector.
Replace the std::priority_queue with a sorted container (e.g. std::multimap), and then dump the content into the output-iterator using the appropriate traversal order.
Are there any other reasonable options?
I used to employ a std::multimap in a previous implementation of this algorithm and others, as of my second option above. However, when I switched to std::priority_queue, the performance gain was significant. So, I'd rather not use the second option, as it really seems that using a priority-queue for maintaining the list of neighbors is much better than relying on a sorted array. Btw, I also tried a std::vector that I maintain sorted with std::inplace_merge, which was better than multimap, but didn't match up to the priority-queue.
As for the first option, which is my best option at this point, it just seems wasteful to me to have to do this double transfer of data (queue -> vector -> output). I'm just inclined to think that there must be a simpler way to do this... something that I'm missing..
The first option really isn't that bad in this application (considering the complexity of the algorithm that precedes it), but if there is a trick to avoid this double memory transfer, I'd like to know about it.
Problem solved!
I'm such an idiot... I knew I was missing something obvious. In this case, the std::sort_heap() function. The reference page even has an example that does exactly what I need (and since the std::priority_queue is just implemented in terms of a random-access container and the heap-functions (pop_heap, push_heap, make_heap) it makes no real difference to use these functions directly in-place of the std::priority_queue class). I don't know how I could have missed that.
Anyways, I hope this helps anyone who had the same problem.
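A minimal sketch of that approach, using plain distances instead of (distance, iterator) pairs to keep it short. The function name k_smallest and the use of double are assumptions for illustration. The vector is kept as a max-heap with the heap functions, so the farthest kept neighbor is always at the front, and std::sort_heap then orders it in place from closest to farthest.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Keep the k smallest values seen so far in a max-heap stored in a plain
// vector, then sort that heap in place.
std::vector<double> k_smallest(const std::vector<double>& distances, std::size_t k) {
    std::vector<double> heap;
    if (k == 0) return heap;
    for (double d : distances) {
        if (heap.size() < k) {
            heap.push_back(d);
            std::push_heap(heap.begin(), heap.end());  // front = largest kept
        } else if (d < heap.front()) {
            std::pop_heap(heap.begin(), heap.end());   // move largest to the back
            heap.back() = d;                           // replace it with d
            std::push_heap(heap.begin(), heap.end());
        }
    }
    std::sort_heap(heap.begin(), heap.end());          // ascending order
    return heap;
}
```

The sorted vector can then be copied to any output iterator, including a back-inserter, so no std::reverse on the output range is needed.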
One dirty idea, which would nevertheless be guaranteed to work, would be the following:
std::priority_queue<int, std::vector<int>, std::less<int> > queue;
queue.push(3);
queue.push(5);
queue.push(9);
queue.push(2);
// Prints in reverse order.
int* front = const_cast<int*>(&queue.top());
int* back = const_cast<int*>(front + queue.size());
std::sort(front, back);
while (front < back) {
printf("%i ", *front);
++front;
}
It may be noted that the in-place sorting will likely break the queue.
Why don't you just specify the opposite comparison function in the declaration?
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
int main() {
std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
pq.push(1);
pq.push(10);
pq.push(15);
std::cout << pq.top() << std::endl;
}

C++ less verbose alternative to passing container.start() and container.end()

I declare a vector<Bla> blaVec and write a function:
template<typename Iterator>
void doSomething(Iterator first, Iterator last) { ... }
Then I call this function on blaVec with:
doSomething(blaVec.begin(), blaVec.end());
However, I really would like something shorter like doSomething(blaVec), but without having to specify vector in the function definition. Basically, is there a good standard way to specify just the first iterator, or maybe a range of [begin,end] iterators, as is done by Boost.Range?
I'm an algorithms guy so I really don't want to get into overly generic complex solutions. Most of my life I wrote functions like this:
void doSomething(vector<int> & bla) { ... }
However, these days, I frequently write doSomething that operates on list and deque and vector so a slightly more generic solution was called for, which is why I went with iterators. But it just seems to be too verbose of a solution. What do you suggest?
doSomething(vector & bla) { ... }
doSomething(Iterator first, Iterator last) { ... }
doSomething(/* some range data structure */) { ... }
If you find that verbose, then you can wrap that with this:
template<typename Container>
void doSomething(Container &c)
{
doSomething(c.begin(), c.end()); //internally call the iterator version.
}
And use this function, instead of iterator version.
Also, you can use iterator version, when you don't want the function to operate on all elements in the container. For example,
doSomething(c.begin(), c.begin() + 5); //operate on first 5 elements
//assuming c.begin()+5 makes sense
Prefer the second one, as it is a lot more flexible.
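Put together, the two overloads might look like the sketch below. The summing body and the int return type are placeholders, since the question does not say what doSomething actually does. Using std::begin and std::end instead of the member functions is an optional extra that also lets built-in arrays through.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Iterator version: does the real work (here, a placeholder sum).
template <class Iterator>
int doSomething(Iterator first, Iterator last) {
    int total = 0;
    for (; first != last; ++first) total += *first;
    return total;
}

// Range version: forwards to the iterator version. std::begin/std::end
// also accept built-in arrays, unlike c.begin()/c.end().
template <class Range>
int doSomething(Range& range) {
    return doSomething(std::begin(range), std::end(range));
}
```

With both in place, doSomething(blaVec) covers the common case, and the iterator pair remains available for sub-ranges.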
I don't feel it is verbose, but if you really insist, you might want to define a macro for it, eg:
#define FULLITER(a) (a).begin(), (a).end()
(just make sure a is a simple expression so it's not executed twice.)