Multiple iterators to a complex range

I am trying to have multiple iterators to a bit more complex range (using range-v3 library) -- manually implementing a cartesian product, using filter, for_each and yield. However, when I tried to hold multiple iterators to such range, they share a common value. For example:
#include <vector>
#include <iostream>
#include <range/v3/view/for_each.hpp>
#include <range/v3/view/filter.hpp>
int main() {
std::vector<int> data1{1,5,2,7,6};
std::vector<int> data2{1,5,2,7,6};
auto range =
data1
| ranges::v3::view::filter([](int v) { return v%2; })
| ranges::v3::view::for_each([&data2](int v) {
return data2 | ranges::v3::view::for_each([v](int v2) {
return ranges::v3::yield(std::make_pair(v,v2));
});
});
auto it1 = range.begin();
for (auto it2 = range.begin(); it2 != range.end(); ++it2) {
std::cout << "[" << it1->first << "," << it1->second << "] [" << it2->first << "," << it2->second << "]\n";
}
return 0;
}
I expected the iterator it1 to keep pointing at the beginning of the range, while the iterator it2 goes through the whole sequence. To my surprise, it1 is incremented as well! I get the following output:
[1,1] [1,1]
[1,5] [1,5]
[1,2] [1,2]
[1,7] [1,7]
[1,6] [1,6]
[5,1] [5,1]
[5,5] [5,5]
[5,2] [5,2]
[5,7] [5,7]
[5,6] [5,6]
[7,1] [7,1]
[7,5] [7,5]
[7,2] [7,2]
[7,7] [7,7]
[7,6] [7,6]
Why is that?
How can I avoid this?
How can I keep multiple, independent iterators pointing in various locations of the range?
Should I implement a cartesian product in a different way? (that's my previous question)
While it is not reflected in the MCVE above, consider a use case where someone tries to implement something similar to std::max_element, i.e. to return an iterator to the highest-valued pair in the cross product. While looking for the highest value you need to store an iterator to the current best candidate. It must not change while you search, and it would be cumbersome to manage the iterators if you need a copy of the range (as suggested in one of the answers).
Materialising the whole cross product is not an option either, as it requires a lot of memory. After all, the whole point of using ranges with filters and other on-the-fly transformations is to avoid such materialisation.

It seems that the resulting view stores state such that it turns out to be single pass. You can work around that by simply making as many copies of the view as you need:
int main() {
std::vector<int> data1{1,5,2,7,6};
std::vector<int> data2{1,5,2,7,6};
auto range =
data1
| ranges::v3::view::filter([](int v) { return v%2; })
| ranges::v3::view::for_each([&data2](int v) {
return data2 | ranges::v3::view::for_each([v](int v2) {
return ranges::v3::yield(std::make_pair(v,v2));
});
});
auto range1= range; // Copy the view adaptor
auto it1 = range1.begin();
for (auto it2 = range.begin(); it2 != range.end(); ++it2) {
std::cout << "[" << it1->first << "," << it1->second << "] [" << it2->first << "," << it2->second << "]\n";
}
std::cout << '\n';
for (; it1 != range1.end(); ++it1) { // Consume the copied view
std::cout << "[" << it1->first << "," << it1->second << "]\n";
}
return 0;
}
Another option would be materializing the view into a container as mentioned in the comments.
Keeping in mind the aforementioned limitation of single-pass views, it is not really hard to implement a max_element function that returns an iterator, with the important drawback of having to compute the sequence one and a half times.
Here's a possible implementation:
template <typename InputRange,typename BinaryPred = std::greater<>>
auto my_max_element(InputRange &range1,BinaryPred &&pred = {}) -> decltype(range1.begin()) {
auto range2 = range1;
auto it1 = range1.begin();
std::ptrdiff_t pos = 0L;
for (auto it2 = range2.begin(); it2 != range2.end(); ++it2) {
if (pred(*it2,*it1)) {
ranges::advance(it1,pos); // Computing again the sequence as the iterator advances!
pos = 0L;
}
++pos;
}
return it1;
}

What is going on here?
The entire problem here originates in the fact that std::max_element requires its arguments to be LegacyForwardIterators while the ranges created by ranges::v3::yield apparently (obviously?) only provide LegacyInputIterators. Unfortunately, the range-v3 docs do not explicitly mention the iterator categories one can expect (at least I haven't found it being mentioned). This would indeed be a huge enhancement as all standard library algorithms do explicitly state what iterator categories they require.
In the particular case of std::max_element you are not the first one to stumble over this counterintuitive requirement of ForwardIterator rather than just InputIterator, see Why does std::max_element require a ForwardIterator? for example. In summary, it does make sense, though, because std::max_element does not (despite the name suggesting it) return the max element, but an iterator to the max element. Hence, it is in particular the multipass guarantee that is missing on InputIterator in order to make std::max_element work with it.
For this reason, many other standard library iterators do not work with std::max_element either, e.g. std::istreambuf_iterator, which really is a pity: you just cannot get the max element from a file with the existing standard library! You either have to load the entire file into memory first, or you have to use your own max algorithm.
The standard library is simply missing an algorithm that really returns the max element rather than an iterator pointing to the max element. Such an algorithm could work with InputIterators as well. Of course, this can very easily be implemented manually, but still it would be handy to have this given by the standard library. I can only speculate why it doesn't exist. Maybe one reason is that it would require the value_type to be copy constructible, because InputIterator is not required to return references to the elements, and it might in turn be counterintuitive for a max algorithm to make a copy...
So, now regarding your actual questions:
Why is this? (i.e. why does your range only return InputIterators?)
Obviously, yield creates the values on the fly. This is by design, it's the very reason why one would want to use yield: to not have to create (and thus store) the range upfront. Hence, I do not see how yield could be implemented in a way that it fulfills the multipass guarantee, especially the second bullet is giving me headaches:
If a and b compare equal (a == b is contextually convertible to true) then either they are both non-dereferenceable or *a and *b are references bound to the same object
Technically, I could imagine that one could implement yield in a way that all iterators created from one range share a common internal storage that is filled on the fly during the first traversal. Then it would be possible for different iterators to give you the same references to underlying objects. But then std::max_element would silently consume O(n²) memory (all elements of your cartesian product). So, in my opinion it's definitely better to not do this and instead make the users materialize the range themselves, so that they are aware of it happening.
How can I avoid this?
Well, as already said by metalfox, you can copy your view which would result in different ranges and thus independent iterators. Still, that wouldn't make std::max_element work. So, given the nature of yield the answer to this question, unfortunately, is: you simply cannot avoid this with yield or any other technique that creates values on the fly.
How can I keep multiple, independent iterators pointing in various locations of the range?
This is related to the previous question. Basically, this question answers itself: If you want to point independent iterators in various locations, these locations have to exist somewhere in memory. So, you need to materialize at least those elements that did once have an iterator pointing to them, which in case of std::max_element means that you have to materialize all of them.
Should I implement a cartesian product in a different way?
I can imagine many different implementations. But none of them will be able to provide both of these properties all together:
return ForwardIterators
require less than O(n²) memory
Technically, it could be possible to implement an iterator that is specialized for the usage with std::max_element, meaning that it keeps only the current max element in memory so that it can be referenced... But this would be somewhat ridiculous, wouldn't it? We cannot expect a general purpose library like range-v3 to come up with such highly specialized iterator categories.
Summary
You are saying
After all, I don't think my use case is such a rare outlier and ranges
are planned to be added to the C++20 standard - so there should be
some reasonable way to achieve this without traps...
I definitely agree that "this is not a rare outlier"! However, that doesn't necessarily imply that "there should be some reasonable way to achieve this without traps". Consider e.g. NP-hard problems. It is not a rare outlier to be facing one. Still, it is impossible (unless P=NP) to solve them in polynomial time. And in your case it is simply not possible to use std::max_element without ForwardIterators. And it is not possible to implement a ForwardIterator (as defined by the standard library) on a cartesian product without consuming O(n²) memory.
For the particular case of std::max_element I would suggest to just implement your own version that returns the max element rather than an iterator pointing to it.
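A minimal sketch of such a value-returning maximum, assuming C++17 for std::optional. The names max_value and max_int_in_text are hypothetical (they are not part of the standard library or range-v3), and the element type is assumed to be copyable and comparable with operator<:

```cpp
#include <iterator>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: returns a copy of the largest element, or an empty
// optional for an empty sequence. Only a single pass is made, so input
// iterators are sufficient.
template <typename InputIt>
auto max_value(InputIt first, InputIt last)
    -> std::optional<typename std::iterator_traits<InputIt>::value_type> {
    using T = typename std::iterator_traits<InputIt>::value_type;
    std::optional<T> best;
    for (; first != last; ++first) {
        T current = *first;  // dereference each position exactly once
        if (!best || *best < current) {
            best = std::move(current);
        }
    }
    return best;
}

// Usage sketch with a genuinely single-pass source (a stream).
inline std::optional<int> max_int_in_text(const std::string& text) {
    std::istringstream in(text);
    return max_value(std::istream_iterator<int>(in),
                     std::istream_iterator<int>());
}
```

Because the result is a copy of the element rather than an iterator, nothing has to stay valid after the traversal moves on, which is exactly what a single-pass range cannot promise.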
However, if I understand your question correctly your concern is more general and std::max_element is just an example. So, I have to disappoint you. Even with the existing standard library some trivial things are impossible due to incompatible iterator categories (again, std::istreambuf_iterator is an existing example). So, if range-v3 happens to be added, there will just be some more of such examples.
So, finally, my recommendation is to just go with your own algorithms, if possible, and swallow the pill of materializing a view otherwise.

An iterator is a pointer to an element in the vector, in this case, it1 points to the beginning of the vector. And hence, if you are trying to point the iterator to the same location of the vector, they will be the same. However, you can have multiple iterators pointing to different locations of the vector. Hope this answers your question.

Related

Is there an even faster approach than swap-and-pop for erasing from std::vector?

I am asking this as the other relevant questions on SO seem to be either for older versions of the C++ standard, do not mention any form of parallelization, or are focused on keeping the ordering/indexing the same as elements are removed.
I have a vector of potentially hundreds of thousands or millions of elements (which are fairly light structures, around ~20 bytes assuming they're compacted down).
Due to other restrictions, it must be a std::vector and other containers would not work (like std::forward_list), or be even less optimal in other uses.
I recently switched from a simple it = myVec.erase(it) approach to swap-and-pop, using something like this:
for(int i = 0; i < myVec.size();) {
// Do calculations to determine if element must be removed
// ...
// Remove if needed
if(elementMustBeRemoved) {
myVec[i] = myVec.back();
myVec.pop_back();
} else {
i++;
}
}
This works, and was a significant improvement. It cut the runtime of the method down to ~61% of what it was previously. But I would like to improve this further.
Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently? Like passing a vector of indices to erase() and have C++ do some magic under the hood to minimize movement of data?
If so, I could have threads individually gather indices that must be removed in parallel, and then combine them and pass them to erase().

Take a look at std::remove_if algorithm. You could use it like this:
auto firstToErase = std::remove_if(myVec.begin(), myVec.end(),
[](const T& x){
// Do calculations to determine if element must be removed
// ...
return elementMustBeRemoved;});
myVec.erase(firstToErase, myVec.end());
cppreference says that following code is a possible implementation for remove_if:
template<class ForwardIt, class UnaryPredicate>
ForwardIt remove_if(ForwardIt first, ForwardIt last, UnaryPredicate p)
{
first = std::find_if(first, last, p);
if (first != last)
for(ForwardIt i = first; ++i != last; )
if (!p(*i))
*first++ = std::move(*i);
return first;
}
Instead of swapping with the last element it continuously moves through a container building up a range of elements which should be erased, until this range is at the very end of vector. This looks like a more cache-friendly solution and you might notice some performance improvement on a very big vector.
If you want to experiment with a parallel version, overload (4) on that page accepts an execution policy (C++17).

Or, since C++20, you can type slightly less and use erase_if.
However, in that case you lose the option to choose an execution policy.

Is there an even faster approach than swap-and-pop for erasing from std::vector?
Ever since C++11, the optimal removal of single element from vector without preserving order has been move-and-pop rather than swap-and-pop.
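As a sketch of what move-and-pop looks like for a single element (the helper name unstable_erase_at is hypothetical, and it assumes index is a valid position in the vector):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: erases v[index] in O(1) without preserving order.
// Precondition: index < v.size().
template <typename T>
void unstable_erase_at(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        // One move-assignment instead of the three moves a swap would do.
        v[index] = std::move(v.back());
    }
    v.pop_back();
}
```

The check against the last index avoids a self-move-assignment when the element being removed already is the last one.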
Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently?
The remove-erase idiom (std::erase / std::erase_if in C++20) is the most efficient that the standard provides. std::remove_if does preserve order, and if you don't care about that, then a more efficient algorithm may be possible. But the standard library does not come with an unstable remove out of the box. The algorithm goes as follows:
Find first element to be removed (a)
Find last element to not be removed (b)
Move b to a.
Repeat between a and b until iterators meet.
There is a proposal P0048 to add such algorithm to the standard library, and there is a demo implementation in https://github.com/WG21-SG14/SG14/blob/6c5edd5c34e1adf42e69b25ddc57c17d99224bb4/SG14/algorithm_ext.h#L84
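The four steps listed above can be sketched roughly as follows. This is an illustrative version written for this answer, not the SG14 implementation; the names unstable_remove_if and unstable_erase_if are assumptions, and bidirectional iterators are assumed:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch: moves kept elements from the back into the holes left by removed
// elements at the front. Returns the new logical end, like std::remove_if,
// but does not preserve the relative order of the kept elements.
template <typename BidirIt, typename Pred>
BidirIt unstable_remove_if(BidirIt first, BidirIt last, Pred pred) {
    while (true) {
        // (a) find the first element that must be removed
        first = std::find_if(first, last, pred);
        if (first == last) return first;
        // (b) find the last element that must be kept
        do {
            --last;
            if (first == last) return first;  // iterators met
        } while (pred(*last));
        // move b into a, then continue between the two positions
        *first = std::move(*last);
        ++first;
    }
}

// Convenience wrapper for std::vector (hypothetical name).
template <typename T, typename Pred>
void unstable_erase_if(std::vector<T>& v, Pred pred) {
    v.erase(unstable_remove_if(v.begin(), v.end(), pred), v.end());
}
```

Each removed element costs at most one move, whereas std::remove_if may shift every kept element that follows the first removed one.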

Where will a new element be inserted in a std::set?

I have a loop like this (where mySet is a std::set):
for(auto iter=mySet.begin(); iter!=mySet.end(); ++iter){
if (someCondition){mySet.insert(newElement);}
if (someotherCondition){mySet.insert(anothernewElement);}
}
I am experiencing some strange behavior, and I am asking myself if this could be due to the inserted element being inserted "before" the current iterator position in the loop. Namely, I have an Iteration where both conditions are true, but still the distance
distance(iter, mySet.end())
is only 1, not 2 as I would expect. Is my guess about set behavior right? And more importantly, can I still do what I want to do?
What I'm trying to do is to build "chains" on a hexagonal board between fields of the same color. I have a set containing all fields of my color, and the conditions check the color of neighboring fields, and if they are of the same color, copy this field to mySet, i.e. to the chain.
I am trying to use std::set for this because it allows no fields to be in the chain more than once. Reading the comments so far I fear I need to switch to std::vector, where push_back() will surely add the element at the end, but then I will run into new problems due to having to think of a way to forbid duplicate elements. I am therefore hoping for advice on how to solve this the best way.

Depending on the new element's value, it may be inserted before or after current iterator value. Below is an example of inserting before and after an iterator.
#include <iostream>
#include <iterator>
#include <set>
int main()
{
std::set<int> s;
s.insert(3);
auto it = s.begin();
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(2); // 2 will be inserted before it
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(5); // 5 will be inserted after it
std::cout << std::distance(it, s.end()) << std::endl; // prints 2
}
Regarding your question in the comments ("In my particular case, modifying it while iterating is basically exactly what I want, but of course I need to add everything after the current position"): no, you cannot manually arrange the order of the elements. A new value's position is determined by comparing the new one with the existing elements. Below is the quote from cppreference.
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
Thus, the implementation of the set will decide where exactly it will be placed.
If you really need to add values after current position, you need to use a different container. For example, simply a vector would be suitable:
it = myvector.insert ( it+1 , 200 ); // +1 to add after it

If you have a small number of items, doing a brute-force check to see if they're inside a vector can actually be faster than checking if they're in a set. This is because vectors tend to have better cache locality than node-based containers such as std::set.
We can write a function to do this pretty easily:
template<class T>
void insert_unique(std::vector<T>& vect, T const& elem) {
if(std::find(vect.begin(), vect.end(), elem) == vect.end()) { // append only if not already present
vect.push_back(elem);
}
}
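For the chain-building use case from the question, a vector can also serve as a work list that grows while it is being traversed. The sketch below is an assumption about how that could look: collect_reachable and the neighbours callback are hypothetical names, and fields are represented by plain int ids. It iterates by index rather than by iterator because push_back may reallocate and invalidate iterators:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: starting from `start`, repeatedly appends every
// not-yet-seen neighbour to the end of the chain. New elements are always
// added after the current position, and duplicates are rejected.
inline std::vector<int> collect_reachable(
        int start,
        const std::function<std::vector<int>(int)>& neighbours) {
    std::vector<int> chain{start};
    // chain.size() is re-evaluated every pass, so appended fields get visited.
    for (std::size_t i = 0; i < chain.size(); ++i) {
        for (int next : neighbours(chain[i])) {
            if (std::find(chain.begin(), chain.end(), next) == chain.end()) {
                chain.push_back(next);
            }
        }
    }
    return chain;
}
```

The linear duplicate check is fine for short chains; for large boards a separate std::set or std::unordered_set of visited ids next to the vector would probably scale better.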

Is std::remove_if guaranteed to call predicate in order?

Will std::remove_if always call the predicate on each element in order (according to the iterator's order) or could it be called out of order?
Here is a toy example of what I would like to do:
void processVector(std::vector<int> values)
{
values.erase(std::remove_if(values.begin(), values.end(), [](int v)
{
if (v % 2 == 0)
{
std::cout << v << "\n";
return true;
}
return false;
}), values.end());
}
I need to process and remove all elements of a vector that meet certain criteria, and erase + remove_if seems perfect for that. However, the processing I will do has side effects, and I need to make sure that processing happens in order (in the toy example, suppose that I want to print the values in the order they appear in the original vector).
Is it safe to assume that my predicate will be called on each item in order?
I assume that C++17's execution policies would disambiguate this, but since C++17 isn't out yet that obviously doesn't help me.
Edit: Also, is this a good idea? Or is there a better way to accomplish this?

The standard makes no guarantees on the order of calling the predicate.
What you ought to use is stable_partition. You partition the sequence based on your predicate. Then you can walk the partitioned sequence to perform whatever "side effect" you wanted to do, since stable_partition ensures the relative order of both sets of data. Then you can erase the elements from the vector.
stable_partition has to be used here because remove_if leaves the contents of the "removed" elements unspecified.
In code:
void processVector(std::vector<int> &values)
{
auto it = std::stable_partition(begin(values), end(values), [](int v) {return v % 2 != 0;});
std::for_each(it, end(values), [](int v) {std::cout << v << "\n";});
values.erase(it, end(values));
}

A bit late to the party, but here's my take:
While the order is not specified, it will involve jumping through hoops to implement an order different from first-to-last, due to the following:
The complexity is specified to be "exactly std::distance(first, last) applications of the predicate", which requires visiting each element exactly once.
The iterators are ForwardIterators, which means that they can only be incremented.
[C++17 and above] To prevent parallel processing, one can use the version that accepts an execution policy, and pass std::execution::seq.
Given the above, I believe that a (non-parallel) implementation that follows a different order will be convoluted and have no advantages over the straightforward case.
Source: https://en.cppreference.com/w/cpp/algorithm/remove
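If the visiting order must be guaranteed rather than merely likely, one option is to write the compaction loop by hand. This is only a sketch; process_and_remove is a hypothetical name and not a standard algorithm:

```cpp
#include <utility>
#include <vector>

// Sketch: visits the elements strictly front to back, calls `process` on
// each element that is about to be removed, and compacts the kept elements
// toward the front while preserving their relative order.
template <typename T, typename Pred, typename Action>
void process_and_remove(std::vector<T>& values, Pred should_remove,
                        Action process) {
    auto out = values.begin();
    for (auto in = values.begin(); in != values.end(); ++in) {
        if (should_remove(*in)) {
            process(*in);  // side effect happens in original order
        } else {
            if (out != in) {
                *out = std::move(*in);
            }
            ++out;
        }
    }
    values.erase(out, values.end());
}
```

The predicate is applied exactly once per element here, and the order is fixed by the loop itself rather than by an implementation detail of the library.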

They should be processed in order, but it is not guaranteed.
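std::remove_if() shifts the retained items toward the front of the container; the "removed" items at the end are left in an unspecified state and are not actually removed from the container until erase() is called. After remove_if() existing iterators may refer to different values, and erase() invalidates iterators at or after the erased position in a std::vector.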

Should I iterate a vector by iterator or by access operator?

I have a vector declared as
std::vector<int> MyVector;
MyVector.push_back(5);
MyVector.push_back(6);
MyVector.push_back(7);
How should I use it in a for loop?
By iterating it with an iterator?
for (std::vector<int>::iterator it=MyVector.begin(); it!=MyVector.end(); ++it)
{
std::cout << "Vector element (*it): " << *it << std::endl;
}
Or by its access operator?
for (std::vector<int>::size_type i=0; i<MyVector.size(); i++)
{
std::cout << "Vector element (i) : " << MyVector.at(i) << std::endl;
}
In examples I found on internet both of them are used. Is one of them superior to the other under all conditions? If not, when should I prefer one of them over the other?

The first format is the more generic way of iterating over standard library containers, so it is more common and intuitive. If you need to change your container, the iterating code remains unaffected. It works for every standard library container type, so it gives you more generic code.
In the second format, std::vector::at() checks the bounds each time it is called, i.e. on every iteration, so it may be a little detrimental to performance. This overhead is not present in the first format, as there is no bounds checking involved. Note that operator[] performs no bounds checking either.
Note, though, that the performance difference is too small to notice unless you are operating on huge data.
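To make the bounds-checking difference concrete, here is a small sketch (the helper name at_rejects is hypothetical). It only exercises at(), because using operator[] with the same out-of-range index would be undefined behaviour and cannot be demonstrated safely:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether v.at(i) rejects the index.
inline bool at_rejects(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);  // bounds-checked: throws when i >= v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a vector holding 5, 6 and 7, index 2 is accepted and index 3 is rejected with std::out_of_range.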

Using std::vector's [] operator is probably faster because using std::vector::at() inside a for loop checks the vector's size twice (in the for loop and in std::vector::at()'s bounds checking).
The first method can be used in other containers and thus can help you much when you change your container type.
If you use C++11, use range-based loops.

First if you have C++11, use a range-based for:
for (auto i : MyVector)
{
std::cout << i;
}
Or BOOST_FOREACH in C++03:
BOOST_FOREACH(int& i, MyVector)
{
std::cout << i;
}
Or std::copy:
std::copy(MyVector.begin(),
MyVector.end(),
std::ostream_iterator<int>(std::cout, "\n"));
As for the question at hand, at() checks that the index is within bounds and throws an exception if it isn't. So, do not use it unless you need that extra checking. The first way you have it is standard and works well. Some people are pedantic and even write it like so:
for (std::vector<int>::iterator it=MyVector.begin(), end = MyVector.end(); it!= end; ++it)
{
std::cout << "Vector element (*it): " << *it << std::endl;
}
In the above I cached the end iterator instead of calling end() each loop. Whether this actually makes a performance difference or not, I don't know.

There is no "one is superior to the other" (except that you almost never
want to use at()—at() is only appropriate if there is
something you can really do to recover from the error). The use of
iterator vs. index is largely one of style, and the message you're
passing. The more idiomatic C++ way of doing things would be the
iterator, but people coming from other backgrounds (for example,
mathematicians) will find indexing more idiomatic.
There are cases where there is a real distinction:
The iterator idiom will work with other types of containers. This
might be relevant if there is a real possibility that you use other
containers.
The indexing idiom can use a single index for several different
containers. If you're iterating through several vectors with the same
size, using the indexing idiom makes it clearer that you're accessing
the same element in each of the vectors. (Again, this seems to occur
most often in mathematical applications.)
Finally, any time you're really doing random access, or calculating
the element in any way, using indexes is probably more intuitive. (In
such cases, you probably want to do the calculations in int, only
converting to size_t at the last moment.)
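As an illustration of the "single index for several containers" case, here is a sketch of a dot product. The function name dot is chosen for this example only, and both vectors are assumed to have the same size:

```cpp
#include <cstddef>
#include <vector>

// Sketch: one index addresses the matching element in both vectors.
// Assumes a.size() == b.size().
inline double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Writing the same loop with two iterators advanced in lockstep is possible, but the shared index states the intent more directly.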

How to get the number of loop when using an iterator, in C++?

I'm working on an application where I draw a couple of images, like this:
void TimeSlice::draw(float fX, float fY) {
list<TimeSliceLevel*>::iterator it = levels.begin();
float level_x = x;
float level_y = y;
while(it != levels.end()) {
(*it)->draw(level_x,level_y);
level_y += (*it)->height;
++it;
}
}
Though this is a bit incorrect. I need to position the TimeSliceLevel* on an X. When I've got a for(int i = 0; i < slices.size(); ++i) loop, I can use x = i * width. However, I'm using an iterator, as I've been told many times that's good programming :> and I'm wondering if the iterator has an "index" number or something which I can use to calculate the new X position? (So it's more a question about using iterators.)
Kind regards,
Pollux

They don't, as iterators can be used for other purposes besides looping from the beginning to the end of an ordered, indexed list. You'll need to keep track of an index separately and increment it every pass:
list<TimeSliceLevel*>::iterator it;
int index;
for(it = levels.begin(), index = 0; it != levels.end(); ++it, ++index) {
...
}

No, it doesn't. If you need an integer index, use a for-loop. Despite what some iterator extremists would have you believe, for-loops still have their place in C++ code.

It is possible to go from iterator -> index. There are at least two ways:
Use - for Random access iterators (i.e. i - container.begin())
Use std::distance (i.e. std::distance(container.begin(), i)). This is a more "generic" solution and will perform identically to - in the random access iterator case thanks to specialization, but will have a terrible performance impact otherwise
However, I would not recommend either of them, as it obfuscates the code (and can perform poorly). Instead, as others have said, use an additional counter. There is nothing "wrong" with using indexes when needed; rather, preferring iterators is meant to be a guideline to help in writing "generic" code, as then you can apply the algorithm to a different container, or a subset of the container, etc.
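For completeness, the std::distance variant could be wrapped like this. The helper name index_of is hypothetical, and as noted above it costs linear time for std::list, so calling it inside a loop makes the whole loop quadratic:

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Sketch: position of `it` within `c`, counted from the beginning.
// O(1) for random-access iterators, O(n) for std::list and similar.
// Usage idea: index_of(levels, it) for a std::list named levels.
template <typename Container>
std::size_t index_of(const Container& c,
                     typename Container::const_iterator it) {
    return static_cast<std::size_t>(std::distance(c.begin(), it));
}
```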

For some iterator types, simply subtract the current iterator from the initial iterator:
index = it - levels.begin()
Since this does not work for std::list iterators, just track the index explicitly with a variable, as mentioned in the above answers. The benefit of using the iterator and the container is not lost. You're adding a requirement that the container doesn't provide.

You would have to write something like
size_t index = 0;
for (list<...>::const_iterator it = y.begin(); it != y.end(); ++it) {
// Do your actions based on `index`
++index;
}
and, well, this is sometimes suitable.
On the other hand, you could refactor (replan) your application so that your actual drawing loop doesn't have to make all those x += something, y += something2, ..., but rather act the following way:
foreach (Level* level, list) {
level->draw(backend);
}
It could sometimes be tricky, but to my mind this approach could save you a lot of time if your application grows to something "big".

You can, but only for random-access iterators. If it's a random-access iterator you can subtract the begin iterator from your iterator to obtain the index (without keeping a separate int index variable).
for (vector<int>::const_iterator cit = v.begin(); cit != v.end(); ++cit)
{
cout << "This is element no: " << cit - v.begin() << endl;
}
In your example, unfortunately, you won't be able to do it, because you are using std::list, which only provides bidirectional iterators. Use std::vector and you can do it like in my example.
