The relationship between iterators and containers in STL - c++

Good Day,
Assume that I am writing a Python-like range in C++. It provides all the characteristics of random access containers (immutable, of course). A question has come up in my mind about the following situation:
I have two different iterators that point into different instances of the range container. The thing is that these two ranges are equal, i.e. they represent the same range. Would you allow the following situation?
Given range1 == range2, e.g.:
range range1(10, 20, 1), range2(10, 20, 1);
range::iterator i = range1.begin(), j = range2.begin();
assert(i == j); // would you allow this?
Sorry if I am missing a simple design rule in STL :)

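By default, in the STL, two iterators from two different containers are not comparable: comparing them is outside the domain of ==, so the standard gives the result no meaning. For your own iterator type you are therefore free to do whatever you want, but generic code should not even try to rely on it.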
Edit:
After looking carefully at the standard, section 24.1, paragraph 6 states:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
This means that if you allow i == j with i and j in two different containers, you are really treating both containers as being the same. Since they are immutable, this is perfectly fine; it is just a question of semantics.
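To make this question of semantics concrete, here is a minimal sketch (not the asker's class) of a Python-like range whose iterators compare equal by position alone, so `i == j` holds for iterators taken from two equal ranges. The names `py_range` and `size()`, the use of `long`, and the restriction to positive steps are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical immutable, Python-like range over [start, stop) with a positive step.
class py_range {
public:
    class iterator {
    public:
        // operator* returns a computed value instead of a reference, so this
        // sketch only claims the legacy input iterator category.
        using iterator_category = std::input_iterator_tag;
        using value_type = long;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = long;

        iterator(long value, long step) : value_(value), step_(step) {}

        long operator*() const { return value_; }
        iterator& operator++() { value_ += step_; return *this; }
        iterator operator++(int) { iterator old = *this; ++*this; return old; }

        // Equality looks only at the position (current value and step), not at
        // which py_range object produced the iterator, so iterators taken from
        // two equal ranges compare equal.
        friend bool operator==(const iterator& a, const iterator& b) {
            return a.value_ == b.value_ && a.step_ == b.step_;
        }
        friend bool operator!=(const iterator& a, const iterator& b) { return !(a == b); }

    private:
        long value_;
        long step_;
    };

    py_range(long start, long stop, long step) : start_(start), stop_(stop), step_(step) {
        assert(step > 0 && "this sketch only handles positive steps");
    }

    // Number of elements, matching Python's len(range(start, stop, step)) for step > 0.
    long size() const { return stop_ <= start_ ? 0 : (stop_ - start_ + step_ - 1) / step_; }

    iterator begin() const { return iterator(start_, step_); }
    // One step past the last element, so ++ from the last element reaches end().
    iterator end() const { return iterator(start_ + size() * step_, step_); }

    // Ranges are treated as equal when their defining parameters match
    // (a simplification; Python compares the produced sequences instead).
    friend bool operator==(const py_range& a, const py_range& b) {
        return a.start_ == b.start_ && a.stop_ == b.stop_ && a.step_ == b.step_;
    }

private:
    long start_, stop_, step_;
};
```

Because equality ignores the owning object, a loop could even run from `r1.begin()` to `r2.end()`; whether to permit that is exactly the design decision being asked about.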

You might want to check boost::counting_iterator. Combined with boost::iterator_range you'll get something analogous to your range class (except that it will only allow a step-size of 1):
auto rng = boost::make_iterator_range(boost::make_counting_iterator(0),
                                      boost::make_counting_iterator(10));
for (auto it = rng.begin(), e = rng.end(); it != e; ++it)
    std::cout << *it << " "; // prints 0 1 2 3 ... 9
For this class, two iterators are considered equal provided that they hold the same number. Admittedly, the situation is different from yours, because here an iterator doesn't know which range it belongs to.

In the STL, the comparison rules are driven by the container's elements and not by the container itself, so in my opinion you shouldn't be performing the dereference yourself in your operator== overload.

Related

Multiple iterators to a complex range

I am trying to have multiple iterators into a somewhat more complex range (using the range-v3 library), manually implementing a cartesian product using filter, for_each and yield. However, when I tried to hold multiple iterators into such a range, they share a common value. For example:
#include <iostream>
#include <utility>
#include <vector>
#include <range/v3/view/filter.hpp>
#include <range/v3/view/for_each.hpp>
int main() {
    std::vector<int> data1{1,5,2,7,6};
    std::vector<int> data2{1,5,2,7,6};
    auto range =
        data1
        | ranges::v3::view::filter([](int v) { return v%2; })
        | ranges::v3::view::for_each([&data2](int v) {
            return data2 | ranges::v3::view::for_each([v](int v2) {
                return ranges::v3::yield(std::make_pair(v,v2));
            });
        });
    auto it1 = range.begin();
    for (auto it2 = range.begin(); it2 != range.end(); ++it2) {
        std::cout << "[" << it1->first << "," << it1->second << "] [" << it2->first << "," << it2->second << "]\n";
    }
    return 0;
}
I expected the iterator it1 to keep pointing at the beginning of the range, while the iterator it2 goes through the whole sequence. To my surprise, it1 is incremented as well! I get the following output:
[1,1] [1,1]
[1,5] [1,5]
[1,2] [1,2]
[1,7] [1,7]
[1,6] [1,6]
[5,1] [5,1]
[5,5] [5,5]
[5,2] [5,2]
[5,7] [5,7]
[5,6] [5,6]
[7,1] [7,1]
[7,5] [7,5]
[7,2] [7,2]
[7,7] [7,7]
[7,6] [7,6]
Why is that?
How can I avoid this?
How can I keep multiple, independent iterators pointing in various locations of the range?
Should I implement a cartesian product in a different way? (that's my previous question)
While it is not reflected in the MCVE above, consider a use case where someone tries to implement something similar to std::max_element, trying to return an iterator to the highest-valued pair in the cross product. While looking for the highest value, you need to store an iterator to the current best candidate. It must not change while you search, and it would be cumbersome to manage the iterators if you need a copy of the range (as suggested in one of the answers).
Materialising the whole cross product is not an option either, as it requires a lot of memory. After all, the whole point of using ranges with filters and other on-the-fly transformations is to avoid such materialisation.
It seems that the resulting view stores state, so that it turns out to be single-pass. You can work around that by simply making as many copies of the view as you need:
int main() {
    std::vector<int> data1{1,5,2,7,6};
    std::vector<int> data2{1,5,2,7,6};
    auto range =
        data1
        | ranges::v3::view::filter([](int v) { return v%2; })
        | ranges::v3::view::for_each([&data2](int v) {
            return data2 | ranges::v3::view::for_each([v](int v2) {
                return ranges::v3::yield(std::make_pair(v,v2));
            });
        });
    auto range1 = range; // Copy the view adaptor
    auto it1 = range1.begin();
    for (auto it2 = range.begin(); it2 != range.end(); ++it2) {
        std::cout << "[" << it1->first << "," << it1->second << "] [" << it2->first << "," << it2->second << "]\n";
    }
    std::cout << '\n';
    for (; it1 != range1.end(); ++it1) { // Consume the copied view
        std::cout << "[" << it1->first << "," << it1->second << "]\n";
    }
    return 0;
}
Another option would be materializing the view into a container as mentioned in the comments.
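For comparison, here is a sketch of materializing the same filtered cartesian product in plain standard C++, without range-v3. The hypothetical `odd_cross_product` uses nested loops in place of the filter/for_each/yield pipeline and stores every pair in a `std::vector`, so `std::max_element` and any number of independent iterators work, at the cost of memory proportional to the size of the product.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Eagerly build the pairs (v, v2) for every odd v in data1 and every v2 in data2.
// Plain loops stand in for the range-v3 filter / for_each / yield pipeline.
std::vector<std::pair<int, int>> odd_cross_product(const std::vector<int>& data1,
                                                   const std::vector<int>& data2) {
    std::vector<std::pair<int, int>> out;
    for (int v : data1) {
        if (v % 2 == 0) continue;  // same predicate as filter([](int v) { return v%2; })
        for (int v2 : data2) out.emplace_back(v, v2);
    }
    return out;
}
```

Since the result is a vector, an iterator such as `product.begin()` stays put while another iterator walks the sequence, and `std::max_element(product.begin(), product.end())` is valid.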
Keeping in mind the aforementioned limitation of single-pass views, it is not really hard to implement a max_element function that returns an iterator, with the important drawback of having to compute the sequence one and a half times.
Here's a possible implementation:
template <typename InputRange, typename BinaryPred = std::greater<>>
auto my_max_element(InputRange &range1, BinaryPred &&pred = {}) -> decltype(range1.begin()) {
    auto range2 = range1;
    auto it1 = range1.begin();
    std::ptrdiff_t pos = 0L;
    for (auto it2 = range2.begin(); it2 != range2.end(); ++it2) {
        if (pred(*it2, *it1)) {
            ranges::advance(it1, pos); // Computing again the sequence as the iterator advances!
            pos = 0L;
        }
        ++pos;
    }
    return it1;
}
What is going on here?
The entire problem here originates in the fact that std::max_element requires its arguments to be LegacyForwardIterators, while the ranges created by ranges::v3::yield apparently (obviously?) only provide LegacyInputIterators. Unfortunately, the range-v3 docs do not explicitly mention the iterator categories one can expect (at least I haven't found them mentioned). This would indeed be a huge enhancement, as all standard library algorithms explicitly state which iterator categories they require.
In the particular case of std::max_element you are not the first one to stumble over this counterintuitive requirement of ForwardIterator rather than just InputIterator, see Why does std::max_element require a ForwardIterator? for example. In summary, it does make sense, though, because std::max_element does not (despite the name suggesting it) return the max element, but an iterator to the max element. Hence, it is in particular the multipass guarantee that is missing on InputIterator in order to make std::max_element work with it.
For this reason, many standard library iterators do not work with std::max_element either, e.g. std::istreambuf_iterator, which really is a pity: you simply cannot get the max element from a file with the existing standard library! You either have to load the entire file into memory first, or you have to use your own max algorithm.
The standard library is simply missing an algorithm that really returns the max element rather than an iterator pointing to the max element. Such an algorithm could work with InputIterators as well. Of course, this can very easily be implemented manually, but it would still be handy to have it in the standard library. I can only speculate why it doesn't exist. Maybe one reason is that it would require the value_type to be copy-constructible, because an InputIterator is not required to return references to the elements, and it might in turn be counterintuitive for a max algorithm to make a copy...
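A sketch of such an algorithm, assuming C++17 for `std::optional`: the hypothetical `max_value` returns a copy of the largest element (hence the copy-constructibility requirement just mentioned), reads each element exactly once, and therefore accepts single-pass iterators such as `std::istream_iterator`. The helper `max_int_in_text` is likewise hypothetical and only shows the stream use case.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical max_value: returns a copy of the largest element, or std::nullopt
// for an empty sequence. Each element is read exactly once, so input iterators
// suffice, but value_type must be copyable.
template <typename InputIt, typename Compare = std::less<>>
std::optional<typename std::iterator_traits<InputIt>::value_type>
max_value(InputIt first, InputIt last, Compare comp = {}) {
    if (first == last) return std::nullopt;
    typename std::iterator_traits<InputIt>::value_type best = *first;  // copy of the current maximum
    for (++first; first != last; ++first) {
        const auto& value = *first;
        if (comp(best, value)) best = value;
    }
    return best;
}

// Largest integer in a whitespace-separated text, read through
// std::istream_iterator without storing the whole sequence.
std::optional<int> max_int_in_text(const std::string& text) {
    std::istringstream in(text);
    return max_value(std::istream_iterator<int>(in), std::istream_iterator<int>());
}
```

Passing `std::greater<>()` as the comparator turns the same function into a minimum search.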
So, now regarding your actual questions:
Why is this? (i.e. why does your range only return InputIterators?)
Obviously, yield creates the values on the fly. This is by design, it's the very reason why one would want to use yield: to not have to create (and thus store) the range upfront. Hence, I do not see how yield could be implemented in a way that it fulfills the multipass guarantee, especially the second bullet is giving me headaches:
If a and b compare equal (a == b is contextually convertible to true) then either they are both non-dereferenceable or *a and *b are references bound to the same object
Technically, I could imagine that one could implement yield in a way that all iterators created from one range share a common internal storage that is filled on the fly during the first traversal. Then it would be possible for different iterators to give you the same references to underlying objects. But then std::max_element would silently consume O(n²) memory (all elements of your cartesian product). So, in my opinion it's definitely better to not do this and instead make the users materialize the range themselves, so that they are aware of it happening.
How can I avoid this?
Well, as already said by metalfox, you can copy your view which would result in different ranges and thus independent iterators. Still, that wouldn't make std::max_element work. So, given the nature of yield the answer to this question, unfortunately, is: you simply cannot avoid this with yield or any other technique that creates values on the fly.
How can I keep multiple, independent iterators pointing in various locations of the range?
This is related to the previous question. Basically, this question answers itself: If you want to point independent iterators in various locations, these locations have to exist somewhere in memory. So, you need to materialize at least those elements that did once have an iterator pointing to them, which in case of std::max_element means that you have to materialize all of them.
Should I implement a cartesian product in a different way?
I can imagine many different implementations. But none of them will be able to provide both of these properties all together:
return ForwardIterators
require less than O(n²) memory
Technically, it could be possible to implement an iterator that is specialized for the usage with std::max_element, meaning that it keeps only the current max element in memory so that it can be referenced... But this would be somewhat ridiculous, wouldn't it? We cannot expect a general purpose library like range-v3 to come up with such highly specialized iterator categories.
Summary
You are saying
After all, I don't think my use case is such a rare outlier and ranges are planned to be added to the C++20 standard - so there should be some reasonable way to achieve this without traps...
I definitely agree that "this is not a rare outlier"! However, that doesn't necessarily imply that "there should be some reasonable way to achieve this without traps". Consider e.g. NP-hard problems. It is not a rare outlier to be facing one. Still, it is impossible (unless P=NP) to solve them in polynomial time. And in your case it is simply not possible to use std::max_element without ForwardIterators. And it is not possible to implement a ForwardIterator (as defined by the standard library) on a cartesian product without consuming O(n²) memory.
For the particular case of std::max_element, I would suggest just implementing your own version that returns the max element rather than an iterator pointing to it.
However, if I understand your question correctly your concern is more general and std::max_element is just an example. So, I have to disappoint you. Even with the existing standard library some trivial things are impossible due to incompatible iterator categories (again, std::istreambuf_iterator is an existing example). So, if range-v3 happens to be added, there will just be some more of such examples.
So, finally, my recommendation is to just go with your own algorithms, if possible, and swallow the pill of materializing a view otherwise.
An iterator acts like a pointer to an element of the vector; in this case, it1 points to the beginning of the vector. Hence, if you point two iterators at the same location of the vector, they will be the same. However, you can have multiple iterators pointing to different locations of the vector. Hope this answers your question.

Error when comparing iterators c++

It's very basic, but I could not find a similar question here. I am trying to iterate the same sorted STL list from both directions at once. I know I can compare an iterator to list.begin() and list.end(), so why doesn't this work?
list<family>::iterator itLargeFamily = families.begin(); // starts from the biggest families
list<family>::iterator itSmallFamily = families.end();   // starts from the smallest families
for (; itSmallFamily > itLargeFamily; --itSmallFamily, ++itLargeFamily) {
    // stuff...
}
The error is of course
no operator > matches these operands
100% chance I'm missing something basic.
Only random access iterators are ordered. std::list iterators are only bidirectional iterators, so they do not support operator< or operator>.
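The difference can be checked at compile time. This sketch uses a hypothetical detection trait, `has_less` (plain C++11 SFINAE, not a standard facility), to confirm that iterators of `std::vector` and `std::deque` support `<` while those of `std::list` and `std::forward_list` do not.

```cpp
#include <deque>
#include <forward_list>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true if two objects of type It can be compared with <.
// The partial specialization is only viable when the expression is well-formed
// (SFINAE), which is the case for random access iterators.
template <typename It, typename = void>
struct has_less : std::false_type {};

template <typename It>
struct has_less<It, decltype(void(std::declval<const It&>() < std::declval<const It&>()))>
    : std::true_type {};

static_assert(has_less<std::vector<int>::iterator>::value, "random access: < is available");
static_assert(has_less<std::deque<int>::iterator>::value, "random access: < is available");
static_assert(!has_less<std::list<int>::iterator>::value, "bidirectional: no <");
static_assert(!has_less<std::forward_list<int>::iterator>::value, "forward: no <");
```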
Instead, you could do your comparison with !=.
while (itSmallFamily != itLargeFamily)
You'll have to make sure that the iterators don't jump over each other for this to work though. That is, if itSmallFamily is only one increment away from itLargeFamily, you will simply swap them over and they'll never have been equal to each other.
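A sketch of that `!=` approach for a `std::list`, assuming `int` elements in place of the question's `family` type; `list_pairs_from_both_ends` is a hypothetical name. It checks for a meeting point after every single step, so the iterators cannot cross, and it starts the back iterator at `std::prev(end())`, since `end()` itself must not be dereferenced.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Pair up elements from both ends of a list using only == and !=.
template <typename T>
std::vector<std::pair<T, T>> list_pairs_from_both_ends(const std::list<T>& values) {
    std::vector<std::pair<T, T>> result;
    if (values.empty()) return result;
    auto front = values.begin();
    auto back = std::prev(values.end());  // last element, not end()
    while (front != back) {
        result.emplace_back(*front, *back);
        ++front;
        if (front == back) break;  // even length: the iterators just met
        --back;
    }
    // For an odd length, the middle element is left unpaired.
    return result;
}
```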
You could instead use std::vector, whose iterators are random access iterators. In addition, std::array and std::deque also support random access.
From the comments and the answer of sftrabbit you can see that relational operators are only defined for random access iterators, and std::list has only bidirectional iterators. So there are several solutions for your problem:
Use std::vector or std::array. They provide random access iterators, have better performance for small sizes (and, depending on how you fill and use them, for larger sizes as well), and have a better memory footprint. This is the preferred solution; I'd call it the "default" solution. Use other containers only if there is a very good, measurable reason (e.g. a profiler tells you that using that container is a performance bottleneck).
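Since you know the size of the list, you can use a counter for your iterations (with itSmallFamily first moved to the last element, e.g. via std::prev(families.end()), because end() itself must not be dereferenced):
for (size_t i = 0, count = families.size() / 2;
     i < count;
     ++i, --itSmallFamily, ++itLargeFamily)
{ /* do stuff */ }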
Since your list is sorted, you can compare the elements the iterators point to instead of the iterators themselves.
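The first option, switching to `std::vector`, lets the question's original loop condition `itSmallFamily > itLargeFamily` compile, because vector iterators are random access. A minimal sketch, with `int` standing in for `family` and the hypothetical helper `vector_pairs_from_both_ends` collecting the visited pairs; the small-family iterator still has to start at the last element rather than at `end()`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// The question's loop, unchanged except for the container and the start of
// itSmallFamily; relational comparison of iterators is valid for std::vector.
template <typename T>
std::vector<std::pair<T, T>> vector_pairs_from_both_ends(const std::vector<T>& families) {
    std::vector<std::pair<T, T>> result;
    if (families.empty()) return result;  // end() - 1 would be invalid on an empty vector
    auto itLargeFamily = families.begin();
    auto itSmallFamily = families.end() - 1;  // last element, not end()
    for (; itSmallFamily > itLargeFamily; --itSmallFamily, ++itLargeFamily) {
        result.emplace_back(*itLargeFamily, *itSmallFamily);
    }
    return result;
}
```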

Why do c++ programmers use != instead of <

In C++ Primer, p. 95, the author says that C++ programmers tend to use != in preference to < when writing loops.
for (vector<int>::size_type i = 0; i != 10; ++i) is preferred instead of
for (vector<int>::size_type i = 0; i < 10; ++i)
I read the same thing in Accelerated C++. Can someone explain the rationale behind this?
When using some kinds of STL iterators (those that aren't random access), you must use !=:
for (map<int,int>::iterator i = a.begin(); i != a.end(); ++i) ...
However, I don't see any reason to prefer != for well-ordered scalar types as in your example. I would usually prefer < for scalar types and != for all iterator types.
It's a habit from generic programming; for example, you can easily use < with indices, but you cannot use it with all iterator types. A list iterator cannot efficiently implement <, whereas != can be implemented for even the simplest of iterator types. Therefore, it is a good habit to always use the most generic comparison: it makes your code more resilient to change.
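A minimal sketch of that habit: the loop below uses only `!=`, `++` and `*`, so the same template accepts vector, list and forward_list iterators. `count_equal` is a hypothetical hand-rolled stand-in for `std::count`, written out only to show the loop condition; with `first < last` instead, it would stop compiling for list and forward_list iterators.

```cpp
#include <cassert>
#include <forward_list>
#include <list>
#include <vector>

// Counts elements equal to value using only !=, ++ and * on the iterators,
// which every iterator category provides.
template <typename It, typename T>
int count_equal(It first, It last, const T& value) {
    int n = 0;
    for (; first != last; ++first) {
        if (*first == value) ++n;
    }
    return n;
}
```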
Think of the case where one has to increment by, let's say, 3 instead of 1.
for (vector<int>::size_type i = 0; i != 10; i += 3)
This will effectively run forever: i skips 10 (going from 9 to 12) and just keeps incrementing.
for (vector<int>::size_type i = 0; i < 10; i+=3)
This will work fine in this case too. So != is not always a good choice.
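A small sketch of the `<` version with a step of 3; `visited_indices` is a hypothetical helper that records the indices the loop visits. With `<`, the loop stops once `i` passes the limit, even though 10 is not a multiple of 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records the indices visited by: for (i = 0; i < limit; i += step).
std::vector<std::size_t> visited_indices(std::size_t limit, std::size_t step) {
    assert(step > 0);  // a zero step would never terminate, whichever comparison is used
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; i < limit; i += step) visited.push_back(i);
    return visited;
}
```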
Because, in general, not all iterators support the < operation. See the documentation for the operations supported by each iterator category. Only random access iterators (plain pointers being one example) support relational comparisons (< and >) between iterators.
If you write !=, then you can reverse the loop iteration with minimal change.
Suppose you first write:
for ( int i = m; i != n ; i++ )
Later you reverse it:
for ( int i = n ; i != m ; i-- )
Not so appealing, but still it requires less analysis than "<" and ">".
The requirement of being "relationally comparable" is a much stronger one than the requirement of being "equally comparable". When it comes to iterating over containers, the possibility to perform relational comparison between iterators or generic indices (like <, >, <= etc.) is strongly associated with random-access containers, while the equality comparisons are more universally applicable (and often the only ones available when working with sequential access containers).
In general, it is good practice to make your code as generic as possible, i.e. you should never rely on stronger requirements when weaker requirements are perfectly sufficient. In other words, if you can implement your algorithm using equality comparisons only, it is better to do it that way, without bringing in any relational comparisons. That way, you may make your algorithm usable with a wider range of underlying data structures (containers).
Of course if you don't care about this kind of genericity or simply don't need it, you can just ignore these considerations and use either approach.
Maybe those who prefer this do so because they got used to checking for null, etc., so they prefer to use the same != everywhere.
if(x != null) { ... }
for(int i=0; i != 10; i++) { ... }
So for them, everything tends to be != or ==.
Read != as DIFFERENT/NOT EQUAL, following the same principle by which == reads as EQUAL.

Does an STL map always give the same ordering when iterating from begin() to end()?

It appears so from my simple testing, but I'm wondering if this is guaranteed.
Are there conditions where the ordering will not be guaranteed?
Edit: The case I'm particularly interested in is if I populate a map with a large number of entries: will the iteration order be the same across multiple runs of my executable? What if the entries are inserted in a different order?
Yes, it maintains an internal order, so iteration over a map that isn't changing should always give the same order. From here:
Internally, the elements in the map are sorted from lower to higher key value following a specific strict weak ordering criterion set on construction.
std::map is a sorted container, so yes, the order is guaranteed (it follows the comparison you pass, implicitly or explicitly, to its constructor). Do not count on this for the popular (though, at the time, not-yet-standard) hashmap: it has many advantages over std::map in many cases, but not a predictable order of iteration! (Its standard successor, std::unordered_map, gives no ordering guarantee either.)
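Because the iteration order depends only on the keys and the comparator, insertion order does not matter either. A minimal sketch with a hypothetical helper, `keys_in_order`, and a hypothetical alias, `descending_map`, shows that two maps filled in different orders iterate identically, and that a comparator such as `std::greater<int>` changes the order without making it unpredictable.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A map whose iteration order is descending because of its comparator.
using descending_map = std::map<int, std::string, std::greater<int>>;

// Keys of any std::map, collected in iteration order (begin() to end()).
template <typename Map>
std::vector<typename Map::key_type> keys_in_order(const Map& m) {
    std::vector<typename Map::key_type> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}
```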
std::map is a sorted collection, and you have to define the less-than operator for its key type (or supply a comparator).
Imagine m is a std::map<K, V>:
assert(m.size() > 1);
for (std::map<K, V>::const_iterator i = m.begin(); i != m.end(); ++i) {
    std::map<K, V>::const_iterator j = std::next(i); // map iterators are bidirectional, so i + 1 does not compile
    while (j != m.end()) {
        assert(i->first < j->first);
        ++j;
    }
}
Will an STL map give the same ordering with begin/end if it is unchanged? Yes. If you change the map though, do not depend on the ordering remaining the same.
On the same data set under the same implementation of STL, yes. It's not guaranteed to be the same across different implementations as far as I'm aware.