Can iterators from different containers be (re)assigned? - c++

Iterators from different containers cannot be compared (see for example here: https://stackoverflow.com/a/4664519/225186), or more technically, the comparison is not required to be meaningful.
This raises another question: can iterators from different ranges be assigned to each other?
Container A = ...;
Container B = ...;
auto it1 = A.begin();
it1 = B.begin(); // it1 first belonged to A, does it1 need to belong to B later
Is the last line required to work by the Iterator concept in some standard or in the accepted practices or in the upcoming std ranges?
Since equality and assignment are so intertwined, it seems that if equality (==) is not always well defined, then assignment (=) doesn't need to be well defined either, probably for similar underlying reasons.
The reason I ask is not purely academic: a certain iterator implementation could carry some "metadata" from the container, and (depending on the implementation) that metadata may not be reassignable, or reassigning it may simply be wasteful.
(For example a stride information that is unique to A and doesn't agree with that of B.
Another example is when the iterator stores a reference to the original range.)
An implementation could then leave a particular field (member) untouched when assignment is attempted.
One could make the assignment work in some cases, and that may produce fewer surprises, but it could also constrain implementations. So the question is: is it really necessary to define/allow assignment between iterators of different origin (provenance)?
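To make the "metadata" concern concrete, here is a minimal sketch of an iterator that carries a stride. The names strided_iterator and second_element_after_rebind are hypothetical and not taken from any library. With the implicitly defaulted copy assignment, the stride is copied along with the position, so rebinding to a range with a different stride works, at the cost of copying the extra member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical iterator that carries per-range metadata (a stride).
struct strided_iterator {
    const int* pos;         // current element
    std::ptrdiff_t stride;  // metadata inherited from the originating range

    int operator*() const { return *pos; }
    strided_iterator& operator++() { pos += stride; return *this; }
    // Copy assignment is implicitly defaulted: it copies pos AND stride.
};

// Start in `a` with stride 1, then rebind the same variable to `b` with stride 2.
inline int second_element_after_rebind(const int* a, const int* b) {
    assert(a != nullptr && b != nullptr);
    strided_iterator it{a, 1};
    it = strided_iterator{b, 2};  // assignment across ranges replaces the stride too
    ++it;                         // advances by b's stride, not a's
    return *it;
}
```

An implementation that skipped the stride during assignment would instead advance through b with a's stride, which is the kind of surprise the question is worried about.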
UPDATE 2021:
The linked documents read things like:
[...] the term the domain of == is used in the ordinary mathematical
sense to denote the set of values over which == is (required to be)
defined. This set can change over time. Each algorithm places
additional requirements on the domain of == for the iterator values it
uses. These requirements can be inferred from the uses that algorithm
makes of == and !=.
So there is an implicitly defined (by the algorithms) range of validity of ==.
Another way to formulate this question is whether the same caveats about the domain of applicability of == can be applied, by simple use of logic, to =.
The underlying idea is that defining == in isolation from =, or vice versa, doesn't make sense (and also, I found a motivating case).

If you check cppreference.com for the container you are interested in, you can find the requirements on its iterators.
If we look at std::vector for example, its iterators are specified to be LegacyRandomAccessIterator. If you follow the hierarchy of definitions up from there to the base LegacyIterator, you'll see that iterators are required to be CopyAssignable, which means you must be able to assign one iterator to another of the same type.
All of the standard library containers use iterators that satisfy LegacyIterator. Containers from other libraries are free to ignore these requirements, but it would be quite surprising to users if their iterators weren't CopyAssignable. It would be even more surprising if assignment only failed between iterators from different containers of the same type, since that could only show up as a runtime failure and not as a compile-time failure.
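As a small illustration of the CopyAssignable requirement, the sketch below rebinds a std::vector iterator from one container to another of the same type. The function name front_after_rebind is made up for this example.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Iterators of standard containers must be CopyAssignable.
static_assert(std::is_copy_assignable<std::vector<int>::iterator>::value,
              "vector iterators are copy-assignable");

// Rebind an iterator that started in `a` so that it refers into `b`.
inline int front_after_rebind(std::vector<int>& a, std::vector<int>& b) {
    assert(!a.empty() && !b.empty());
    auto it = a.begin();  // it refers into a
    it = b.begin();       // it now refers into b
    return *it;           // reads b's first element
}
```

After the assignment, comparing it against a.end() would again fall outside the domain of ==, so the usual comparison caveat follows the iterator to its new container.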

Related

C++ Range-adaptors/views and iterator invalidation rules

I have not found any direct reference to range/range-adaptor/range-view specific invalidation rules when modifying the underlying container.
Intuition suggests it would be exactly the same as pointer/iterator invalidation rules -- which are specified within the containers section of the standard.
The current container invalidation wording is as follows:
"...invalidates all the references, pointers, and iterators referring
to the elements in the sequence, as well as the past-the-end
iterator."
Which raises the question: Do all ranges necessarily "refer to the elements of the sequence", or, could they be accessing elements through the interface of the container?
It seems to me that most range adaptors already access a sequence without referring directly to the elements of that sequence (i.e. lazy views just build up iterator adaptors).
What seems to matter is the underlying range at the base of the view pyramid, so to speak.
We all learn at some point, that you cannot do std::vector::push_back while iterating that same vector, because the memory may move and invalidate the iteration. But, we also learn, that you can use std::vector::operator[] access with push_back, so long as you are careful with checking your size() bounds correctly.
It seems to me the same rules would apply to ranges/adaptors/views.
So: is it possible to force some equivalent to std::ranges::views::all (or, perhaps take_view) over a random access container to use array indexing (or some equivalent indirect/lazy element access), and to not use iteration directly?
Something to allow this:
std::vector<People> people = ...;
for (auto& person : std::ranges::views::lazy_all(people)) { // or ranges::lazy_take_view(people, people.size())
    if (person.has_new_child()) {
        people.push_back(person.get_new_child());
    }
}
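For comparison, the index-based loop that the question alludes to can be written today without any view. This is a minimal sketch: the Person type below is a made-up stand-in, since the question does not show the People interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's People type.
struct Person {
    int generation;
    bool has_new_child() const { return generation < 2; }
    Person get_new_child() const { return Person{generation + 1}; }
};

// Index-based traversal: no iterator or reference is kept across push_back,
// so a reallocation cannot leave anything dangling.
inline std::size_t grow_family(std::vector<Person>& people) {
    for (std::size_t i = 0; i < people.size(); ++i) {  // size() is re-read on every pass
        if (people[i].has_new_child()) {
            Person child = people[i].get_new_child();  // copy out before growing
            people.push_back(child);
        }
    }
    return people.size();
}
```

Because size() is re-evaluated, newly appended elements are visited too. Capturing the initial size before the loop would restrict the traversal to the original elements.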
I'm currently playing with C++20 ranges and while implementing my own view I came up with the same question: what are the rules for iterator invalidation for views?
As far as I can see, ranges heavily use meta-programming and under the hood they construct a hierarchy of state machines. The actual types of these state machines are often hidden [1], so it can be difficult to know what restrictions apply to them. Iterator invalidation is one of these restrictions, so specifying when and how iterators are invalidated across a view hierarchy can be quite challenging, if not impossible. Even if we managed to describe these rules, they could be impossible to memorise, let alone apply efficiently.
Ranges V3 library has the following recommendation:
View validity
Any operation on the underlying range that invalidates its iterators or sentinels will also invalidate any view that refers to any part of that range. Additionally, some views (e.g., views::filter), are invalidated when the underlying elements of the range are mutated. It is best to recreate a view after any operation that may have mutated the underlying range.
https://github.com/ericniebler/range-v3/blob/master/doc/index.md
The restriction given above sweeps away all these concerns. Although it is stricter than the rules for the standard containers, it establishes a simple rule to memorise about view iterator invalidation. At the same time it leaves freedom to change a view's implementation without touching that rule.
So, I suppose, it is safe to assume that ranges in C++20 standard are subject to the same limitations.
[1] My observations are based on the MSVC implementation of ranges, where range adaptors can actually produce different types depending on the strategy chosen. So when you pipeline std::views::take(), for example, you may suddenly end up with a std::span.
Each range adaptor or view has its own rules for how it interacts with the underlying range. And the standard spells out these interactions, if sometimes obliquely.
ranges::ref_view for example is explicitly stated to work as if it holds a pointer to the range. And its begin/end functions behave as if they call that range's begin/end functions, as well as forwarding any other functionality to the range it was given. So its interactions are pretty clear, since its iterators are the exact same type as those of the underlying range.
For a range like ranges::filter_view, it's a bit more difficult to track down. filter_view's iterator behavior is based on the behavior of filter_iterator. That type behaves as if it stores an iterator to the underlying range (due to the exposition-only iterator member). So a filter_iterator will be invalidated whenever the underlying range's iterators are. And there is no exposition member of filter_view that holds an iterator, so you might expect that calling begin will always get a fresh, valid filter_iterator.
But it won't. There is a major caveat buried in the description of filter_view::begin. A semantic component of the behavior of a range's begin function is that it must execute in amortized constant time. Finding the start of a filtered list is a linear-time operation. Therefore, filter_view::begin is required to only do this linear-time operation once and then internally cache the result.
Which means it does store an iterator, even if it's not clear that it does. So if you invalidate whatever the begin filter_iterator is using, you have invalidated the filter_view's begin iterator and must construct a new filter_view.
In summary, if you want to know the iterator invalidation behavior for a view, you must read through the entire description of that view. But this is true of containers as well; there isn't a nice, neat section of the standard which spells out exactly when iterators, references, and pointers are invalidated. Cppreference has a nice list, but the standard leaves it up to the definition of each function in the class.

Can I use std::upper_bound without an underlying container?

I have a range of integers [start, end] and a non-decreasing monotonic function f(i).
So conceptually, I have a non-decreasing sequence [f(start), f(start + 1), .. , f(end)].
Can I use std::upper_bound on that sequence to find the first element i in the range for which f(i) > some_value holds?
Conceptually, I'd like something like this:
std::upper_bound(start, end + 1, some_value, [&](int value, int i) {
    // upper_bound calls the comparator as comp(value, element)
    return value < f(i);
});
But this doesn't compile because start and end + 1 do not meet the requirements of forward iterators.
The short answer is yes, since std::upper_bound works on iterators, not on containers. But iterators are themselves instances of some iterator type (for example, std::vector<int>::iterator).
If you write a class that meets the requirements of ForwardIterator without actually being bound to a container, while still meaning something (for example, if you want to generate your sequence procedurally), it should work just fine.
Note that a plain integer will not do the trick. On the other hand, a class whose objects hold the value of your function for a particular argument value (with some additional batteries) will.
There are basically two questions here: would it work according to the standard, and would it work with all practical implementations of the STL?
By the standard, as T.C. pointed out already, there are some strict requirements on iterators, especially that *it has to return a (possibly const) reference to value_type (which we would satisfy by returning the reference to a member of the iterator), but we also need that for it1 == it2, *it1 and *it2 are references bound to the same object, which is only possible if we have a distinct object for every number in the range.
If you want to use this idea in practice, I don't believe any implementation of std::upper_bound or similar methods actually relies on this reference equality, so you could just use a class that encapsulates an integer as an iterator, only overloading the necessary methods. As far as I can see, boost::irange fulfills these requirements.
As you can see, this is not strictly standard-compliant, but I see no reason why any implementation of binary search should rely on such strong requirements for the iterator, if the underlying 'storage' is const anyway.
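A minimal sketch of such an integer-wrapping iterator follows. The names int_iterator and first_index_above are hypothetical. As discussed above, operator* returns by value, so this is deliberately not a conforming forward iterator; it only supplies the operations that typical binary-search implementations use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical container-less iterator: it simply wraps an int.
class int_iterator {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;  // by value: this is what breaks strict conformance

    int_iterator() = default;
    explicit int_iterator(int v) : value_(v) {}

    int operator*() const { return value_; }
    int_iterator& operator++() { ++value_; return *this; }
    int_iterator operator++(int) { int_iterator old = *this; ++value_; return old; }
    int_iterator& operator--() { --value_; return *this; }
    int_iterator& operator+=(difference_type n) {
        value_ += static_cast<int>(n);
        return *this;
    }
    friend difference_type operator-(int_iterator a, int_iterator b) {
        return a.value_ - b.value_;
    }
    friend bool operator==(int_iterator a, int_iterator b) { return a.value_ == b.value_; }
    friend bool operator!=(int_iterator a, int_iterator b) { return !(a == b); }

private:
    int value_ = 0;
};

// First i in [start, end] with f(i) > some_value, or end + 1 if there is none.
// Assumes f is non-decreasing on [start, end].
template <class F>
int first_index_above(int start, int end, int some_value, F f) {
    assert(start <= end);
    int_iterator it = std::upper_bound(
        int_iterator(start), int_iterator(end + 1), some_value,
        // upper_bound calls the comparator as comp(value, element)
        [&](int value, int i) { return value < f(i); });
    return *it;
}
```

For instance, with f(i) = i * i on [0, 10] and some_value = 16, the first index whose square exceeds 16 is 5.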
No, not practically, but yes in practice, but no if you want to be practical.
No
upper_bound requires ForwardIterator. ForwardIterator requires that * returns an actual reference, and that if two iterators are equal then they refer to the same object.
Not practically
For a container-less iterator, this requires an insanely complex iterator that caches the values it returns in a shared global map of some kind. To make it half practical, note that the iterator requirements say very little about the lifetime of said reference; so you'd want to reference count and destroy said values as the iterators in question cease to exist.
Such a solution requires synchronization, global state, and is significantly more expensive and complex than something like boost::integer_range. No sane person would write this except as an exercise demonstrating why the standard needs to be fixed.
But yes in practice
No sane implementation of upper_bound actually requires that the iterators in question are full-scale forward iterators, barring one that does full concept checks to validate against the standard (and not against what the actual algorithm needs). Input iterators with stability of the returned values almost certainly suffice. There is no such concept in the C++ standard, and forward iterator is the weakest iterator category in the standard that satisfies it.
This problem, of effectively demanding iterators be backed by containers, is a flaw in the standard in my opinion. Container-free iterators are powerful and useful, except they rarely technically work in standard containers.
Adding new iterator categories has proved problematic, because there is little way to do it without breaking existing code. They looked into it for contiguous iterators, and wrote it off as impractical (I don't know all the details of what they tried).
Adding new iterator concepts that are not backed by tags is more possible, but probably will have to wait until concepts are part of the C++ language and not just the standard; then experimenting with adding new concepts becomes something you can specify in C++ instead of in standardese, which makes it far easier.
But no if you want to be practical
This does, however, result in an ill-formed program, no diagnostic required. So consider whether it is worth it; it may actually be easier to reimplement upper_bound than to maintain a program whose every execution is undefined behavior, and whose every compile is at the mercy of a compiler upgrade.
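Reimplementing the search over an integer range is short. The following is a minimal sketch under the question's assumptions (f is non-decreasing on [start, end]); the name upper_bound_index is made up here.

```cpp
#include <cassert>

// Smallest i in [start, end] with f(i) > some_value, or end + 1 if there is none.
// Assumes f is non-decreasing on [start, end] and that end + 1 does not overflow.
template <class F>
int upper_bound_index(int start, int end, int some_value, F f) {
    assert(start <= end);
    int lo = start;
    int hi = end + 1;  // half-open search interval [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (f(mid) > some_value) {
            hi = mid;       // mid qualifies; the answer is mid or to its left
        } else {
            lo = mid + 1;   // mid does not qualify; the answer is to its right
        }
    }
    return lo;
}
```

This needs no iterator type at all, so none of the iterator requirements discussed above come into play.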

Does the C++11 standard require that two iterations through a constant unordered_container visit elements in the same order?

for (auto&& i : unordered_container)
{ /* ... */ }
for (auto&& i : unordered_container)
{ /* .. */ }
Does the standard require that both of these loops visit elements in the same order (assuming the container is unmodified)?
My analysis of this question...
I read the standard and as best I can tell the answer is "no"...
Since iterators of containers are forward, there is language that requires a==b imply that ++a==++b for forward iterators. That means two iterations will go through the same path IF they both start in the same place. This reduces the question to a different question of whether the standard requires container.begin() == container.begin(). I couldn't find any language that requires this.
Containers are required to implement operator==(). That is we can do:
container c;
c == c;
That relation is required to work the same as:
std::distance(a.begin(), a.end()) == std::distance(b.begin(), b.end()) &&
std::equal(a.begin(), a.end(), b.begin());
The important part here is the call to std::equal(). This call requires that two independent calls to container.begin() will produce the same sequence of values. If it didn't, then c == c would be false, and that doesn't make any sense because == is an equivalence relation.
Therefore, my answer is that we can claim that the standard requires that two passes of any container must result in the same ordering. Obviously this requirement breaks if you do anything that changes the container or invalidates iterators.
Citations:
C++ 2011 Table 96 — Container requirements
I think @Sharth's conclusion is correct, but (for anybody who cares about newer standards) is already obsolete (and may not have ever reflected reality--see below).
More recent drafts of the standard (e.g., n3797) have changed the requirements, apparently intentionally removing the ordering requirement. Specifically, it says (§23.2.5/12):
Two unordered containers a and b compare equal if a.size() == b.size() and, for every equivalent-key group [Ea1,Ea2) obtained from a.equal_range(Ea1), there exists an equivalent-key group [Eb1,Eb2) obtained from b.equal_range(Ea1), such that distance(Ea1, Ea2) == distance(Eb1, Eb2) and is_permutation(Ea1, Ea2, Eb1) returns true.
I also have relatively low confidence that implementations actually meet the requirements of the 2011 standard either. In particular, the unordered containers are normally implemented as hash tables with linked lists for collision resolution. Since those linked lists are expected to be short, they're not necessarily sorted (particularly since items stored in unordered containers aren't required to define operations to be used for sorting, such as operator<). This being the case, it's fairly routine for the linked lists to hold the same items, but in an order that depends upon the order in which they were inserted.
In such a case, it would be fairly routine for two hash tables that contained the same items that had been inserted in different orders to iterate over those items in different orders.
In theory such an implementation doesn't conform with the C++11 standard--but I'd guess the change cited above was made largely because that requirement couldn't be met in practice (because, as noted above, the container had no way to enforce ordering).
So, as long as you're dealing with the same container, unchanged, depending on iteration in the same order may be safe. Two containers that have the same contents may not work out so well though (and even in what claims to be a C++11 implementation, you probably can't expect it to meet tighter requirements than the newer drafts contain).
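The distinction can be sketched as follows. The helper names are hypothetical. The equality of the two sets is guaranteed by the wording quoted above; that two passes over one unmodified container agree is what implementations do in practice, which is the point under debate rather than something this snippet proves.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Record the order in which one pass over the container visits its elements.
inline std::vector<int> traversal_order(const std::unordered_set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Build two sets with the same contents, inserted in opposite orders.
inline bool equal_despite_insertion_order() {
    std::unordered_set<int> a;
    std::unordered_set<int> b;
    for (int i = 0; i < 100; ++i) a.insert(i);
    for (int i = 99; i >= 0; --i) b.insert(i);
    // operator== compares contents, not iteration order, so this holds even
    // if traversal_order(a) and traversal_order(b) differ.
    return a == b;
}
```

Code that needs a reproducible order across two different containers should sort the elements or use an ordered container instead of relying on traversal order.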
My read of the standard is that this is not guaranteed. 23.2.5, paragraph 6, states:
Thus, although the absolute order of elements in an unordered
container is not specified, its elements are grouped into
equivalent-key groups such that all elements of each group have
equivalent keys. Mutating operations on unordered containers shall
preserve the relative order of elements within each equivalent-key
group unless otherwise specified.
Let's take off the table the fairly clear guarantee that elements that hash to the same key will have their relative order preserved no matter what. That seems to be fairly clear. Additionally, let's exclude any modifications to the container. In the remaining scope:
Although this doesn't actually explicitly define that the iteration order, in absence of changes to the container, is unstable, I interpret the statement "the absolute order of elements in an unordered container is not specified" on its literal face value. If the iteration order is undefined, then it is undefined, and is not guaranteed to be the same every time.
I think it all comes down to whether, in the quoted excerpt "is not specified" should be interpreted as "it could be anything" or "it could be anything, at any given time".
I think an argument can be made either way. I would interpret "is not specified" in the most strict, literal interpretation of the latter, but I wouldn't object too hard if someone would argue in favor of the former.
Unordered containers do return forward iterators (which are defined in § 24.2.5), and those do have this property: a == b implies ++a == ++b. This seems to imply that, so long as unordered_container.begin() == unordered_container.begin() is true, the traversal order will be the same.
I was unable to find any language that requires unordered_container.begin() == unordered_container.begin() which led me to a tentative answer of "no", the traversal order isn't required to be the same.

How to handle multiple iterator types

I have a custom datastructure which can be accessed in multiple ways. I want to keep this datastructure as close to STL conventions as possible. So I already have lots of typedefs which give template parameters their STL names. This is business as usual for me by now.
However I am not so sure how to correctly add iterators to my datastructure. The main problem I am facing is, that there would be multiple iteration policies over the datastructure. The easiest use case is iterating over all elements, which would be handled well by STL-Conforming iterators over the datastructure. However one might also want to access elements, which are somehow similar to a given key. I would also like to iterate over all these similar elements in a way which I can interface with the STL.
These are the ideas I have thought about so far:
Provide only one type of iterator:
This is basically what std::map does. The start and end iterators for a subrange are provided by std::map::lower_bound() and std::map::upper_bound().
This works well, however, because the iterators returned by begin(), end(), lower_bound() and upper_bound() are compatible, i.e. operator==() can be given a very well defined meaning on them. In my case this would be hard to get right, and it might even be impossible to give it clear semantics. For example I probably would get some cases where it1==it2 but ++it1!=++it2. I am not sure if this is allowed by the STL.
Provide multiple types of iterators:
Much easier to provide clean operator==() semantics. Nasty on the other hand because it enlarges the number of types.
Provide one type of iterator and use stl::algorithms for specialized access
I am not sure if this is possible at all. The iteration state should be kept by the iterator somehow (either directly or in a Memento). Using this approach would mean to specialize all stl::algorithms and access the predicate directly in the specialization. Most likely impossible, and if possible a very bad idea.
Right now I am mostly opting for Version 1, i.e. to provide only one type of iterator at all. However since I am not clear on how to clean up the semantics, I have not yet decided.
How would you handle this?
Standard containers support two iteration policies with two iterator types: ::iterator and ::reverse_iterator. You can convert between the two using the constructor of std::reverse_iterator, and its member function base().
Depending how similar your iteration policies are, it may or may not be easy to provide conversions to different iterator types. The idea is that the result should point at the "equivalent position" in the iteration policy of the destination type. For reverse iterators, this equivalence is defined by saying that if you insert at that point, the result is the same. So if rit is a reverse iterator, vec.insert(rit.base(), ...) inserts an element "before" rit in the reverse iteration, that is to say after the element pointed to by rit in the container. This is quite fiddly, and will only get worse when the iteration policies are completely unrelated. But if all of your iterator types are (or can be made to look like) wrappers around the "normal" iterator that goes over all elements, then you can define conversions in terms of that underlying iterator position.
You only actually need conversions if there are member functions that add or remove elements of the container, because you probably don't want to have to provide a separate overload for each iterator type (just like standard containers don't define insert and erase for reverse iterators). If iterators are used solely to point at elements, then most likely you can do without them.
If the different iteration policies are all iterating in the normal order over a subset of the elements, then look at boost::filter_iterator.
I probably would get some cases where it1==it2 but ++it1!=++it2. I am
not sure if this is allowed by the STL.
If I understand correctly, you got it1 by starting at thing.normal_begin(), and you got it2 by starting at thing.other_policy_begin(). The standard containers don't define the result of comparing iterators of the same type that belong to different ranges, so if you did use a common type, then I think this would be fine provided the documentation makes it clear that although operator== does happen to work, the ranges are separate according to where the iterator came from.
For example, you could have a skip_iterator which takes as a constructor parameter the number of steps it should move forward each time ++ is called. Then you could either include that integer in the comparison, so that thing.skip_begin(1) != thing.skip_begin(2), or you could exclude it so that thing.skip_begin(1) == thing.skip_begin(2) but ++(++(thing.skip_begin(1))) == ++(thing.skip_begin(2)). I think either is fine provided it's documented, or you could document that comparing them is UB unless they came from the same starting point.
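A minimal sketch of the second option (equality ignores the step) is shown below. The class layout and the free function skip_begin are assumptions made for illustration; the answer only names skip_iterator and thing.skip_begin(n).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator that advances `step` positions per increment.
class skip_iterator {
public:
    using base_iterator = std::vector<int>::const_iterator;

    skip_iterator(base_iterator pos, std::ptrdiff_t step) : pos_(pos), step_(step) {
        assert(step > 0);
    }

    int operator*() const { return *pos_; }
    skip_iterator& operator++() { pos_ += step_; return *this; }

    // Design choice: equality looks at the position only and ignores the step.
    friend bool operator==(const skip_iterator& a, const skip_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const skip_iterator& a, const skip_iterator& b) {
        return !(a == b);
    }

private:
    base_iterator pos_;
    std::ptrdiff_t step_;
};

// Hypothetical free-function counterpart of thing.skip_begin(n).
inline skip_iterator skip_begin(const std::vector<int>& v, std::ptrdiff_t step) {
    return skip_iterator(v.begin(), step);
}
```

With this choice, skip_begin(v, 1) == skip_begin(v, 2) holds, a single increment of each makes them unequal, and a second increment of the step-1 iterator makes them equal again. The caller is responsible for not stepping past the end of the vector.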
Why are more types a problem? It does not necessarily mean much more code. For instance, you could make your iterator type a template that takes an iteration policy as a template parameter. The iteration policy could then provide the implementation of the iteration:
template <class Policy> class iterator;  // the container's iterator template, defined elsewhere

struct iterate_all_policy {
    iterate_all_policy(iterator<iterate_all_policy>& it) : it(it) {}
    void advance() { /* implement specific advance here */ }
private:
    iterator<iterate_all_policy>& it;
};
You will probably have to make the iteration-policy-classes friends of the iterator-types.

Check whether iterator belongs to a list

Is there any way to check whether a given iterator belongs to a given list in C++?
The obvious but invalid approach
You can't simply iterate through the list, comparing each iterator value to your "candidate".
The C++03 Standard is vague about the validity of == applied to iterators from different containers (Mankarse's comment on Nawaz's answer links http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2948.html#446). Some compilers (e.g. VC++2005 in debug mode) warn if you do so. Despite all that, it may actually work reliably depending on your compiler/libraries; check their documentation if you don't care about portability.
The C++11 Standard is very explicit: you can't compare iterators into different containers:
§ 24.2.5 The domain of == for forward iterators is that of iterators over the same underlying sequence.
So, the answers to this question that rely on operator== are questionable now, and invalid in future.
An oft-valid approach
What you can do is iterate along the list, comparing the address of the elements (i.e. &*i) to the address of the object to which your other iterator points.
Mankarse's comment cautions that this might not work as intended for objects providing their own operator&. You could work around this using std::addressof, or for C++03 boost's version.
Martin's comment mentions that you have to assume the candidate iterator that you're testing list membership for is safely dereferenceable - i.e. not equal to an end() iterator on the container from which it came. As Steve points out - that's a pretty reasonable precondition and shouldn't surprise anyone.
(This is fine for all Standard containers as stored elements never have the same address, but more generally user-defined containers could allow non-equal iterators to address the same value object (e.g. supporting cycles or a "flyweight pattern" style optimisation), in which case this approach would fail. Still, if you write such a container you're probably in a position to design for safe iterator comparison.)
Implementation:
template <class IteratorA, class IteratorB, class IteratorC>
inline bool range_contains(IteratorA from, const IteratorB& end,
                           const IteratorC& candidate)
{
    while (from != end)
        if (&*from++ == &*candidate)
            return true;
    return false;
}
Notes:
This adopts the Standard library approach of accepting a range of iterator positions to search.
The types of each iterator are allowed to vary, as there are portability issues, e.g. containers where begin() returns an iterator but end() returns a const_iterator.
Iterators other than from are taken by const reference, as iterators can sometimes be non-trivial objects (i.e. too large to fit in a register, relatively expensive to copy). from is needed by value as it will be incremented through the range.
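For element types that overload operator&, the same idea can be sketched with std::addressof, as the caveat above suggests. The name range_contains_element is made up here, and the candidate must be dereferenceable, as already noted.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <memory>  // std::addressof

// Returns true if `candidate` refers to an element stored in [from, end).
// Precondition: candidate is dereferenceable (not an end() iterator).
template <class Iterator, class Candidate>
bool range_contains_element(Iterator from, Iterator end, Candidate candidate) {
    for (; from != end; ++from) {
        // Compare object addresses instead of iterators, so that no == is ever
        // applied to iterators that may come from different containers.
        if (std::addressof(*from) == std::addressof(*candidate)) {
            return true;
        }
    }
    return false;
}
```

Given two lists a and b with equal contents and it = std::next(a.begin()), the call range_contains_element(a.begin(), a.end(), it) returns true, while the same call on b's range returns false, because equal values in b live at different addresses.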