C++ Range-adaptors/views and iterator invalidation rules - c++

I have not found any direct reference to range/range-adaptor/range-view specific invalidation rules when modifying the underlying container.
Intuition suggests it would be exactly the same as pointer/iterator invalidation rules -- which are specified within the containers section of the standard.
The current container invalidation wording is as follows:
"...invalidates all the references, pointers, and iterators referring
to the elements in the sequence, as well as the past-the-end
iterator."
Which raises the question: Do all ranges necessarily "refer to the elements of the sequence", or, could they be accessing elements through the interface of the container?
It seems to me that most range adaptors already access a sequence without referring directly to the elements of that sequence (i.e. lazy views just build up iterator adaptors).
What seems to matter is the underlying range at the base of the view pyramid, so to speak.
We all learn at some point that you cannot call std::vector::push_back while iterating over that same vector, because the memory may move and invalidate the iteration. But we also learn that you can combine std::vector::operator[] access with push_back, so long as you are careful to check your size() bounds correctly.
It seems to me the same rules would apply to ranges/adaptors/views.
So: is it possible to force some equivalent of std::ranges::views::all (or, perhaps, take_view) over a random-access container to use array indexing (or some equivalent indirect/lazy element access), and not to use iteration directly?
Something to allow this:
std::vector<People> people = ...;
for (auto& person : std::ranges::views::lazy_all(people)) { // or ranges::lazy_take_view(people, people.size())
    if (person.has_new_child()) {
        people.push_back(person.get_new_child());
    }
}

I'm currently playing with C++20 ranges and while implementing my own view I came up with the same question: what are the rules for iterator invalidation for views?
As far as I can see, ranges heavily use metaprogramming, and under the hood they construct a hierarchy of state machines. The actual types of these state machines are often hidden [1], so we may have difficulty reasoning about their restrictions. Iterator invalidation is one of those restrictions, so specifying when and how iterators are invalidated when we construct a view hierarchy can be quite challenging, if not impossible. Even if we managed to describe these rules, they could be impossible to memorise, let alone use efficiently.
Ranges V3 library has the following recommendation:
View validity
Any operation on the underlying range that invalidates its iterators or sentinels will also invalidate any view that refers to any part of that range. Additionally, some views (e.g., views::filter), are invalidated when the underlying elements of the range are mutated. It is best to recreate a view after any operation that may have mutated the underlying range.
https://github.com/ericniebler/range-v3/blob/master/doc/index.md
The restriction given above simply sweeps away all such concerns: although it is stricter than the rules for the standard containers, it establishes a single, easy-to-memorise rule about the invalidation of any view's iterators. At the same time, it gives implementers the freedom to change a view's implementation without touching that rule.
So, I suppose, it is safe to assume that ranges in the C++20 standard are subject to the same limitations.
[1] My observations are based on the MSVC implementation of ranges, where range adaptors can actually produce different types based on strategies. So when you pipeline std::views::take(), for example, you may suddenly end up with a std::span.

Each range adaptor or view has its own rules for how it interacts with the underlying range. And the standard spells out these interactions, if sometimes obliquely.
ranges::ref_view for example is explicitly stated to work as if it holds a pointer to the range. And its begin/end functions behave as if they call that range's begin/end functions, as well as forwarding any other functionality to the range it was given. So its interactions are pretty clear, since its iterators are the exact same type as those of the underlying range.
For a range like ranges::filter_view, it's a bit more difficult to track down. filter_view's iterator behavior is based on the behavior of filter_iterator. That type behaves as if it stores an iterator to the underlying range (due to the exposition-only iterator member). So a filter_iterator will be invalidated whenever the underlying range's iterators are. And there is no exposition member of filter_view that holds an iterator, so you might expect that calling begin will always get a fresh, valid filter_iterator.
But it won't. There is a major caveat buried in the description of filter_view::begin. A semantic component of the behavior of a range's begin function is that it must execute in amortized constant time. Finding the start of a filtered list is a linear-time operation. Therefore, filter_view::begin is required to only do this linear-time operation once and then internally cache the result.
Which means it does store an iterator, even if it's not obvious that it does. So if you invalidate whatever the cached begin filter_iterator is using, you have invalidated the filter_view's begin iterator and must construct a new one.
In summary, if you want to know the iterator invalidation behavior for a view, you must read through the entire description of that view. But this is true of containers as well; there isn't a nice, neat section of the standard which spells out exactly when iterators, references, and pointers are invalidated. Cppreference has a nice list, but the standard leaves it up to the definition of each function in the class.

Related

Why is there an "erase-remove idiom" for std::remove and std::remove_if with containers?

I was just looking into why the function std::remove_if wasn't working the way I expected, and learned about the C++ "erase-remove idiom" where I'm supposed to pass the result of remove or remove_if to erase to actually remove the items I want from a container.
This strikes me as quite unintuitive: this means remove and remove_if don't do what they say on the tin. It also makes for more verbose, less clear code.
Is there a justification for this? I figure there's some kind of trade-off with an upside balancing out the downsides I listed in the previous paragraph.
My first thought would be that there's some use-case for using remove or remove_if on their own, but since they leave the remaining items in a collection in an undefined state, I can't think of any possible use case for that.
This is a necessary function of the way the container/iterator/algorithm paradigm works. The basic concept of the model is as follows:
Containers contain and manage sequences of values.
Algorithms act on sequences of values.
Iterators represent moveable positions within a sequence of values.
Therefore, algorithms act on iterators, which represent locations within some sequence of values, usually provided by a container.
The problem is that removal of an item from a container doesn't fit that paradigm. The removal of an element from a container is not an act on a "sequence of values"; it fundamentally changes the nature of the sequence itself.
That is, "removal" ultimately finishes with a container operation, not an iterator operation. If algorithms only act on iterators, no pure algorithm can truly remove elements. Iterators don't know how to do that. Algorithms that only act on iterators can move values around within a sequence, but they cannot change the nature of the sequence such that the "removed" values no longer exist.
But while the removal of elements is a container operation... it's not a value-agnostic operation. remove removes only values that compare equal to the given value. remove_if removes only values for which the predicate returns true. These are not container operations; they are algorithms that don't really care about the nature of the container.
Except for when it comes time to actually remove them from the container. From the perspective of the above paradigm, it is inherently two separate operations: an algorithm followed by a container operation.
That all being said, C++20 does give a number of containers non-member std::erase and std::erase_if overloads. These do the full job of erase-remove as a non-member function.
My first thought would be that there's some use-case for using remove or remove_if on their own, but since they leave the remaining items in a collection in an undefined state, I can't think of any possible use case for that.
There are uses for it. Multiple removal being the obvious one. You can perform a series of remove actions, so long as you pass the new end iterator to each subsequent removal (so that no operation examines removed elements). You can do a proper container erase at the end.
It should also be noted that the C++20 std::erase and std::erase_if functions only take containers, not sub-sections of containers. That is, they don't allow you to erase from some range within a container. Only the erase/remove idiom allows for that.
Also, not all containers can erase elements. std::array has a fixed size; truly erasing elements isn't allowed. But you can still std::remove with them, so long as you keep track of the new end iterator.
Many algorithms from the standard library operate on general iterators, which cannot be used to remove elements. erase is a method of the container and has access to more information, so it can be used to directly delete elements.

STL iterator revalidation for end (past-the-end) iterator?

See related questions on past-the-end iterator invalidation:
this, this.
This is more a question of design: namely, is there (in the STL or elsewhere) such a concept as past-the-end iterator "revalidation"?
What I mean by this, and use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently from this, another part of the program enqueues more items in the queue. How is it possible for the algorithm to (EDIT) efficiently tell, "have more items been enqueued" while holding the previously past-the-end iterator (call it tailIt)? (this would imply it is able to check if tailIt == container.end() still, and if that is false, conclude tailIt is now valid and points to the first element that was inserted).
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear we have the iterator tailIt and a reference to container. A trivial workaround for what I'm trying to do is, also remember count := how many items you processed, and then check is container.size() == count still, and if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, assumption container doesn't pop from the front (!), random-access for efficient seeking).
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" into the data block at all; in fact this will be true of any iterator except a vector iterator. So, overall, an existing end iterator is simply never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
In the case of a std::queue, no there isn't (heh). Not because the iterators of a queue get invalidated once something is pushed, but because a queue doesn't have any iterators at all.
As for other iterator types, most of them don't keep a reference to the container holder (the managing object containing all the info about the underlying data), which is a trade-off of flexibility for efficiency (I quickly checked gcc's implementation of std::vector::iterator). It is possible to write an iterator type that keeps a reference to the holder during its lifetime; that way the iterators never have to be invalidated! (Unless the holder is std::move'd.)
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
    if (some_outside_condition()) {
        // Normally the iterator would be invalidated at this point
        // (only if the vector resized, but you should always assume a resize)
        my_vector.push_back("hello world!");
    }
}
Literally revalidating iterators might be too complex to build for its use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++) loop. That said, I cannot assure you how well the compiler will optimize with these iterators (unrolling loops, for example), though I assume it will still do quite a good job.

Can I use std::upper_bound without an underlying container?

I have a range of integers [start, end] and a non-decreasing monotonic function f(i).
So conceptually, I have a non-decreasing sequence [f(start), f(start + 1), .. , f(end)].
Can I use std::upper_bound on that sequence to find the first element i in the range that holds f(i) > some_value ?
Conceptually, I'd like something like this:
std::upper_bound(start, end + 1, some_value, [&](int lhs, int rhs) {
    return f(lhs) < f(rhs);
});
But this doesn't compile because start and end + 1 do not meet the requirements of forward iterators.
The short answer is yes, since std::upper_bound works on iterators, not on containers. But iterators themselves are instances of a corresponding class (for example, std::vector<int>::iterator or whatnot).
If you construct some specific class that will meet the requirements of ForwardIterator not being actually bound to some sort of container, while still meaning something (for example, if you want to generate your sequence procedurally), it should work just fine.
Note that a simple integer will not do the trick. On the other hand, a class whose objects hold the value of your function for a particular argument value (with some additional batteries) will.
There are basically two answers:
Would it work by the standard or would it work with all practical implementations of the STL?
By the standard, as T.C. pointed out already, there are some strict requirements on iterators, especially that *it has to return a (possibly const) reference to value_type (which we would satisfy by returning the reference to a member of the iterator), but we also need that for it1 == it2, *it1 and *it2 are references bound to the same object, which is only possible if we have a distinct object for every number in the range.
If you want to use this idea in practice, I don't believe any implementation of std::upper_bound or similar methods actually relies on this reference equality, so you could just use a class that encapsulates an integer as an iterator, only overloading the necessary methods. As far as I can see, boost::irange fulfills these requirements.
As you can see, this is not strictly standard-compliant, but I see no reason why any implementation of binary search should rely on such strong requirements for the iterator, if the underlying 'storage' is const anyway.
No, not practically, but yes in practice, but no if you want to be practical.
No
upper_bound requires ForwardIterator. ForwardIterator requires that * returns an actual reference, and that if two iterators are equal then they refer to the same object.
Not practically
For a container-less iterator, this requires an insanely complex iterator that caches the values it returns in a shared global map of some kind. To make it half practical, note that the iterator requirements say very little about the lifetime of said reference; so you'd want to reference count and destroy said values as the iterators in question cease to exist.
Such a solution requires synchronization, global state, and is significantly more expensive and complex than something like boost::integer_range. No sane person would write this except as an exercise demonstrating why the standard needs to be fixed.
But yes in practice
No sane implementation of upper_bound actually requires that the iterators in question are full-scale forward iterators, barring one that does full concept checks to validate against the standard (and not against what the actual algorithm needs). Input iterators with stability on the values returned almost certainly do it. There is no such concept in the C++ standard, and forward iterator is the weakest iterator category in the standard that satisfies it.
This problem, of effectively demanding iterators be backed by containers, is a flaw in the standard in my opinion. Container-free iterators are powerful and useful, except they rarely technically work in standard containers.
Adding new iterator categories has proved problematic, because there is little way to do it without breaking existing code. They looked into it for contiguous iterators, and wrote it off as impractical (I don't know all the details of what they tried).
Adding new iterator concepts that are not backed by tags is more possible, but probably will have to wait until concepts are part of the C++ language and not just the standard; then experimenting with adding new concepts becomes something you can specify in C++ instead of in standardese, which makes it far easier.
But no if you want to be practical
This does, however, result in an ill-formed program, no diagnostic required. So consider whether it is worth it; it may actually be easier to reimplement upper_bound than to maintain a program whose every execution is undefined behavior, and whose every compile is at the mercy of a compiler upgrade.

How to handle multiple iterator types

I have a custom data structure which can be accessed in multiple ways. I want to keep this data structure as close to STL conventions as possible, so I already have lots of typedefs which give template parameters the STL names. This is business as usual for me by now.
However, I am not so sure how to correctly add iterators to my data structure. The main problem I am facing is that there are multiple iteration policies over the data structure. The easiest use case is iterating over all elements, which would be handled well by STL-conforming iterators. However, one might also want to access elements which are somehow similar to a given key. I would also like to iterate over all these similar elements in a way which I can interface with the STL.
These are the ideas I have thought about so far:
Provide only one type of iterator:
This is basically what std::map does. The start and end iterators for a subrange are provided by std::map::lower_bound() and std::map::upper_bound().
However, this works well because the iterators returned by begin(), end(), lower_bound() and upper_bound() are compatible, i.e. operator==() can be given a well-defined meaning on them. In my case this would be hard to get right, or it might even be impossible to give clear semantics. For example, I probably would get some cases where it1==it2 but ++it1!=++it2. I am not sure if this is allowed by the STL.
Provide multiple types of iterators:
Much easier to provide clean operator==() semantics. Nasty on the other hand because it enlarges the number of types.
Provide one type of iterator and use STL algorithms for specialized access
I am not sure if this is possible at all. The iteration state should be kept by the iterator somehow (either directly or in a Memento). Using this approach would mean specializing all STL algorithms and accessing the predicate directly in the specialization. Most likely impossible, and if possible a very bad idea.
Right now I am mostly opting for Version 1, i.e. to provide only one type of iterator at all. However since I am not clear on how to clean up the semantics, I have not yet decided.
How would you handle this?
Standard containers support two iteration policies with two iterator types: ::iterator and ::reverse_iterator. You can convert between the two using the constructor of std::reverse_iterator, and its member function base().
Depending how similar your iteration policies are, it may or may not be easy to provide conversions to different iterator types. The idea is that the result should point at the "equivalent position" in the iteration policy of the destination type. For reverse iterators, this equivalence is defined by saying that if you insert at that point, the result is the same. So if rit is a reverse iterator, vec.insert(rit.base(), ...) inserts an element "before" rit in the reverse iteration, that is to say after the element pointed to by rit in the container. This is quite fiddly, and will only get worse when the iteration policies are completely unrelated. But if all of your iterator types are (or can be made to look like) wrappers around the "normal" iterator that goes over all elements, then you can define conversions in terms of that underlying iterator position.
You only actually need conversions if there are member functions that add or remove elements of the container, because you probably don't want to have to provide a separate overload for each iterator type (just like standard containers don't define insert and erase for reverse iterators). If iterators are used solely to point at elements, then most likely you can do without them.
If the different iteration policies are all iterating in the normal order over a subset of the elements, then look at boost::filter_iterator.
I probably would get some cases where it1==it2 but ++it1!=++it2. I am not sure if this is allowed by the STL.
If I understand correctly, you got it1 by starting at thing.normal_begin(), and you got it2 by starting at thing.other_policy_begin(). The standard containers don't define the result of comparing iterators of the same type that belong to different ranges, so if you did use a common type, then I think this would be fine provided the documentation makes it clear that although operator== does happen to work, the ranges are separate according to where the iterator came from.
For example, you could have a skip_iterator which takes as a constructor parameter the number of steps it should move forward each time ++ is called. Then you could either include that integer in the comparison, so that thing.skip_begin(1) != thing.skip_begin(2), or you could exclude it so that thing.skip_begin(1) == thing.skip_begin(2) but ++(++(thing.skip_begin(1))) == ++(thing.skip_begin(2)). I think either is fine provided it's documented, or you could document that comparing them is UB unless they came from the same starting point.
Why are more types a problem? It does not necessarily mean much more code. For instance, you could make your iterator type a template that takes an iteration policy as a template parameter. The iteration policy could then provide the implementation of the iteration:
struct iterate_all_policy {
    iterate_all_policy(iterator<iterate_all_policy> & it) : it(it) {}
    void advance() { /* implement specific advance here */ }
private:
    iterator<iterate_all_policy> & it;
};
You will probably have to make the iteration policy classes friends of the iterator types.

Check whether iterator belongs to a list

Is there any way to check whether a given iterator belongs to a given list in C++?
The obvious but invalid approach
You can't simply iterate through the list, comparing each iterator value to your "candidate".
The C++03 Standard is vague about the validity of == applied to iterators from different containers (Mankarse's comment on Nawaz's answer links http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2948.html#446), and some compilers (e.g. VC++ 2005 in debug mode) warn if you do so. Despite all that, it may actually work reliably depending on your compiler/libraries - check their documentation if you don't care about portability.
The C++11 Standard is very explicit, you can't compare iterators to different containers:
§ 24.2.5 The domain of == for forward iterators is that of iterators over the same underlying sequence.
So, the answers to this question that rely on operator== are questionable now, and invalid in future.
An oft-valid approach
What you can do is iterate along the list, comparing the address of the elements (i.e. &*i) to the address of the object to which your other iterator points.
Mankarse's comment cautions that this might not work as intended for objects providing their own operator&. You could work around this using std::addressof, or, for C++03, boost's version.
Martin's comment mentions that you have to assume the candidate iterator that you're testing list membership for is safely dereferenceable - i.e. not equal to an end() iterator on the container from which it came. As Steve points out - that's a pretty reasonable precondition and shouldn't surprise anyone.
(This is fine for all Standard containers as stored elements never have the same address, but more generally user-defined containers could allow non-equal iterators to address the same value object (e.g. supporting cycles or a "flyweight pattern" style optimisation), in which case this approach would fail. Still, if you write such a container you're probably in a position to design for safe iterator comparison.)
Implementation:
template <class IteratorA, class IteratorB, class IteratorC>
inline bool range_contains(IteratorA from, const IteratorB& end,
                           const IteratorC& candidate)
{
    while (from != end)
        if (&*from++ == &*candidate)
            return true;
    return false;
}
Notes:
This adopts the Standard library approach of accepting a range of iterator positions to search.
The types of each iterator are allowed to vary, as there are portability issues, e.g. containers where begin() returns an iterator but end() returns a const_iterator.
Iterators other than from are taken by const reference, as iterators can sometimes be non-trivial objects (i.e. too large to fit in a register, or relatively expensive to copy). from is taken by value as it will be incremented through the range.