Is there any way to check whether a given iterator belongs to a given list in C++?
The obvious but invalid approach
You can't simply iterate through the list, comparing each iterator value to your "candidate".
The C++03 Standard is vague about the validity of == applied to iterators from different containers (Mankarse's comment on Nawaz's answer links http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2948.html#446). Some compilers (e.g. VC++ 2005 in debug mode) warn if you do so, but despite all that it may actually work reliably depending on your compiler/libraries - check their documentation if you don't care about portability.
The C++11 Standard is very explicit that you can't compare iterators from different containers:
§ 24.2.5 The domain of == for forward iterators is that of iterators over the same underlying sequence.
So, the answers to this question that rely on operator== are questionable now and invalid in the future.
An oft-valid approach
What you can do is iterate along the list, comparing the address of each element (i.e. &*i) to the address of the object to which your other iterator points.
Mankarse's comment cautions that this might not work as intended for objects providing their own operator&. You could work around this using std::addressof, or boost's boost::addressof for C++03.
Martin's comment mentions that you have to assume the candidate iterator that you're testing list membership for is safely dereferenceable - i.e. not equal to an end() iterator on the container from which it came. As Steve points out - that's a pretty reasonable precondition and shouldn't surprise anyone.
(This is fine for all Standard containers as stored elements never have the same address, but more generally user-defined containers could allow non-equal iterators to address the same value object (e.g. supporting cycles or a "flyweight pattern" style optimisation), in which case this approach would fail. Still, if you write such a container you're probably in a position to design for safe iterator comparison.)
Implementation:
template <class IteratorA, class IteratorB, class IteratorC>
inline bool range_contains(IteratorA from, const IteratorB& end,
                           const IteratorC& candidate)
{
    // Compare element addresses rather than iterators, so operator== is never
    // applied to iterators from different containers.
    while (from != end)
        if (&*from++ == &*candidate)
            return true;
    return false;
}
Notes:
This adopts the Standard library approach of accepting a range of iterator positions to search.
The types of each iterator are allowed to vary, as there are portability issues, e.g. containers where begin() returns an iterator but end() returns a const_iterator.
Iterators other than from are taken by const reference, as iterators can sometimes be non-trivial objects (i.e. too large to fit in a register, relatively expensive to copy). from is taken by value because it will be incremented through the range.
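For example, a minimal usage sketch (assuming the range_contains template above is in scope, and that the candidate iterator is dereferenceable):

#include <cassert>
#include <list>

int main()
{
    std::list<int> a{1, 2, 3};
    std::list<int> b{1, 2, 3};

    std::list<int>::iterator it = ++a.begin();        // points at the 2 in a
    assert(range_contains(a.begin(), a.end(), it));   // same element, found by address
    assert(!range_contains(b.begin(), b.end(), it));  // equal value, but a different object
    return 0;
}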
Related
Iterators from different containers cannot be compared (see for example https://stackoverflow.com/a/4664519/225186); or, more precisely, such a comparison doesn't need to make sense.
This raises another question: can iterators from different ranges be assigned to each other?
Container A = ...;
Container B = ...;
auto it1 = A.begin();
it1 = B.begin(); // it1 first belonged to A; does it1 need to belong to B afterwards?
Is the last line required to work by the Iterator concept in some standard or in the accepted practices or in the upcoming std ranges?
Since equality and assignment are so intertwined, it seems that if equality (==) is not always well defined, then assignment (=) doesn't need to be well defined either, and probably for similar underlying reasons.
The reason I ask is not purely academic: a certain iterator implementation could carry some "metadata" from the container, and (depending on the implementation) that may or may not be reassignable, or it may simply be wasteful to reassign it.
(For example, stride information that is unique to A and doesn't agree with that of B. Another example is when the iterator stores a reference to the original range.)
This would allow a particular field (member) to be left untouched when assignment is attempted.
One could make the assignment work in some cases, and that may produce fewer surprises, but it could also limit the implementations. The question is whether it is really necessary to define/allow assignment between iterators of different origin (provenance).
UPDATE 2021:
The linked documents read things like:
[...] the term the domain of == is used in the ordinary mathematical sense to denote the set of values over which == is (required to be) defined. This set can change over time. Each algorithm places additional requirements on the domain of == for the iterator values it uses. These requirements can be inferred from the uses that algorithm makes of == and !=.
So there is an implicitly defined (by the algorithms) range of validity of ==.
Another way to formulate this question is whether the same caveats about the domain of applicability of == can be applied, by simple use of logic, to =.
The underlying idea is that defining == in isolation from = (or vice versa) doesn't make sense.
(and also because I found a motivating case).
If you check out cppreference.com for the container you are interested in, you can find the requirements for its iterators.
If we look at std::vector for example, its iterators are specified to be LegacyRandomAccessIterator. If you follow the hierarchy of definitions up from there to the base LegacyIterator you'll see that iterators are required to be CopyAssignable which means you must be able to assign one iterator to another of the same type.
All of the standard library containers use iterators that satisfy LegacyIterator. Containers from other libraries are free to ignore these requirements, but it would be quite surprising to users if their iterators weren't CopyAssignable, and even more surprising if assignment failed only between iterators from different containers of the same type, as that would potentially be a runtime failure rather than a compile-time failure.
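As a minimal concrete illustration (taking std::vector, where this is just ordinary copy assignment of the iterator object):

#include <cassert>
#include <vector>

int main()
{
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5, 6};

    auto it = a.begin();   // it refers into a
    it = b.begin();        // plain copy assignment: it now refers into b
    assert(*it == 4);      // the old value is simply discarded
    return 0;
}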
I have not found any direct reference to range/range-adaptor/range-view specific invalidation rules when modifying the underlying container.
Intuition suggests it would be exactly the same as pointer/iterator invalidation rules -- which are specified within the containers section of the standard.
The current container invalidation wording is as follows:
"...invalidates all the references, pointers, and iterators referring
to the elements in the sequence, as well as the past-the-end
iterator."
Which raises the question: Do all ranges necessarily "refer to the elements of the sequence", or, could they be accessing elements through the interface of the container?
It seems to me that most range adaptors already access a sequence without referring directly to the elements of that sequence (i.e. lazy views just build up iterator adaptors).
What seems to matter is the underlying range at the base of the view pyramid, so to speak.
We all learn at some point that you cannot call std::vector::push_back while iterating over that same vector, because the memory may move and invalidate the iteration. But we also learn that you can mix std::vector::operator[] access with push_back, so long as you are careful to check your size() bounds correctly.
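For instance, a minimal sketch of that index-based pattern (with a hypothetical condition, and the original size captured up front):

#include <cstddef>
#include <vector>

void append_while_scanning(std::vector<int>& v)
{
    // Indices remain valid across push_back even if the buffer reallocates;
    // iterators and references into v would not.
    const std::size_t original_size = v.size();
    for (std::size_t i = 0; i < original_size; ++i)
    {
        if (v[i] % 2 == 0)            // hypothetical condition
            v.push_back(v[i] * 2);    // safe: no iterator or reference is held across it
    }
}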
It seems to me the same rules would apply to ranges/adaptors/views.
So: is it possible to force some equivalent to std::ranges::views::all (or, perhaps take_view) over a random access container to use array indexing (or some equivalent indirect/lazy element access), and to not use iteration directly?
Something to allow this:
std::vector<People> people = ...;
for (auto& person : std::ranges::views::lazy_all(people)) { // or ranges::lazy_take_view(people, people.size())
    if (person.has_new_child()) {
        people.push_back(person.get_new_child());
    }
}
I'm currently playing with C++20 ranges and while implementing my own view I came up with the same question: what are the rules for iterator invalidation for views?
As far as I can see, ranges make heavy use of meta-programming, and under the hood they construct a hierarchy of state machines. The actual types of these state machines are often hidden [1], so we may have difficulty reasoning about their restrictions. Iterator invalidation is part of these restrictions, so specifying when and how iterators are invalidated when we construct a view hierarchy can be quite challenging, if not impossible. Even if we managed to describe these rules, they could be impossible to memorise, not to mention to use efficiently.
The Ranges V3 library has the following recommendation:
View validity
Any operation on the underlying range that invalidates its iterators or sentinels will also invalidate any view that refers to any part of that range. Additionally, some views (e.g., views::filter), are invalidated when the underlying elements of the range are mutated. It is best to recreate a view after any operation that may have mutated the underlying range.
https://github.com/ericniebler/range-v3/blob/master/doc/index.md
The restriction given above simply slashes away all concerns, and although it is stricter than the rules for the standard containers, it establishes a simple rule to memorise about the invalidation of any view's iterators. At the same time it gives the freedom to change a view's implementation without touching that rule.
So, I suppose, it is safe to assume that ranges in C++20 standard are subject to the same limitations.
[1] My observations are based on the MSVC implementation of ranges, where range adaptors can actually spawn different types based on strategies. So when you pipeline std::views::take(), for example, you may suddenly end up with a std::span.
Each range adaptor or view has its own rules for how it interacts with the underlying range. And the standard spells out these interactions, if sometimes obliquely.
ranges::ref_view for example is explicitly stated to work as if it holds a pointer to the range. And its begin/end functions behave as if they call that range's begin/end functions, as well as forwarding any other functionality to the range it was given. So its interactions are pretty clear, since its iterators are the exact same type as those of the underlying range.
For a range like ranges::filter_view, it's a bit more difficult to track down. filter_view's iterator behavior is based on the behavior of filter_iterator. That type behaves as if it stores an iterator to the underlying range (due to the exposition-only iterator member). So a filter_iterator will be invalidated whenever the underlying range's iterators are. And there is no exposition member of filter_view that holds an iterator, so you might expect that calling begin will always get a fresh, valid filter_iterator.
But it won't. There is a major caveat buried in the description of filter_view::begin. A semantic component of the behavior of a range's begin function is that it must execute in amortized constant time. Finding the start of a filtered list is a linear-time operation. Therefore, filter_view::begin is required to only do this linear-time operation once and then internally cache the result.
Which means it does store an iterator, even if it's not clear that it does. So if you invalidate whatever iterator the begin filter_iterator is using, you have invalidated the filter_view's begin iterator and must construct a new one.
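A minimal sketch of that caveat (assuming the push_back triggers a reallocation):

#include <ranges>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4, 5};
    auto even = v | std::views::filter([](int x) { return x % 2 == 0; });

    auto first = even.begin();   // linear search; the result is cached inside the view
    v.push_back(6);              // may reallocate, invalidating v's iterators and
                                 // therefore the iterator cached inside `even`
    // Using `first`, or calling even.begin() again, is now undefined behaviour;
    // rebuild the view instead:
    auto fresh = v | std::views::filter([](int x) { return x % 2 == 0; });
    (void)first;
    (void)fresh;
    return 0;
}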
In summary, if you want to know the iterator invalidation behavior for a view, you must read through the entire description of that view. But this is true of containers as well; there isn't a nice, neat section of the standard which spells out exactly when iterators, references, and pointers are invalidated. Cppreference has a nice list, but the standard leaves it up to the definition of each function in the class.
I have a range of integers [start, end] and a non-decreasing monotonic function f(i).
So conceptually, I have a non-decreasing sequence [f(start), f(start + 1), .. , f(end)].
Can I use std::upper_bound on that sequence to find the first i in the range for which f(i) > some_value?
Conceptually, I'd like something like this:
std::upper_bound(start, end + 1, some_value, [&](int lhs, int rhs) {
    return f(lhs) < f(rhs);
});
But this doesn't compile because start and end + 1 do not meet the requirements of forward iterators.
The short answer is yes, since std::upper_bound works on iterators, not on containers. But iterators themselves are instances of a corresponding class (for example, std::vector<int>::iterator or whatnot).
If you construct some specific class that will meet the requirements of ForwardIterator not being actually bound to some sort of container, while still meaning something (for example, if you want to generate your sequence procedurally), it should work just fine.
Note that a simple integer will not do the trick. On the other hand, a class whose objects hold the value of your function for a particular argument value (with some additional batteries) will.
There are basically two answers:
Would it work by the standard or would it work with all practical implementations of the STL?
By the standard, as T.C. pointed out already, there are some strict requirements on iterators - especially that *it has to return a (possibly const) reference to value_type (which we could satisfy by returning a reference to a member of the iterator), but also that for it1 == it2, *it1 and *it2 are references bound to the same object, which is only possible if we have a distinct object for every number in the range.
If you want to use this idea in practice, I don't believe any implementation of std::upper_bound or similar methods actually relies on this reference equality, so you could just use a class that encapsulates an integer as an iterator, only overloading the necessary methods. As far as I can see, boost::irange fulfills these requirements.
As you can see, this is not strictly standard-compliant, but I see no reason why any implementation of binary search should rely on such strong requirements for the iterator, if the underlying 'storage' is const anyway.
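For example, a rough sketch along those lines (hypothetical names; operator* returns a value rather than a true reference, so this is not strictly ForwardIterator-conformant, but typical std::upper_bound implementations accept it):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>

// Iterator that wraps an integer argument and applies f on dereference.
struct fn_iterator
{
    using iterator_category = std::forward_iterator_tag;
    using value_type        = long;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const long*;
    using reference         = long;          // a value, not a true reference

    int i;
    long (*f)(int);

    long operator*() const { return f(i); }
    fn_iterator& operator++() { ++i; return *this; }
    fn_iterator operator++(int) { fn_iterator tmp = *this; ++i; return tmp; }
    bool operator==(const fn_iterator& o) const { return i == o.i; }
    bool operator!=(const fn_iterator& o) const { return i != o.i; }
};

long f(int i) { return static_cast<long>(i) * i; } // example non-decreasing function

int main()
{
    fn_iterator first{0, f}, last{100, f};
    // First i in [0, 100) with f(i) > 50.
    auto it = std::upper_bound(first, last, 50L);
    std::cout << it.i << '\n';   // prints 8, since 7*7 = 49 <= 50 and 8*8 = 64 > 50
    return 0;
}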
No, not practically, but yes in practice, but no if you want to be practical.
No
upper_bound requires ForwardIterator. ForwardIterator requires that * returns an actual reference, and that if two iterators are equal then they refer to the same object.
Not practically
For a container-less iterator, this requires an insanely complex iterator that caches the values it returns in a shared global map of some kind. To make it half practical, note that the iterator requirements say very little about the lifetime of said reference; so you'd want to reference count and destroy said values as the iterators in question cease to exist.
Such a solution requires synchronization, global state, and is significantly more expensive and complex than something like boost::integer_range. No sane person would write this except as an exercise demonstrating why the standard needs to be fixed.
But yes in practice
No sane implementation of upper_bound actually requires that the iterators in question are full-scale forward iterators, barring one that does full concept checks to validate against the standard (and not against what the actual algorithm needs). Input iterators with stability on the values returned almost certainly do it. There is no such concept in the C++ standard, and forward iterator is the weakest iterator category in the standard that satisfies it.
This problem, of effectively demanding that iterators be backed by containers, is a flaw in the standard in my opinion. Container-free iterators are powerful and useful, except that they rarely technically work with the standard algorithms.
Adding new iterator categories has proved problematic, because there is little way to do it without breaking existing code. They looked into it for contiguous iterators, and wrote it off as impractical (I don't know all the details of what they tried).
Adding new iterator concepts that are not backed by tags is more possible, but probably will have to wait until concepts are part of the C++ language and not just the standard; then experimenting with adding new concepts becomes something you can specify in C++ instead of in standardese, which makes it far easier.
But no if you want to be practical
This does, however, result in an ill-formed program, no diagnostic required. So consider whether it is worth it; it may actually be easier to reimplement upper_bound than to maintain a program whose every execution is undefined behavior, and whose every compile is at the mercy of a compiler upgrade.
I have an iterator that is heavier than an int or a pointer (it is meant to iterate through the nearest neighbours of a site in a three-dimensional periodic lattice, so it must contain certain private support structures). To tell whether the iterator is at the end of the container I only need one comparison, but to create an iterator pointing to one-past-the-end I would also have to create the support structures.
I thus figured I could have the end() method of the container return an int, and then overload operator==() to avoid creating a full one-past-the-end iterator. This answer hints that it may be a good idea. For bidirectional iterators it seems this is wrong, as I later learnt (e.g. from this question) that whatever end() returns must be decrementable to the last element.
Since my iterator is only forward, I thought it would still be fine, but now I find I cannot use std::find() (and possibly other STL algorithms), which expects two arguments of the same (iterator) type.
So, my question is: is having end() return a type different than begin() a violation of standard behavior? Is it such a bad idea?
They must be the same type.
Changing this is the focus of some of the most hopeful range proposals. It's not a good requirement, but it is required, at least for now.
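For what it's worth, here is a minimal sketch (with invented names and a simplified payload) of what differing begin()/end() types look like. Range-based for accepts this since C++17, though classic algorithms like std::find still require the two types to match:

#include <iostream>

struct neighbour_sentinel {};   // cheap stand-in for one-past-the-end

struct neighbour_iterator
{
    int index = 0;
    int count = 6;   // e.g. 6 nearest neighbours; real support structures omitted

    int operator*() const { return index; }   // stand-in for the real payload
    neighbour_iterator& operator++() { ++index; return *this; }
    bool operator!=(neighbour_sentinel) const { return index != count; }
};

struct neighbour_range
{
    neighbour_iterator begin() const { return {}; }
    neighbour_sentinel end() const { return {}; }   // no support structures needed
};

int main()
{
    for (int n : neighbour_range{})   // OK since C++17: begin/end may differ in type
        std::cout << n << ' ';
    return 0;
}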
I am writing (as a self-teaching exercise) a simple STL-like range. It is an immutable random-access "container". My range keeps only its start element, the number of elements, and the step size (the difference between two consecutive elements):
struct range
{
    ...
private:
    value_type m_first_element, m_element_count, m_step;
};
Because my range doesn't hold the elements, it calculates the desired element using the following:
// In the standards, operator[] should return a const reference.
// Because range doesn't store its elements internally, we return a copy of the value.
value_type operator[](size_type index)
{
    return m_first_element + m_step*index;
}
As you can see, I am not returning a const reference as the standards say. Now, can I assume that a const reference and a copy of the element are the same in terms of using the non-mutating algorithms in the standard library?
Any advice about the subject is greatly appreciated.
@Steve Jessop: Good point that you mentioned iterators.
Actually, I used the SGI documentation as my reference. At the end of that page, it says:
Assuming x and y are iterators from the same range:
Invariants: Identity: x == y if and only if &*x == &*y
So, it boils down to the same original question I've asked actually :)
The standard algorithms don't really use operator[], they're all defined in terms of iterators unless I've forgotten something significant. Is the plan to re-implement the standard algorithms on top of operator[] for your "ranges", rather than iterators?
Where non-mutating algorithms do use iterators, they're all defined in terms of *it being assignable to whatever it needs to be assignable to, or otherwise valid for some specified operation or function call. I think all or most such ops are fine with a value.
The one thing I can think of, is that you can't pass a value where a non-const reference is expected. Are there any non-mutating algorithms which require a non-const reference? Probably not, provided that any functor parameters etc. have enough const in them.
So sorry, I can't say definitively that there are no odd corners that go wrong, but it sounds basically OK to me. Even if there are any niggles, you may be able to fix them with very slight differences in the requirements between your versions of the algorithms and the standard ones.
Edit: a second thing that could go wrong is taking pointers/references and keeping them too long. As far as I can remember, standard algorithms don't keep pointers or references to elements - the reason for this is that it's containers which guarantee the validity of pointers to elements, iterator types only tell you when the iterator remains valid (for instance a copy of an input iterator doesn't necessarily remain valid when the original is incremented, whereas forward iterators can be copied in this way for multi-pass algorithms). Since the algorithms don't see containers, only iterators, they have no reason that I can think of to be assuming the elements are persistent.
Items in STL containers are expected to be copied around all the time; think about when a vector has to be reallocated, for example. So, your example is fine, except that it only works with random iterators. But I suspect the latter is probably by design. :-P
Do you want your range to be usable in STL algorithms? Wouldn't it be better off with the first and last elements? (Considering the fact that end() is oft required/used, you will have to pre-calculate it for performance.) Or, are you counting on the contiguous elements (which is my second point)?
Since you put "container" in "quotes" you can do whatever you want.
STL-type things (iterators, various member functions on containers...) return references because references are lvalues, so certain constructs (i.e. myvec[i] = otherthing) can then compile. Think of operator[] on std::map. For const references, I suppose it's a reference rather than a value just to avoid the copy.
This rule is violated all the time, though, when convenient. It's also common for iterator classes to store the current value in a member variable purely for the purpose of returning a reference or const reference (yes, that reference becomes invalid once the iterator is advanced).
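A minimal sketch of that caching pattern (hypothetical, reduced to a forward-only arithmetic range; the reference returned by operator* is only valid until the iterator is advanced):

#include <cstddef>

class range_iterator
{
public:
    range_iterator(int first, int step, std::size_t index)
        : m_first(first), m_step(step), m_index(index),
          m_value(first + step * static_cast<int>(index)) {}

    // Hands out a reference into the iterator itself; invalidated by operator++.
    const int& operator*() const { return m_value; }

    range_iterator& operator++()
    {
        ++m_index;
        m_value = m_first + m_step * static_cast<int>(m_index); // refresh the cache
        return *this;
    }

    bool operator==(const range_iterator& other) const { return m_index == other.m_index; }
    bool operator!=(const range_iterator& other) const { return m_index != other.m_index; }

private:
    int m_first, m_step;
    std::size_t m_index;
    int m_value; // cached element, so operator* can return a const reference
};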
If you're interested in this sort of stuff you should check out the boost iterator library.