See related questions on past-the-end iterator invalidation:
this, this.
This is more a question of design: is there (in the STL or elsewhere) such a concept as past-the-end iterator "revalidation"?
What I mean by this, and use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently from this, another part of the program enqueues more items in the queue. How is it possible for the algorithm to (EDIT) efficiently tell, "have more items been enqueued" while holding the previously past-the-end iterator (call it tailIt)? (this would imply it is able to check if tailIt == container.end() still, and if that is false, conclude tailIt is now valid and points to the first element that was inserted).
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear that we have the iterator tailIt and a reference to the container. A trivial workaround for what I'm trying to do is to also remember count := how many items you have processed, then check whether container.size() == count still holds, and if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, the assumption that the container never pops from the front (!), and the need for random access for efficient seeking).
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" to the data block at all; in fact this will be true of any iterator except a vector iterator. So, overall, an extant end-iterator just is never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
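A minimal sketch of that kind of wrapper, assuming a std::deque as storage, a single reader, and items that are only ever appended (never popped from the front). The names TailableQueue, has_new and drain_new are made up for illustration. The reader keeps an index instead of a past-the-end iterator, so nothing needs revalidating:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: owns the container and tracks how far a single
// reader has consumed it. Assumes items are appended and never removed.
class TailableQueue {
public:
    void enqueue(std::string item) { items_.push_back(std::move(item)); }

    // True if items were enqueued since the last drain_new() call.
    bool has_new() const { return consumed_ < items_.size(); }

    // Returns copies of the items added since the last call and advances the cursor.
    std::vector<std::string> drain_new() {
        std::vector<std::string> fresh(
            items_.begin() + static_cast<std::ptrdiff_t>(consumed_), items_.end());
        consumed_ = items_.size();
        return fresh;
    }

private:
    std::deque<std::string> items_;
    std::size_t consumed_ = 0;  // index of the first unprocessed item
};
```

The tailing algorithm then asks has_new() instead of comparing a stale iterator against end(). If the queue also had to pop from the front, the index would need adjusting on every pop, which is one more reason to keep that bookkeeping inside the wrapper.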
In the case for a std::queue, no there isn't (heh). Not because the iterators for a queue get invalidated once something is pushed, but because a queue doesn't have any iterators at all.
As for other iterator types, most (if not all) of them don't hold a reference to the container holder (the managing object containing all the info about the underlying data). That is a trade-off that favors efficiency over flexibility. (I quickly checked the implementation of gcc's std::vector::iterator.) It is possible to write an iterator type that keeps a reference to the holder during its lifetime; that way the iterators never have to be invalidated (unless the holder is std::move'd)!
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
    if (some_outside_condition()) {
        // Normally iterators into the vector would be invalidated at this point
        // (only if it reallocates, but you should always assume it does)
        my_vector.push_back("hello world!");
    }
}
Literally revalidating iterators might be too complex to build for its use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++) loop. That said, I cannot promise how well the compiler will optimize (loop unrolling, for instance) with these iterators. I do assume it will still do quite a good job.
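Here is one way such a never-invalidating iterator could look, as a rough sketch rather than a finished design. flex_range, its sentinel type and visit_and_grow are hypothetical names; the iterator stores a pointer to the vector plus an index, and the end check re-reads size() every time. It needs C++17, because the range-based for loop only accepts different begin/end types from that version on:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "flex" range: the iterator stores a pointer to the vector plus
// an index, so push_back during the loop never leaves it dangling.
template <class T>
class flex_range {
public:
    struct sentinel {};

    class iterator {
    public:
        iterator(std::vector<T>* v, std::size_t i) : vec_(v), index_(i) {}
        // Looks the element up through the vector on every access.
        T& operator*() const { return (*vec_)[index_]; }
        iterator& operator++() { ++index_; return *this; }
        // Re-reads size() on each comparison, so appended elements are visited too.
        bool operator!=(sentinel) const { return index_ < vec_->size(); }
    private:
        std::vector<T>* vec_;
        std::size_t index_;
    };

    explicit flex_range(std::vector<T>& v) : vec_(&v) {}
    iterator begin() const { return iterator(vec_, 0); }
    sentinel end() const { return sentinel{}; }

private:
    std::vector<T>* vec_;
};

template <class T>
flex_range<T> make_flex_iterator(std::vector<T>& v) { return flex_range<T>(v); }

// Appends one extra element while iterating; returns how many elements were visited.
inline std::size_t visit_and_grow(std::vector<int>& v) {
    std::size_t visited = 0;
    bool grown = false;
    for (int x : make_flex_iterator(v)) {  // copy each element: a reference could dangle after push_back
        ++visited;
        if (!grown && x == 1) {
            v.push_back(99);
            grown = true;
        }
    }
    return visited;
}
```

Note that only the iterator is safe here. A reference obtained from operator* can still dangle once push_back reallocates, which is why the loop copies each element.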
I have not found any direct reference to range/range-adaptor/range-view specific invalidation rules when modifying the underlying container.
Intuition suggests it would be exactly the same as pointer/iterator invalidation rules -- which are specified within the containers section of the standard.
The current container invalidation wording is as follows:
"...invalidates all the references, pointers, and iterators referring
to the elements in the sequence, as well as the past-the-end
iterator."
Which raises the question: Do all ranges necessarily "refer to the elements of the sequence", or, could they be accessing elements through the interface of the container?
It seems to me that most range adaptors already access a sequence without referring directly to the elements of that sequence (i.e. lazy views just build up iterator adaptors).
What seems to matter is the underlying range at the base of the view pyramid, so to speak.
We all learn at some point that you cannot call std::vector::push_back while iterating over that same vector, because the storage may be reallocated and invalidate the iterators. But we also learn that you can combine std::vector::operator[] access with push_back, so long as you check your size() bounds correctly.
It seems to me the same rules would apply to ranges/adaptors/views.
So: is it possible to force some equivalent to std::ranges::views::all (or, perhaps take_view) over a random access container to use array indexing (or some equivalent indirect/lazy element access), and to not use iteration directly?
Something to allow this:
std::vector<People> people = ...;
for (auto& person : std::ranges::views::lazy_all(people)) { // or ranges::lazy_take_view(people, people.size())
    if (person.has_new_child()) {
        people.push_back(person.get_new_child());
    }
}
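For comparison, the plain index-based version of that loop can be sketched as follows. Person and add_children are hypothetical stand-ins for the People type above, and the bound is fixed up front to mimic the take_view(people, people.size()) variant:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's People type.
struct Person {
    int id;
    bool has_new_child;
};

// Index-based traversal: operator[] re-resolves the element on every access,
// so reallocation caused by push_back does not break the loop itself.
inline void add_children(std::vector<Person>& people) {
    const std::size_t original_size = people.size();  // bound fixed up front
    for (std::size_t i = 0; i < original_size; ++i) {
        if (people[i].has_new_child) {
            // Build the child before push_back; people[i] may move during the call.
            Person child{people[i].id * 10, false};
            people.push_back(child);
        }
    }
}
```

The one thing to avoid is holding a reference such as auto& person = people[i] across the push_back, which is exactly what a reference-yielding view would hand out.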
I'm currently playing with C++20 ranges and while implementing my own view I came up with the same question: what are the rules for iterator invalidation for views?
As far as I can see, ranges rely heavily on metaprogramming, and under the hood they construct a hierarchy of state machines. The actual types of these state machines are often hidden [1], so it can be hard to know what restrictions they carry. Iterator invalidation is one of these restrictions, so specifying when and how iterators are invalidated across a whole view hierarchy can be quite challenging, if not impossible. Even if we managed to describe these rules, they could be impossible to memorise, let alone use efficiently.
Ranges V3 library has the following recommendation:
View validity
Any operation on the underlying range that invalidates its iterators or sentinels will also invalidate any view that refers to any part of that range. Additionally, some views (e.g., views::filter), are invalidated when the underlying elements of the range are mutated. It is best to recreate a view after any operation that may have mutated the underlying range.
https://github.com/ericniebler/range-v3/blob/master/doc/index.md
The restriction given above simply cuts away all these concerns. Although it is stricter than the rules for the standard containers, it establishes a single rule about view iterator invalidation that is easy to memorise. At the same time it leaves the freedom to change a view's implementation without touching that rule.
So, I suppose, it is safe to assume that ranges in C++20 standard are subject to the same limitations.
[1] My observations are based on the MSVC implementation of ranges, where range adaptors can actually produce different types depending on the strategy. So when you pipeline std::views::take(), for example, you may suddenly end up with a std::span.
Each range adaptor or view has its own rules for how it interacts with the underlying range. And the standard spells out these interactions, if sometimes obliquely.
ranges::ref_view for example is explicitly stated to work as if it holds a pointer to the range. And its begin/end functions behave as if they call that range's begin/end functions, as well as forwarding any other functionality to the range it was given. So its interactions are pretty clear, since its iterators are the exact same type as those of the underlying range.
For a range like ranges::filter_view, it's a bit more difficult to track down. filter_view's iterator behavior is based on the behavior of filter_iterator. That type behaves as if it stores an iterator to the underlying range (due to the exposition-only iterator member). So a filter_iterator will be invalidated whenever the underlying range's iterators are. And there is no exposition member of filter_view that holds an iterator, so you might expect that calling begin will always get a fresh, valid filter_iterator.
But it won't. There is a major caveat buried in the description of filter_view::begin. A semantic component of the behavior of a range's begin function is that it must execute in amortized constant time. Finding the start of a filtered list is a linear-time operation. Therefore, filter_view::begin is required to only do this linear-time operation once and then internally cache the result.
Which means it does store an iterator, even if it's not clear that it does. So if you invalidate whatever the begin filter_iterator is using, you have invalidated the filter_view's cached begin iterator and must construct a new view.
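To make the caching effect concrete, here is a hand-rolled toy. It is not the standard library's filter_view; EvenFilter and cache_is_stale are hypothetical names, and the class only mimics the "compute begin once, then reuse it" behavior described above:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified filter over a vector<int> that keeps only even
// values. Like filter_view, it caches the result of its first begin() call.
class EvenFilter {
public:
    explicit EvenFilter(std::vector<int>& v) : vec_(&v) {}

    // First call scans linearly; later calls return the cached position.
    std::vector<int>::iterator begin() {
        if (!cached_) {
            first_ = vec_->begin();
            while (first_ != vec_->end() && *first_ % 2 != 0) ++first_;
            cached_ = true;
        }
        return first_;
    }

private:
    std::vector<int>* vec_;
    std::vector<int>::iterator first_;
    bool cached_ = false;
};

// Shows the cache going stale after an in-place mutation (no reallocation involved).
inline bool cache_is_stale() {
    std::vector<int> v{1, 3, 4, 6};
    EvenFilter evens(v);
    const bool found_four = (*evens.begin() == 4);  // scan stops at index 2
    v[0] = 2;                                       // index 0 now satisfies the predicate
    const bool still_four = (*evens.begin() == 4);  // cached position is returned unchanged
    return found_four && still_four;
}
```

In this toy the stale cache merely skips an element. With a push_back that reallocates, the cached iterator would dangle outright, which is why recreating the view after a mutation is the safe habit.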
In summary, if you want to know the iterator invalidation behavior for a view, you must read through the entire description of that view. But this is true of containers as well; there isn't a nice, neat section of the standard which spells out exactly when iterators, references, and pointers are invalidated. Cppreference has a nice list, but the standard leaves it up to the definition of each function in the class.
The following question just shot through my head. For the C++ STL iterators, a common practice is to have something like:
for (iterator it=obj.begin(); it!=obj.end(); it++)
What I am wondering is whether obj.begin() could instead tell "it" when to stop, which would make the for loop look like:
for (iterator it=obj.begin(); !it.end(); it++)
The benefit would be to make the iterator more self-contained, and one could do without (iterator end()) in the container class.
Sometimes you want to do something other than iterate over the entire contents of a container. For example, you can create a pair of iterators that will iterate over only the first half of a container. So having a separate object to represent the end is more flexible since it allows the user more control of where to place the end of a range.
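As a small illustration of that flexibility, the sketch below sums only the first half of a vector by picking a different end position. sum_first_half is a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums only the first half of the vector by choosing a different "end".
inline int sum_first_half(const std::vector<int>& v) {
    auto half = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    return std::accumulate(v.begin(), half, 0);
}
```

With a self-terminating iterator (it.end()), the same thing would need a separate "limited" iterator type or an extra counter.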
You are right that it is somewhat inconvenient for the most common case of iterating over everything. However, C++11 provides a range based for loop which makes looping over a whole container really easy, so like many things in programming, it's really just a matter of selecting the right construct to best express your intention.
If the iterator API were designed like that, a pointer wouldn't be a valid iterator (since a pointer obviously wouldn't have an end() method). So that would rule out the most straightforward way to implement iterators for data structures with contiguous memory.
Some libraries provide "java-style iterators" that work like you describe.
However, the big problem with this scheme (hasNext(), etc) is that such iterators are classes.
For example, the STL's <algorithm> header contains functions like std::copy, std::generate, std::sort, std::lower_bound, std::fill, etc. All those routines use begin/end style iterators. As a result, you can use pointers with those functions. If those functions operated on iterators that are classes (i.e. if they called hasNext() instead of comparing against end() internally), then you wouldn't be able to pass pointers into std::sort and the like. In that scenario you would have to wrap everything in a class, wasting your time, and losing access to <algorithm> is not worth the minor convenience you'd get by adding an atEnd() method to the iterator class. That's probably the reason why iterators are compared to end().
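A quick sketch of the pointer case, using nothing but a C array. sort_raw_array is a made-up name for the demo:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Sorts a plain C array in place; the "iterators" are just int* values.
inline bool sort_raw_array() {
    int data[] = {5, 2, 9, 1};
    std::sort(std::begin(data), std::end(data));     // std::begin/end yield int* here
    int* pos = std::lower_bound(data, data + 4, 5);  // pointer arithmetic gives the range
    return data[0] == 1 && data[3] == 9 && *pos == 5;
}
```

No wrapper class is involved: the pointers satisfy the iterator interface as they are.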
There is no requirement for iterator to "know" about the container. It could just know and care about an offset in a block of memory or a current node (or whatever is appropriate for the data structure being iterated on), without knowing anything about encompassing container. For example, a vector iterator could be implemented as a simple pointer, without knowing (by itself) where the vector ends.
Also, STL algorithms need to work on raw pointers as well, so iterators need to "mimic" pointers.
Ultimately it's just the decision made when the STL was invented (by Stepanov in the early 90s), and later ratified in the C++ standardization process, that an iterator would be a generalization of a pointer. From http://www.sgi.com/tech/stl/stl_introduction.html:
In the example of reversing a C array, the arguments to reverse are
clearly of type double*. What are the arguments to reverse if you are
reversing a vector, though, or a list? ... The answer is that the
arguments to reverse are iterators, which are a generalization of
pointers.
Iterators didn't have to be a generalization of pointers. The C++ standard libraries (and before that the STL) could in theory have used a different iteration model, in which an iteration would be represented either by a single iterator object or a single range object, instead of by a pair of iterators first and last.
I don't think there would be much performance difference. There certainly wouldn't be with modern C++ compilers. The STL (and the standard libraries based on it) always relied for performance on decent inlining by the compiler, and these classes wouldn't be any worse than what compilers already have to deal with in the bowels of the containers. Nor would there be any major difficulty in providing a simple wrapper that turns a pair of pointers into an iterator or range object.
Some people do prefer other iterator models - James Gosling for one (or if not him, whoever designed Java iterators). And some people prefer ranges (including plenty of C++ programmers: hence Boost.Range).
I suspect the authors of the STL and C++ liked the fact that STL-style iterators retained a kind of compatibility with C, that you could take a C algorithm that used pointers to operate on an array (or other range specified by the user with a pair of pointers), and transform it almost unchanged into a C++ algorithm that uses iterators to operate on a container (or other range). That's the kind of thinking that means your new language can be more easily assimilated by an existing user-base, which was one of the goals of C++ early on.
So questions like, "can we provide a means of testing for end which is a few characters shorter, at the cost of making pointers no longer be iterators and making iterators larger" probably would not have got much interest at the time. But that was then, and for example Andrei Alexandrescu has been banging on for a while now that in his opinion this is no longer the best choice, and that ranges are better than iterators.
Having the iterator work the way it does gives you the flexibility to work on subranges. You don't always have to start at begin() or end at end(). For example, if you want to work with all the elements that have a certain value, you can use the return value of equal_range() as your beginning and end.
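For instance, with a std::multiset the pair returned by equal_range() can be walked directly. count_equal is a hypothetical helper written just for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the elements equal to `value` by walking only the subrange
// returned by equal_range, not the whole container.
inline std::size_t count_equal(const std::multiset<int>& s, int value) {
    auto range = s.equal_range(value);  // pair of iterators: [first, second)
    std::size_t n = 0;
    for (auto it = range.first; it != range.second; ++it) ++n;
    return n;
}
```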
The D language has a range which incorporates both the beginning and end of an iterator range. It was also proposed for C++: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/7e52451ed8eea31c
Knowing where your iterator's begin and end (and rbegin and rend) are is certainly useful but the case you state is somewhat handled by C++11's range-based for-loop:
for (auto& elem : obj)
{
    // Do something with elem (it is the element itself, not an iterator)
}
It's much easier to get things done in C++11 without writing so much boiler-plate!
What exactly are iterators in the C++ STL?
In my case, I'm using a list, and I don't understand why you have to make an iterator std::list<int>::const_iterator iElementLocator; to display the contents of the list with the dereference operator:
cout << *iElementLocator; after assigning it to, say, list.begin().
Please explain what exactly an iterator is and why I have to dereference or use it.
There are three building blocks in the STL:
Containers
Algorithms
Iterators
At the conceptual level containers hold data. That by itself isn't very useful, because you want to do something with the data; you want to operate on it, manipulate it, query it, play with it. Algorithms do exactly that. But algorithms don't hold data, they have no data -- they need a container for this task. Give a container to an algorithm and you have an action going on.
The only problem left to solve is how does an algorithm traverse a container, from a technical point of view. Technically a container can be a linked list, or it can be an array, or a binary tree, or any other data structure that can hold data. But traversing an array is done differently than traversing a binary tree. Even though conceptually all an algorithm wants is to "get" one element at a time from a container, and then work on that element, the operation of getting the next element from a container is technically very container-specific.
It appears as if you'd need to write the same algorithm for each container, so that each version of the algorithm has the correct code for traversing the container. But there's a better solution: ask the container to return an object that can traverse over the container. The object would have an interface algorithms know. When an algorithm asks the object to "get the next element" the object would comply. Because the object came directly from the container it knows how to access the container's data. And because the object has an interface the algorithm knows, we need not duplicate an algorithm for each container.
This is the iterator.
The iterator here glues the algorithm to the container, without coupling the two. An iterator is coupled to a container, and an algorithm is coupled to the iterator's interface. The source of the magic here is really template programming. Consider the standard copy() algorithm:
template<class In, class Out>
Out copy(In first, In last, Out res)
{
    while( first != last ) {
        *res = *first;
        ++first;
        ++res;
    }
    return res;
}
The copy() algorithm takes as parameters two iterators templated on the type In and one iterator of type Out. It copies the elements starting at position first and ending just before position last, into res. The algorithm knows that to get the next element it needs to say ++first or ++res. It knows that to read an element it needs to say x = *first and to write an element it needs to say *res = x. That's part of the interface algorithms assume and iterators commit to. If by mistake an iterator doesn't comply with the interface then the compiler would emit an error for calling a function over type In or Out, when the type doesn't define the function.
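To see the decoupling in practice, here is a short sketch that feeds std::copy two very different kinds of iterators. list_to_vector and copy_array are made-up helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Copies from a linked list into a vector. std::copy only uses !=, * and ++,
// so it neither knows nor cares that the two containers differ.
inline std::vector<int> list_to_vector(const std::list<int>& src) {
    std::vector<int> dst;
    std::copy(src.begin(), src.end(), std::back_inserter(dst));  // back_inserter appends on each write
    return dst;
}

// The same algorithm with raw pointers as both input and output iterators.
inline void copy_array(const int* first, const int* last, int* out) {
    std::copy(first, last, out);
}
```

The std::back_inserter call is itself an iterator: writing through it calls push_back, so the destination grows as needed.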
I'm being lazy, so I won't type out a description of what an iterator is and how it's used, especially when there are already lots of articles online that you can read yourself.
Here are a few that I can quote for a start, providing links to the complete articles:
MSDN says,
Iterators are a generalization of
pointers, abstracting from their
requirements in a way that allows a
C++ program to work with different
data structures in a uniform manner.
Iterators act as intermediaries
between containers and generic
algorithms. Instead of operating on
specific data types, algorithms are
defined to operate on a range
specified by a type of iterator. Any
data structure that satisfies the
requirements of the iterator may then
be operated on by the algorithm. There
are five types or categories of
iterator [...]
By the way, it seems the MSDN has taken the text in bold from C++ Standard itself, specifically from the section §24.1/1 which says
Iterators are a generalization of
pointers that allow a C++ program to
work with different data structures
(containers) in a uniform manner. To
be able to construct template
algorithms that work correctly and
efficiently on different types of data
structures, the library formalizes not
just the interfaces but also the
semantics and complexity assumptions
of iterators. All iterators i support
the expression *i, resulting in a
value of some class, enumeration, or
built-in type T, called the value type
of the iterator. All iterators i for
which the expression (*i).m is
well-defined, support the expression
i->m with the same semantics as
(*i).m. For every iterator type X for
which equality is defined, there is a
corresponding signed integral type
called the difference type of the
iterator.
cplusplus says,
In C++, an iterator is any object
that, pointing to some element in a
range of elements (such as an array or
a container), has the ability to
iterate through the elements of that
range using a set of operators (at
least, the increment (++) and
dereference (*) operators).
The most obvious form of iterator is a
pointer [...]
And you can also read these:
What Is an Iterator?
Iterators in the Standard C++ Library
Iterator (at wiki entry)
Have patience and read all these. Hopefully, you will have some idea what an iterator is, in C++. Learning C++ requires patience and time.
An iterator is not the same as the container itself. The iterator refers to a single item in the container, as well as providing ways to reach other items.
Consider designing your own container without iterators. It could have a size function to obtain the number of items it contains, and could overload the [] operator to allow you to get or set an item by its position.
But "random access" of that kind is not easy to implement efficiently on some kinds of container. If you obtain the millionth item: c[1000000] and the container internally uses a linked list, it will have to scan through a million items to find the one you want.
You might instead decide to allow the collection to remember a "current" item. It could have functions like start and more and next to allow you to loop through the contents:
c.start();
while (c.more())
{
    item_t item = c.next();
    // use the item somehow
}
But this puts the "iteration state" inside the container. This is a serious limitation. What if you wanted to compare each item in the container with every other item? That requires two nested loops, both iterating through all the items. If the container itself stores the position of the iteration, you have no way to nest two such iterations - the inner loop will destroy the working of the outer loop.
So iterators are an independent copy of an iteration state. You can begin an iteration:
container_t::iterator i = c.begin();
That iterator, i, is a separate object that represents a position within the container. You can fetch whatever is stored at that position:
item_t item = *i;
You can move to the next item:
i++;
With some iterators you can skip forward several items:
i += 1000;
Or obtain an item at some position relative to the position identified by the iterator:
item_t item = i[1000];
And with some iterators you can move backwards.
And you can discover if you've reached beyond the contents of the container by comparing the iterator to end:
while (i != c.end())
You can think of end as returning an iterator that represents a position that is one beyond the last position in the container.
An important point to be aware of with iterators (and in C++ generally) is that they can become invalid. This happens, for example, if you empty a container: any iterators pointing to positions in that container become invalid. In that state, most operations on them are undefined: anything could happen!
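The nested-loop case mentioned earlier (comparing each item with every other item) is easy once each loop owns its iteration state. A small sketch, with count_equal_pairs as a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Counts pairs of equal items at different positions. Each loop has its own
// iterator, so the inner traversal cannot disturb the outer one.
inline std::size_t count_equal_pairs(const std::list<int>& c) {
    std::size_t pairs = 0;
    for (auto i = c.begin(); i != c.end(); ++i) {
        for (auto j = std::next(i); j != c.end(); ++j) {  // start just after i
            if (*i == *j) ++pairs;
        }
    }
    return pairs;
}
```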
An iterator is to an STL container what a pointer is to an array. You can think of them as pointer objects to STL containers. As pointers, you will be able to use them with the pointer notation (e.g. *iElementLocator, iElementLocator++). As objects, they will have their own attributes and methods (http://www.cplusplus.com/reference/std/iterator).
There already exists a lot of good explanations of iterators. Just google it.
One example.
If there is something specific you don't understand come back and ask.
I'd suggest reading about operator overloading in C++. This will tell why * and -> can mean essentially anything. Only then you should read about the iterators. Otherwise it might appear very confusing.
I've enabled iterator debugging in an application by defining
_HAS_ITERATOR_DEBUGGING = 1
I was expecting this to really just check vector bounds, but I have a feeling it's doing a lot more than that. What checks, etc are actually being performed?
Dinkumware STL, by the way.
There are a number of operations on iterators that lead to undefined behavior; the goal of this trigger is to activate runtime checks (using asserts) that catch them before they occur.
The issue
The obvious operation is to use an invalid iterator, but this invalidity may arise from various reasons:
Uninitialized iterator
Iterator to an element that has been erased
Iterator to an element whose physical location has changed (reallocation for a vector)
Iterator outside of [begin, end)
The standard specifies in excruciating detail, for each container, which operations invalidate which iterators.
There is also a somewhat less obvious reason that people tend to forget: mixing iterators to different containers:
std::vector<Animal> cats, dogs;
for_each(cats.begin(), dogs.end(), /**/); // obvious bug
This pertains to a more general issue: the validity of ranges passed to the algorithms.
[cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
[cats.end(), cats.begin()) is invalid (unless cats is empty ??)
The solution
The solution consists in adding information to the iterators so that their validity, and the validity of the ranges they define, can be asserted during execution, thus preventing undefined behavior from occurring.
The _HAS_ITERATOR_DEBUGGING symbol serves as a trigger to this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it's issued from and is thus notified of the modification.
In Dinkumware this is achieved by two additions:
Each iterator carries a pointer to its related container
Each container holds a linked list of the iterators it created
And this neatly solves our problems:
An uninitialized iterator does not have a parent container, most operations (apart from assignment and destruction) will trigger an assertion
An iterator to an erased or moved element has been notified (thanks to the list) and knows of its invalidity
On incrementing and decrementing an iterator, it can check that it stays within the bounds
Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)
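The parent-pointer part of this can be sketched in a few lines. This is a simplified illustration under my own assumptions, not Dinkumware's actual implementation: CheckedIter, is_valid_range, checked_begin and checked_end are hypothetical names, the iterator is reduced to a parent pointer plus an index, and the container-side list of live iterators is left out entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical checked iterator: a parent pointer plus an index.
// A null parent models an uninitialized iterator.
struct CheckedIter {
    const std::vector<int>* parent;
    std::size_t index;
};

// Two iterators form a plausible range only if they share a parent
// and the first does not lie past the second.
inline bool is_valid_range(const CheckedIter& first, const CheckedIter& last) {
    if (first.parent == nullptr || first.parent != last.parent) return false;
    return first.index <= last.index && last.index <= first.parent->size();
}

inline CheckedIter checked_begin(const std::vector<int>& v) { return CheckedIter{&v, 0}; }
inline CheckedIter checked_end(const std::vector<int>& v) { return CheckedIter{&v, v.size()}; }
```

With random access the range check is a pair of comparisons. For node-based containers it would have to walk from first until it meets last or the container's end, which is the linear cost mentioned in the last point.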
The cost
The cost is heavy, but does correctness have a price? We can break down the cost:
extra memory allocation (the extra list of iterators maintained): O(NbIterators)
notification process on mutating operations: O(NbIterators) (Note that push_back or insert do not necessarily invalidate all iterators, but erase does)
range validity check: O( min(last-first, container.end()-first) )
Most of the library algorithms have of course been implemented for maximum efficiency, typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the speed might severely slow down, especially with hand-written loops:
for (iterator_t it = vec.begin();
     it != vec.end(); // Oops
     ++it)
    // body
We know the Oops line is bad taste, but here it's even worse: at each run of the loop, we create a new iterator then destroy it which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop ?
Of course, a for_each would not encounter such an issue, which is yet another compelling case toward the use of STL algorithms instead of hand-coded versions.
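If a hand-written loop is unavoidable, hoisting the end() call out of the condition avoids creating a fresh iterator on every pass. A minimal sketch (sum_with_hoisted_end is a made-up name):

```cpp
#include <cassert>
#include <vector>

// Calls end() once instead of on every iteration. Only safe when the loop
// body does not modify the container.
inline long sum_with_hoisted_end(const std::vector<int>& vec) {
    long total = 0;
    for (auto it = vec.begin(), last = vec.end(); it != last; ++it) {
        total += *it;
    }
    return total;
}
```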
As far as I understand:
_HAS_ITERATOR_DEBUGGING will display a dialog box at run time to assert any incorrect iterator use including:
1) Iterators used in a container after an element is erased
2) Iterators used in vectors after a push_back() or insert() call
According to http://msdn.microsoft.com/en-us/library/aa985982%28v=VS.80%29.aspx
The C++ standard describes which member functions cause iterators to a container to become invalid. Two examples are:
Erasing an element from a container causes iterators to the element to become invalid.
Increasing the size of a vector (push or insert) causes iterators into the vector container become invalid.
I am building a DLL that another application would use. I want to store the current state of some data globally in the DLL's memory before returning from the function call so that I could reuse state on the next call to the function.
For doing this, I'm having to save some iterators. I'm using a std::stack to store all other data, but I wasn't sure if I could do that with the iterators also.
Is it safe to put list iterators inside container classes? If not, could you suggest a way to store a pointer to an element in a list so that I can use it later?
I know using a vector to store my data instead of a list would have allowed me to store the subscript and reuse it very easily, but unfortunately I'm having to use only an std::list.
Iterators to list are invalidated only if the list is destroyed or the "pointed" element is removed from the list.
Yes, it'll work fine.
Since so many other answers go on about this being a special quality of list iterators, I have to point out that it'd work with any iterators, including vector ones. The fact that vector iterators get invalidated if the vector is modified is hardly relevant to a question of whether it is legal to store iterators in another container -- it is. Of course the iterator can get invalidated if you do anything that invalidates it, but that has nothing to do with whether or not the iterator is stored in a stack (or any other data structure).
It should be no problem to store the iterators, just make sure you don't use them on a copy of the list -- an iterator is bound to one instance of the list, and cannot be used on a copy.
That is, if you do:
std::list<int>::iterator it = myList.begin ();
std::list<int> c = myList;
c.insert (it, ...); // Error
As noted by others: Of course, you should also not invalidate the iterator by removing the pointed-to element.
This might be offtopic, but just a hint...
Be aware that your function(s)/data structure would probably be thread-unsafe for read operations. There is a kind of basic thread safety where read operations do not require synchronization. If you are going to store the state of how much the caller has read from your structure, it will make the whole concept thread-unsafe and a bit unnatural to use, because nobody assumes a read to be a stateful operation.
If two threads are going to call it they will either need to synchronize the calls or your data structure might end-up in a race condition. The problem in such a design is that both threads must have access to a common synchronization variable.
I would suggest making two overloaded functions. Both are stateless, but one of them should accept a hint iterator saying where to start the next read/search/retrieval etc. This is, e.g., how Allocator in the STL works: you can pass the allocator a hint pointer (default 0) so that it finds a new memory chunk more quickly.
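A rough sketch of what those two overloads might look like for a std::list<int>. The name find_from and the IntList alias are invented for the example; the caller, not the DLL, keeps the position between calls:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

using IntList = std::list<int>;

// Stateless search that resumes from a caller-supplied position.
inline IntList::const_iterator find_from(const IntList& data,
                                         IntList::const_iterator hint,
                                         int value) {
    return std::find(hint, data.end(), value);
}

// Convenience overload: start from the beginning.
inline IntList::const_iterator find_from(const IntList& data, int value) {
    return find_from(data, data.begin(), value);
}
```

Each thread can then hold its own hint, so concurrent reads share no hidden state.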
Regards,
Ovanes
Storing the iterator for the list should be fine. It will not get invalidated unless you remove from the list the same element for which you have stored the iterator. The following quote is from the SGI site:
Lists have the important property that
insertion and splicing do not
invalidate iterators to list elements,
and that even removal invalidates only
the iterators that point to the
elements that are removed
However, note that the previous and next element of the stored iterator may change. But the iterator itself will remain valid.
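A small sketch of both points, with the iterator kept in a std::stack as in the question. stored_iterator_survives is a made-up name for the demo:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <stack>

// Stores a list iterator in a std::stack, then modifies the list around it.
inline bool stored_iterator_survives() {
    std::list<int> data{10, 20, 30};
    std::stack<std::list<int>::iterator> saved;
    saved.push(std::next(data.begin()));  // iterator to 20

    data.insert(saved.top(), 15);         // insert before 20: does not invalidate
    data.erase(data.begin());             // erase 10: a different element
    data.push_back(40);

    auto it = saved.top();
    // Still refers to 20, but its neighbours are now 15 and 30.
    return *it == 20 && *std::prev(it) == 15 && *std::next(it) == 30;
}
```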
The same rule applies to an iterator stored in a local variable as in a longer lived data structure: it will stay valid as long as the container allows.
For a list, this means: as long as the node it points to is not deleted, the iterator stays valid. Obviously the node gets deleted when the list is destructed...