Is it possible to store an iterator? - c++

For example, say I have a const_iterator:
QHash<const QString, QPair<const Node, double> >::const_iterator citer = adjNodeHash.begin();
Can I then store citer in a data structure (containing many iterators) and re-use it later, with it still referring to the same place I left off the next time I use it? (assuming I update it accordingly/use a reference to it when I am incrementing it)
I ask because I have used this approach, yet I am getting some undefined behaviour and am wondering if this is the culprit.
Any help would be much appreciated.

Iterator invalidation rules for the std containers are described in the standard. QHash will also have some iterator invalidation rules in its documentation (hopefully!).
A stored iterator remains valid until invalidated. Most hash maps invalidate their iterators when they "rehash", which happens when they grow past a certain bound.
In practice, it is probably a bad idea to store an iterator into a hash map across a period in which elements are added or removed. Keeping that iterator valid requires constant maintenance and error checking, adding overhead to every use of the hash map; any mistakes may not show up immediately, and when the error does surface it won't be anywhere near the spot where the mistake was made.
On top of that, if you ever swap out what hash container you are using, the details of the iterator invalidation rules are going to be different. This makes refactoring in the future more painful.
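To see why, here is a minimal sketch using std::unordered_map as the std analogue of QHash (the exact invalidation threshold is implementation-defined, so treat the rehash as merely possible):
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, double> adj;
    adj["a"] = 1.0;
    auto citer = adj.cbegin();       // stored iterator

    for (int i = 0; i < 1000; ++i)
        adj[std::to_string(i)] = i;  // growth may trigger a rehash...

    // ...and a rehash invalidates all iterators, so this would now be
    // undefined behavior if a rehash occurred:
    // double d = citer->second;
}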


STL iterator revalidation for end (past-the-end) iterator?

This is more a question of design: namely, is there (in the STL or elsewhere) such a concept as past-the-end iterator "revalidation"?
What I mean by this, and the use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently, another part of the program enqueues more items into the queue. How can the algorithm (EDIT:) efficiently tell whether more items have been enqueued, while holding the previously past-the-end iterator (call it tailIt)? (This would imply it can still check whether tailIt == container.end(), and, if that is false, conclude that tailIt is now valid and points to the first element that was inserted.)
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear we have the iterator tailIt and a reference to the container. A trivial workaround for what I'm trying to do: also remember count := how many items you have processed, then check whether container.size() == count still holds; if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, the assumption that the container never pops from the front (!), and the need for random access to seek efficiently).
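(For reference, a minimal sketch of that count-based workaround, assuming a random-access container that never pops from the front; tail and process are hypothetical names:)
#include <cstddef>

// Re-entrant tailing: `count` persists between calls and records how
// many elements have been processed so far.
template <class Container, class Fn>
void tail(const Container& c, std::size_t& count, Fn process) {
    while (count < c.size()) {
        process(c[count]);  // seek by index instead of a stored iterator
        ++count;
    }
}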
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" into the data block at all; in fact this is true of any iterator except a vector's. So an extant end iterator is simply never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
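For instance, a minimal sketch of such an encapsulation (all names here are hypothetical): rather than polling a stored end iterator, the queue notifies the consumer directly whenever something is enqueued.
#include <deque>
#include <functional>
#include <utility>

template <class T>
class TailedQueue {
    std::deque<T> q_;
    std::function<void(const T&)> on_enqueue_;
public:
    explicit TailedQueue(std::function<void(const T&)> cb)
        : on_enqueue_(std::move(cb)) {}

    void enqueue(const T& v) {
        q_.push_back(v);
        on_enqueue_(v);  // the "tailing" logic runs here; no iterators kept
    }

    bool empty() const { return q_.empty(); }
    T dequeue() { T v = std::move(q_.front()); q_.pop_front(); return v; }
};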
In the case of a std::queue, no there isn't (heh). Not because the iterators of a queue get invalidated once something is pushed, but because a queue doesn't expose any iterators at all.
As for other iterator types, most (if not all) of them don't keep a reference to the container holder (the managing object containing all the info about the underlying data), which is a trade-off of flexibility for efficiency. (I quickly checked the implementation of gcc's std::vector::iterator.) It is possible to write an iterator type that keeps a reference to the holder during its lifetime; that way the iterators never have to be invalidated! (Unless the holder is std::move'd.)
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
    if (some_outside_condition()) {
        // Normally the iterator would be invalidated at this point
        // (only if the vector resized, but you should always assume a resize)
        my_vector.push_back("hello world!");
    }
}
Literally revalidating iterators might be too complex to build for its use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++) loop. That said, I cannot assure you how well the compiler will optimize with these iterators (e.g. unrolling loops), though I assume it will still do quite a good job.
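A minimal sketch of such a never-invalidating iterator (C++17 for the sentinel-style range-for; flex_iterator and make_flex_iterator are the hypothetical names used above):
#include <cstddef>

struct flex_sentinel {};

// Stores a container pointer and an index, so reallocation never leaves
// it dangling. Erasures still shift which element it refers to, and it
// must not outlive the container.
template <class Container>
class flex_iterator {
    Container* c_;
    std::size_t i_ = 0;
public:
    explicit flex_iterator(Container& c) : c_(&c) {}
    decltype(auto) operator*() const { return (*c_)[i_]; }
    flex_iterator& operator++() { ++i_; return *this; }
    // Compared against the container's *current* size, so elements
    // appended mid-loop are also visited.
    bool operator!=(flex_sentinel) const { return i_ < c_->size(); }
};

template <class Container>
struct flex_range {
    Container& c;
    flex_iterator<Container> begin() const { return flex_iterator<Container>(c); }
    flex_sentinel end() const { return {}; }
};

template <class Container>
flex_range<Container> make_flex_iterator(Container& c) { return {c}; }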

How can I point to a member of a std::set in such a way that I can tell if the element has been removed?

An iterator into a std::set becomes invalidated if the item it's pointing to is erased. (It does not get invalidated if the set is modified in any other way, which is nice.) However, there is no way to detect whether an iterator has been invalidated or not.
I'm implementing an algorithm that requires me to be able to keep track of members of a std::set in such a way that I can erase them in constant time, but without risking undefined behaviour if I try to delete the same one twice. If I have two iterators pointing to the same member of a set, Bad Things will happen if I try to erase both of them.
My question is, how can I avoid this? Is there some way to implement something that behaves like an iterator into a set, but which knows when it has been invalidated?
Incidentally, I'm using std::set because this is a performance critical situation and I need the complexity guarantees that set provides. I'm happy to accept answers that suggest a different data structure, but only if it allows me to (a) access and remove the smallest element in constant time, (b) remove the pointed-to elements in constant time, and (c) insert elements in O(log(N)) time or better. C++11 is OK.
You could keep a set of shared pointers, and every time you store an iterator, pair it with a weak pointer to the element. When you want to erase the element, first check the weak pointer to see whether the object still exists.
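A minimal sketch of that scheme (it assumes no other shared_ptr copies of an element are kept alive elsewhere, since those would keep the weak pointer from expiring):
#include <memory>
#include <set>

// Order the set by the pointed-to values, not by pointer address.
struct DerefLess {
    bool operator()(const std::shared_ptr<int>& a,
                    const std::shared_ptr<int>& b) const { return *a < *b; }
};

using Set = std::set<std::shared_ptr<int>, DerefLess>;

// Handle = iterator plus a liveness check.
struct Handle {
    Set::iterator it;
    std::weak_ptr<int> alive;
};

void safe_erase(Set& s, Handle& h) {
    if (!h.alive.expired())  // element still present?
        s.erase(h.it);       // constant-time erase via the iterator
}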

What is iterator invalidation?

I see it referenced a lot, but no clear answer of what exactly it is. My experience is with higher-level languages, so I'm unfamiliar with the notion of invalidation in a collections framework.
What is iterator invalidation?
Why does it come up? Why is it difficult to deal with?
Iterators are glorified pointers. Iterator invalidation is a lot like pointer invalidation; it means it suddenly points to junk data.
Because it's very natural but wrong to do things like this:
for (iterator it = map.begin(); it != map.end(); ++it) {
    map.erase(it->first);
    // whoops, the map has now been restructured, but the iterator
    // still thinks it is healthy
}
Because that error right there? No compiler error, no warning, you lose. You just have to be trained well enough to watch for them and prevent them. Very insidious bugs if you don't know what you're doing. One of the design philosophies of C++ is speed over safety. The runtime check that would turn iterator invalidation into an exception instead of undefined behavior is too expensive, in the view of the C++ language designers.
You should be on high alert whenever you are iterating over a data structure and modifying the structure itself, instead of merely the objects held in it. At that point you should probably run to the documentation and check whether the operation is legal.
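For completeness, a sketch of the safe pattern, assuming C++11, where the associative containers' erase() returns the next valid iterator (should_remove is a hypothetical predicate):
#include <map>
#include <string>

bool should_remove(const std::pair<const std::string, int>& kv) {
    return kv.second < 0;  // stand-in for real logic
}

void prune(std::map<std::string, int>& m) {
    for (auto it = m.begin(); it != m.end(); /* no ++it here */) {
        if (should_remove(*it))
            it = m.erase(it);  // erase() hands back the next valid iterator
        else
            ++it;
    }
}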
Iterator invalidation is what happens when an iterator type (an object supporting the operators ++ and *) does not correctly represent the state of the object it is iterating over. For example:
int *my_array = new int[15];
int *my_iterator = &my_array[2];
delete[] my_array;
std::for_each(my_iterator, my_iterator + 5, ...); // invalid
That results in undefined behavior because the memory it points to has been deallocated.
This is only one scenario, however, and many other things cause an iterator to be 'invalidated', and you must be careful to check the documentation of the objects you are using.
The problem occurs when a container that is being processed using an iterator has its shape changed during the process. (We will assume a single-threaded application; concurrent access to a mutable container is a whole 'nother can of worms which we won't get into on this page). By "having its shape change", one of the following types of mutations is meant:
An insertion into the container (at any location)
Deletion of an element from the container
Any operation that changes a key (in an AssociativeContainer)
Any operation that changes the order of the elements in a sorted container
Any more complicated operation consisting of one or more of the above (such as splitting a container into two)
(From: http://c2.com/cgi/wiki?IteratorInvalidationProblem)
The concept is actually fairly simple, but the side-effects can be quite annoying. I would add that this problem affects not only C/C++ but slews of other low-level or mid-level languages, as well. (In some cases, even if they don't allow direct heap allocation)

Efficiently erasing elements in tr1::unordered_map

I am experimenting with tr1::unordered_map and stumbled upon the problem of how to efficiently delete elements. The 'erase' method offers to delete either by key or by iterator. I would assume the latter to be more efficient, since the former presumably involves an implicit find operation. On the other hand, my investigations on the internet have revealed that iterators may become invalid after calling the insert() method.
I am interested in a typical real-world situation, where objects put into a hash table have a life span long enough that calls to insert() happen during that life span. May I thus conclude that in such a situation deletion by key is the only option left? Are there any alternatives for deleting objects more efficiently? I am fully aware that the question only matters in applications where deletions happen often. Whether this will be the case for my current project remains to be seen, but I would rather learn about these issues while designing my project than when there is already a lot of code present.
The whole point of the unordered containers is to have the fastest possible lookup time. Worrying about the time it takes to erase an element by key sounds like the classic example of premature optimization.
If it matters a great deal to you, because you're keeping the iterator for some other reason, then C++0x says of std::unordered_map (quoting from the FDIS), in 23.2.5/11:
The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, and z is the container's maximum load factor.
I haven't checked whether the tr1 spec has the same guarantee, but it's fairly logical based on the expected implementation.
If you can use this guarantee, then you can protect your iterators up to a point. As Mark says, though, lookup in unordered_map is supposed to be fast. Keeping a key rather than an iterator is worse than keeping an index rather than an iterator in a vector, but better than the equivalent for map.
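If you want to lean on that guarantee, here is a minimal sketch (insert_preserves_iterators is a hypothetical helper that simply restates the (N+n) < z * B condition):
#include <cstddef>
#include <string>
#include <unordered_map>

template <class Map>
bool insert_preserves_iterators(const Map& m, std::size_t n = 1) {
    // N + n < z * B, in the FDIS's notation
    return m.size() + n < m.max_load_factor() * m.bucket_count();
}

int main() {
    std::unordered_map<std::string, int> m;
    auto it = m.emplace("key", 1).first;

    if (insert_preserves_iterators(m)) {
        m.emplace("other", 2);  // `it` is still valid here
    } else {
        m.emplace("other", 2);  // a rehash may occur...
        it = m.find("key");     // ...so re-acquire the iterator
    }
}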
Yes, insert() can invalidate all iterators. Therefore, I don't think there's a way to avoid the (implicit) lookup. The good news is that the latter is likely to be cheap.

What does enabling STL iterator debugging really do?

I've enabled iterator debugging in an application by defining
_HAS_ITERATOR_DEBUGGING = 1
I was expecting this to really just check vector bounds, but I have a feeling it's doing a lot more than that. What checks, etc are actually being performed?
Dinkumware STL, by the way.
There are a number of operations on iterators that lead to undefined behavior; the goal of this switch is to activate runtime checks that prevent them from occurring (using asserts).
The issue
The obvious operation is using an invalid iterator, but this invalidity may arise for various reasons:
Uninitialized iterator
Iterator to an element that has been erased
Iterator to an element whose physical location has changed (reallocation for a vector)
Iterator outside of [begin, end)
The standard specifies in excruciating details for each container which operation invalidates which iterator.
There is also a somewhat less obvious reason that people tend to forget: mixing iterators into different containers:
std::vector<Animal> cats, dogs;
for_each(cats.begin(), dogs.end(), /**/); // obvious bug
This pertains to a more general issue: the validity of ranges passed to algorithms.
[cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
[cats.end(), cats.begin()) is invalid (unless cats is empty ??)
The solution
The solution consists of adding information to the iterators so that their validity, and the validity of the ranges they define, can be asserted during execution, thus preventing undefined behavior from occurring.
The _HAS_ITERATOR_DEBUGGING symbol serves as a trigger for this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it was issued from and is thus notified of modifications.
In Dinkumware this is achieved by two additions:
Each iterator carries a pointer to its related container
Each container holds a linked list of the iterators it created
And this neatly solves our problems:
An uninitialized iterator does not have a parent container, most operations (apart from assignment and destruction) will trigger an assertion
An iterator to an erased or moved element has been notified (thanks to the list) and knows of its invalidity
On incrementing and decrementing, an iterator can check that it stays within the bounds
Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)
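As a toy sketch of that scheme (purely illustrative, not the actual Dinkumware source):
// Each container owns an intrusive singly linked list of its live
// iterators and "orphans" them all whenever a mutation invalidates them.
struct ContainerBase;

struct DebugIterBase {
    ContainerBase* parent = nullptr;  // null => uninitialized or orphaned
    DebugIterBase* next = nullptr;    // next iterator of the same container
};

struct ContainerBase {
    DebugIterBase* iterators = nullptr;  // head of the list

    void adopt(DebugIterBase* it) {      // called when an iterator is created
        it->parent = this;
        it->next = iterators;
        iterators = it;
    }
    void orphan_all() {                  // called by invalidating mutations
        for (DebugIterBase* it = iterators; it; it = it->next)
            it->parent = nullptr;        // assertions now fire on any use
        iterators = nullptr;
    }
};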
The cost
The cost is heavy, but can you put a price on correctness? We can break down the cost:
extra memory allocation (the extra list of iterators maintained): O(NbIterators)
notification process on mutating operations: O(NbIterators) (Note that push_back or insert do not necessarily invalidate all iterators, but erase does)
range validity check: O( min(last-first, container.end()-first) )
Most of the library algorithms have of course been implemented for maximum efficiency: typically the check is done once and for all at the beginning of the algorithm, and an unchecked version is run afterwards. Still, the program might slow down severely, especially with hand-written loops:
for (iterator_t it = vec.begin();
     it != vec.end(); // Oops
     ++it)
    // body
We know the Oops line is bad taste, but here it's even worse: on each pass of the loop we create a new iterator and then destroy it, which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop?
Of course, a for_each would not encounter such an issue, which is yet another compelling case toward the use of STL algorithms instead of hand-coded versions.
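One cheap mitigation for hand-written loops, under the same assumptions, is to hoist end() out of the loop so only one pair of debug-checked iterators is ever created:
#include <vector>

void walk(std::vector<int>& vec) {
    // One begin() and one end() iterator for the whole loop, instead of
    // constructing and destroying a fresh end() iterator on every pass.
    for (auto it = vec.begin(), last = vec.end(); it != last; ++it) {
        // body (must not mutate vec, or `last` becomes stale)
    }
}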
As far as I understand:
_HAS_ITERATOR_DEBUGGING will display a dialog box at run time to report any incorrect iterator use, including:
1) Iterators used on a container after an element is erased
2) Iterators used on a vector after a .push_back() or .insert() call
According to http://msdn.microsoft.com/en-us/library/aa985982%28v=VS.80%29.aspx
The C++ standard describes which member functions cause iterators to a container to become invalid. Two examples are:
Erasing an element from a container causes iterators to the element to become invalid.
Increasing the size of a vector (push or insert) causes iterators into the vector container to become invalid.
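A minimal example of the second case; the commented-out line would be undefined behavior if the push_back reallocated, and with iterator debugging enabled it trips an assertion instead of silently reading freed memory:
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};
    auto it = v.begin();
    v.push_back(4);  // may reallocate: `it` is now (potentially) invalid
    // int x = *it;  // asserts under _HAS_ITERATOR_DEBUGGING
}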