Avoiding iterators being invalidated during vector iteration - c++

What are recommended methods of avoiding errors, when iterating through a vector; where any number of elements in the vector may (directly, or indirectly) cause insertions or removals of elements - invaliding the iterators?
Specifically, I'm asking in relation to games programming, with elements in the vector being game objects; some of which can spawn other objects, and some of which will be killed and need removed when they are updated. Because of this, reserving a large capacity before iteration is also not possible, as it's entirely unknown how many elements can be expected to be added.

It's not possible to say in general. Part of the reason that the iterators are invalidated is that there's no way to read your mind to know what element you would like the invalidated iterators to refer to.
For example, I have an iterator to element 3 of 5. I erase element 2. Should the iterator point to the new element 2 (because it's "the same value moved down"), or should it point to the new element 3 (because it's "the same element of the vector with a different value moved down into it")?
Practically, your options are:
Do not insert/erase elements while someone else has iterators. This means altering your code to make the changes at another time.
Use indexes instead of iterators. This results in the second option above ("index refers to the same element with a new value"). Beware that indexes can also become invalid, if you erase enough elements that the index is off the end.
Use a different data structure with different iterator invalidation rules. For example a std::list would give you the first option above ("iterator refers to the same element in a new position")
The iterator that you're using to iterate should never be harmfully invalidated by insertions/erases of a single element at that position. It is invalidated, but both functions return a new iterator that you can use to continue iterating. See their documentation for what element that new iterator refers to. That's why the problem only arises when you're using multiple iterators on the same container.

Here are some ideas:
Iterate over the vector using an integer index. This won't get invalidated, but you need to take care to adjust the current index when inserting/removing elements.
Iterate over a copy of the vector.
Instead of making changes to the vector as you go along, keep track of what needs to be inserted/removed, and apply the changes after you've finished iterating.
Use a different container, for example a linked list.

Vector might not be the best container to support intensive insertion/removal (especially when the size is large).
But if you have to use vector, IMHO the safest way would be to work not with an iterator but with an index and keep track (and adjust the index) of items being inserted/deleted before the current position. This way you will at least avoid problems with reallocations.
And recalculate v.size()/v.end() after each iteration.

Use a separate vector and std::move the elements which don't get removed to it along with any created new elements. Then drop the old vector.
This way you don't have to remove/insert anything from/to the original vector and the iterators don't get invalidated.

You can, of course, go around the problem by using a container that doesn't invalidate the iterators on insertion and removal.
I'd like to recommend a different approach. Essentially you're iterating over the vector to create a new state in the game world. Instead of changing the state of the world as you traverse the vector, store what needs to change in a new vector. Then, after you've completely examined the old state, apply the changes you stored in the new vector.
This approach has one conceptual advantage: every object's new state depends on the old state only. If you apply the changes as you traverse the old state, and decisions about one object can depend on other objects, the order of traversal can affect the outcome, which can give "earlier" or "later" objects an unfair advantage.

Related

Is every "vectors" of STL deque's implementation always has the same size?

I have read The C++standard Library A Tutorial and reference 2nd, it said that deque's implementation include many blocks, I was curious that if i insert a element in the middle of the deque, will all the elements after the new inserted elements be moved backward just like vector,Or it will only move the elements in the inserted block?
As Igor said, the standard doesn't mention such details. However, given that it does say that all pointers, iterators and references are invalidated, I think you can assume that it moves more than the elements in a single "block".
As an aside, given the iterator requirements for deque, all the blocks (except the first and the last one) have to be kept full. Random access iterators require constant time "increment by N", and that can't be done if you have to count how many items are in each block (or, at least, I don't see a way to do that). So that would imply that all the elements either before or after the insertion point have to be moved. (again, not just the ones in the same "block")

How can I point to a member of a std::set in such a way that I can tell if the element has been removed?

An iterator into a std::set becomes invalidated if the item it's pointing to is erased. (It does not get invalidated if the set is modified in any other way, which is nice.) However, there is no way to detect whether an iterator has been invalidated or not.
I'm implementing an algorithm that requires me to be able to keep track of members of a std::set in such a way that I can erase them in constant time, but without risking undefined behaviour if I try to delete the same one twice. If I have two iterators pointing to the same member of a set, Bad Things will happen if I try to erase both of them.
My question is, how can I avoid this? Is there some way to implement something that behaves like an iterator into a set, but which knows when it has been invalidated?
Incidentally, I'm using std::set because this is a performance critical situation and I need the complexity guarantees that set provides. I'm happy to accept answers that suggest a different data structure, but only if it allows me to (a) access and remove the smallest element in constant time, (b) remove the pointed-to elements in constant time, and (c) insert elements in O(log(N)) time or better. C++11 is OK.
You could keep a set of shared pointers. And every time you store an iterator, pair it with a weak pointer to the element. When you want to erase the element, first check the weak pointer to see if the object still exists.

insert in C++ vector

I want to insert a bit from a bitset in the beginning of a vector. I am having a hard time understanding how to do that. Here is how I think I can do it:
keyRej.insert(x, inpSeq[0]);
I don't know what to put in the place of x?
I don't know what to put in the place of x?
An iterator to the position you want to insert in:
keyRej.insert(keyRej.begin(), inpSeq[0]);
Semantically, the inserted element goes before the iterator passed as first argument. But this will result in all elements of the vector having to be moved across one position, and may also incur a re-allocation of the vector's internal data storage block. It also means that all iterators or references to the vector's elements are invalidated.
See this reference for std::vector::insert for more information.
Note that there are containers, such as std::deque, for which appending elements to the front is cheap, and reference (but not iterator) validity is maintained.
x is an iterator according to the documentation you probably read here, the new object is insert just before it.
keyRej.insert(keyRej.begin(), inpSeq[0]);

Difference between std::remove and erase for vector?

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.
Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?
Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?
The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:
std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5
If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.
In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.
As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:
std::remove_copy_if(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, " "),
[](int x) { return !(x%2); } // is even
);
That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.
std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.
std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).
std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).
remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.
I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".
See yves Baumes answer as to why std::remove is designed this way.
Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).
Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.
This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.
Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.
Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.
Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.

Storing iterators inside containers

I am building a DLL that another application would use. I want to store the current state of some data globally in the DLL's memory before returning from the function call so that I could reuse state on the next call to the function.
For doing this, I'm having to save some iterators. I'm using a std::stack to store all other data, but I wasn't sure if I could do that with the iterators also.
Is it safe to put list iterators inside container classes? If not, could you suggest a way to store a pointer to an element in a list so that I can use it later?
I know using a vector to store my data instead of a list would have allowed me to store the subscript and reuse it very easily, but unfortunately I'm having to use only an std::list.
Iterators to list are invalidated only if the list is destroyed or the "pointed" element is removed from the list.
Yes, it'll work fine.
Since so many other answers go on about this being a special quality of list iterators, I have to point out that it'd work with any iterators, including vector ones. The fact that vector iterators get invalidated if the vector is modified is hardly relevant to a question of whether it is legal to store iterators in another container -- it is. Of course the iterator can get invalidated if you do anything that invalidates it, but that has nothing to do with whether or not the iterator is stored in a stack (or any other data structure).
It should be no problem to store the iterators, just make sure you don't use them on a copy of the list -- an iterator is bound to one instance of the list, and cannot be used on a copy.
That is, if you do:
std::list<int>::iterator it = myList.begin ();
std::list<int> c = myList;
c.insert (it, ...); // Error
As noted by others: Of course, you should also not invalidate the iterator by removing the pointed-to element.
This might be offtopic, but just a hint...
Be aware, that your function(s)/data structure would probably be thread unsafe for read operations. There is a kind of basic thread safety where read operations do not require synchronization. If you are going to store the sate how much the caller read from your structure it will make the whole concept thread unsafe and a bit unnatural to use. Because nobody assumes a read to be state-full operation.
If two threads are going to call it they will either need to synchronize the calls or your data structure might end-up in a race condition. The problem in such a design is that both threads must have access to a common synchronization variable.
I would suggest making two overloaded functions. Both are stateless, but one of them should accept a hint iterator, where to start next read/search/retrieval etc. This is e.g. how Allocator in STL is implemented. You can pass to allocator a hint pointer (default 0) so that it quicker finds a new memory chunk.
Regards,
Ovanes
Storing the iterator for the list should be fine. It will not get invalidated unless you remove the same element from the list for which you have stored the iterator. Following quote from SGI site:
Lists have the important property that
insertion and splicing do not
invalidate iterators to list elements,
and that even removal invalidates only
the iterators that point to the
elements that are removed
However, note that the previous and next element of the stored iterator may change. But the iterator itself will remain valid.
The same rule applies to an iterator stored in a local variable as in a longer lived data structure: it will stay valid as long as the container allows.
For a list, this means: as long as the node it points to is not deleted, the iterator stays valid. Obviously the node gets deleted when the list is destructed...