Why is vector::iterator invalidated upon reallocation? - c++

I don't understand why a vector's iterator should be invalidated when a reallocation happens.
Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?
Why was vector not designed this way?

Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:
One reason for the emphasis on run-time efficiency...was that I wanted
templates to be efficient enough in time and space to be used for
low-level types such as arrays and lists.
...
Higher-level alternatives -- say, a range-checked array with a size()
operation, a multidimensional array, a vector type with proper numeric
vector operations and copy semantics, etc. -- would be accepted by
users only if their run-time, space, and notational convenience
approached those of built-in arrays.
In other words, the language mechanism supplying parameterized types
should be such that a concerned user should be able to afford to
eliminate the use of arrays in favor of a standard library class.
Bjarne Stroustrup, Design and Evolution of C++, p.342.

Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.
Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().

You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping a extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.

There are many reasons for these decisions. As others pointed out, the most basic implementation of iterator for a vector is a plain pointer to the element. To be able to handle push_back iterators would have to be modified to handle a pointer into the vector and a position, on access through the operator, the vector pointer would ave to be dereferenced, the pointer to the data obtained and the position added, with an extra dereference.
While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in VS/Dinkumware libraries (even in release) are checked iterators, that manage an equivalent amount of information.
The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).

You would need to store both the offset and a pointer to the vector object itself.
As specified, the iterator can just be a pointer, which takes less space.

TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.
Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity, if a pointer (or reference) to the vector is stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors anyone?) would invalidate iterators.
You trade is "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".
The performance arguments don't much matter, because the proposed alternate implementation is not even correct.

I an iterator wasn't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.

Related

What is std::contiguous_iterator useful for?

For what purposes I can use it?
Why is it better than random_access_iterator?
Is there some advantage if I use it?
For a contiguous iterator you can get a pointer to the element the iterator is "pointing" to, and use it like a pointer to a contiguous array.
That can't be guaranteed with a random access iterator.
Remember that e.g. std::deque is a random-access container, but it's typically not a contiguous container (as opposed to std::vector which is both random access and contiguous).
In C++17, there is no such thing as a std::contiguous_iterator. There is the ContiguousIterator named requirement however. This represents a random access iterator over a sequence of elements where each element is stored contiguously, in exactly the same way as an array. Which means that it is possible, given a pointer to an value_type from an iterator, to perform pointer arithmetic on that pointer, which shall work in exactly the same way as performing the same arithmetic on the corresponding iterators.
The purpose of this is to allow for more efficient implementations of algorithms on iterators that are contiguous. Or to forbid algorithms from being used on iterators that aren't contiguous. One example of where this matters is if you're trying to pass C++ iterators into a C interface which is based on pointers to arrays. You can wrap such interfaces behind generic algorithms, verifying the contiguity of the iterator in the template.
Or at least, you could in theory; in C++17, that wasn't really possible.. The reason being that there was not actually a way to test if an iterator was a ContiguousIterator. There's no way to ask a pointer if doing pointer arithmetic on a pointer to an element from the iterator is legal. And there was no std::contiguous_iterator_category one could use for such iterators (as this could cause compatibility problems). So you couldn't use SFINAE tools to verify that an iterator was contiguous.
C++20's std::contiguous_iterator concept resolves this problem. It also resolves the other problem with contiguous iterators. See, the above explanation for ContiguousIterator's behavior starts with us having a pointer to an element from the range. Well, how did you get that? The obvious method would be to do something like std::addressof(*it), but what if it is the end iterator? The end iterator is not dereference-able, so you can't do that. Basically, even if you know that an iterator is contiguous, how do you go about converting it to the equivalent pointer?
The std::contiguous_iterator concept solves both of these problems. std::to_address is available, which will convert any contiguous iterator into its equivalent pointer value. And there is a traits tag that an iterator must provide to denote that it is in fact a contiguous iterator, just in case the default to_address implementation happens to be valid for a non-contiguous iterator.
A random access iterator only requires a constant time (iterator) + (offset), whereas contiguous iterators have the stronger guarantee that std::addressof(*((iterator) + (offset))) == std::addressof(*(iterator)) + (offset) (disregarding overloaded operator&s).
This basically means that the iterator is a pointer or a light wrapper around a pointer, so it is equivalent to a pointer to its elements, whereas a random access iterator can do more, at the cost of possibly being bulkier and being unable to turn it into a simple pointer.
As a C++20 Concept, I would expect you can use it to specify a different algorithm if the container is contiguous. Perhaps exploiting cache locality.

How valid positions in vector::insert()? [duplicate]

This question is related with item 16 of effective stl book which states that while using vector(lets assume vector<int>vec) instead of array in a legacy code we must use &vec[0] instead of vec.begin() :
void doSomething(const int* pInts, size_t numlnts);
dosomething(&vec[0],vec.size()); \\correct!!
dosomething(vec.begin(),vec.size()); \\ wrong!! why???
The book states that vec.begin() is not same as &vec[0] . Why ? What the difference between the two ?
A std::vector is sequence container that encapsulates dynamic size arrays. This lets you conveniently store a bunch of elements without needing to be as concerned with managing the underlying array that is the storage for your elements. A large part of the convenience of using these classes comes from the fact that they give you a bunch of methods that let you deal with the sequence without needing to deal with raw pointers, an iterator is an example of this.
&vec[0] is a pointer to the first element of the underlying storage that the vector is using. vec.begin() is an iterator that starts at the beginning of the vector. While both of these give you a way to access the elements in the sequence these are 2 distinct concepts. Search up iterators to get a better idea of how this works.
If your code supports iterators its often easiest to use the iterators to iterate over the data. Part of the reasons for this is that iterators are not pointers, they let you iterate over the elements of the data structure without needing to know as much about the implementation details of the datastructure you are iterating over.
However sometimes you need the raw array of items, for example in some legacy API's or calls to C code you might need to pass a pointer to the array. In this case you have no choice but to extract the raw array from the vector, you can do this using something such as &vec[0]. Note that if you have c++11 support there's an explicit way to do this with std::vector::data which will give you access to the underlying storage array. The c++11 way has the additional benefit of also more clearly stating your intent to the people reading your code.
Formally, one produces an iterator, and the other a pointer, but I think the major difference is that vec[0] will do bad stuff if the vector is empty, while vec.begin() will not.
vec.begin() has type std::vector<int>::iterator. &vec[0] has type pointer to std::vector<int>::value_type. These are not necessarily the same type.
It is possible that a given implementation uses pointers as the iterator implementation for a vector, but this is not guaranteed, and thus you should not rely on that assumption. In fact most implementations do provide a wrapping iterator type.
Regarding your question about pointers being iterators, this is partly true. Pointers do meet the criteria of a random access iterator, but not all iterators are pointers. And there are iterators that do not support the random access behavior of pointers.

When the data structure is a template parameter, how can I tell if an operation will invalidate an iterator?

Specifically, I have a class which currently uses a vector and push_back. There is an element inside the vector I want to keep track of. Pushing back on the vector may invalidate the iterator, so I keep its index around. It's cheap to find the iterator again using the index. I can't reserve the vector as I don't know how many items will be inserted.
I've considered making the data structure a template parameter, and perhaps list may be used instead. In that case, finding an iterator from the index is not a trivial operation. Since pushing back on a list doesn't invalidate iterators to existing elements, I could just store this iterator instead.
But how can I code a generic class which handles both cases easily?
If I can find out whether push_back will invalidate the iterator, I could store the iterator and update it after each push_back by storing the distance from the beginning before the operation.
You should probably try to avoid this flexibility. Quote from Item 2 "Beware the illusion of container-independent code" from Effective STL by Scott Meyers:
Face the truth: it's not worth it. The different containers are
different, and they have strengths and weaknesses that vary in significant ways. They're not designed to be interchangeable, and
there's littel you can do to paper that over. If you try, you're
merely tempting fate, and fate doesn't like to be tempted.
If you really, positively, definitely have to maintain valid iterators, use std::list. If you also need to have random access, try Boost.MultiIndex (although you'll lose contiguous memory access).
If you look at the standard container adapators (std::stack, std::queue) you see that they support the intersection of the adaptable containers interfaces, not their union.
I'd create a second class, which responsibility would be to return the iterator you are interested in.
It should also be parametrized with the same template parameter, and then you can specialize it for any type you want (vector/list etc). So inside your specializations you can use any method you want.
So it's some traits-based solution.
If you really want to stick with the vector and have that functionality maybe take a look at
http://en.cppreference.com/w/cpp/container/vector/capacity function. wrap your push_backs in defined function or even better wrap whole std::vector in ur class and before push_backing compare capacity against size() to check if resize will happen.

Iterating through a sequence while modifying it. Use vector or List ? C++/ STL

Suppose I have a long sequence of unordered elements S s1, s2, s3,.... of a arbitrary but fixed data type through which I wish to iterate and delete certain elements according to some boolean criterion.
Now if after iterating through the sequence if I am not interested in the final ordering of the sequence then I can store my sequence in 2 ways
Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
Use a std::vector to represent the sequence. If a certain element fails the criterion and has to be deleted swap it with the last vector element and perform a pop_back.
My questions are
1.Which would be a better/efficient way timewise and/or memorywise to store my sequence?
2.If I had to venture a guess, then I would say list, because if si 's data-type memory size is large, swapping would be expensive. Would this reasoning be correct?
In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be efficient, however the practical effects of memory locality cannot be captured by the language standard and you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)
If you have a C++11 compiler, or atleast an rvalue reference aware one, using swap will cost you nothing if your data type isn't flat (i.e., contains pointers to external resources or in other words, is expensive to copy) since it will just move your structs around. So if you have such a compiler, create a move constructor (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now if your structs are flat (no external resources), and are large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparision to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last point, and an important factor, measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is if you actually need to do anything else with the containers, like random-access or such stuff.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.
We can only guess. I'd say that if the objects are easily copiable (e.g. basic types) then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then the std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the moving cost, that will be presumably quite less than the copy.
In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

Should I return an iterator or a pointer to an element in a STL container?

I am developing an engine for porting existing code to a different platform. The existing code has been developed using a third party API, and my engine will redefine those third party API functions in terms of my new platform.
The following definitions come from the API:
typedef unsigned long shape_handle;
shape_handle make_new_shape( int type );
I need to redefine make_new_shape and I have the option to redefine shape_handle.
I have defined this structure ( simplified ):
struct Shape
{
int type
};
The Caller of make_new_shape doesn't care about the underlying structure of Shape, it just needs a "handle" to it so that it can call functions like:
void `set_shape_color( myshape, RED );`
where myshape is the handle to the shape.
My engine will manage the memory for the Shape objects and other requirements dictate that the engine should be storing Shape objects in a list or other iterable container.
My question is, what is the safest way to represent this handle - if the Shape itself is going to be stored in a std::list - an iterator, a pointer, an index?
Both an iterators or a pointers will do bad stuff if you try to access them after the object has been deleted so neither is intrinsically safer. The advantage of an iterator is that it can be used to access other members of your collection.
So, if you just want to access your Shape then a pointer will be simplest. If you want to iterate through your list then use an iterator.
An index is useless in a list since std::list does not overload the [] operator.
The answer depends on your representation:
for std::list, use an iterator (not a pointer), because an iterator allows you to remove the element without walking the whole list.
for std::map or boost::unordered_map, use the Key (of course)
Your design would be much strong if you used an associative container, because associative containers give you the ability to query for the presence of the object, rather than invoking Undefined Behavior.
Try benchmarking both map and unordered_map to see which one is faster in your case :)
IIF the internal representation will be a list of Shapes, then pointers and iterators are safe. Once an element is allocated, no relocation will ever occur. I wouldn't recommend an index for obvious access performance reasons. O(n) in case of lists.
If you were using a vector, then don't use iterators or pointers, because elements can be relocated when you exceed the vectors capacity, and your pointers/iterators would become invalid.
If you want a representation that is safe regardless of the internal container, then create a container (list/vector) of pointers to your shapes, and return the shape pointer to your client. Even if the container is moved around in memory, the Shape objects will stay in the same location.
Iterators aren't safer than pointers, but they have much better diagnostics than raw pointers if you're using a checked STL implementation!
For example, in a debug build, if you return a pointer to a list element, then erase that list element, you have a dangling pointer. If you access it you get a crash and all you can see is junk data. That can make it difficult to work out what went wrong.
If you use an iterator and you have a checked STL implementation, as soon as you access the iterator to an erased element, you get a message something like "iterator was invalidated". That's because you erased the element it points to. Boom, you just saved yourself potentially a whole lot of debugging effort.
So, not indices for O(n) performance. Between pointers and iterators - always iterators!