How valid positions in vector::insert()? [duplicate] - c++

This question is related with item 16 of effective stl book which states that while using vector(lets assume vector<int>vec) instead of array in a legacy code we must use &vec[0] instead of vec.begin() :
void doSomething(const int* pInts, size_t numlnts);
dosomething(&vec[0],vec.size()); \\correct!!
dosomething(vec.begin(),vec.size()); \\ wrong!! why???
The book states that vec.begin() is not same as &vec[0] . Why ? What the difference between the two ?

A std::vector is sequence container that encapsulates dynamic size arrays. This lets you conveniently store a bunch of elements without needing to be as concerned with managing the underlying array that is the storage for your elements. A large part of the convenience of using these classes comes from the fact that they give you a bunch of methods that let you deal with the sequence without needing to deal with raw pointers, an iterator is an example of this.
&vec[0] is a pointer to the first element of the underlying storage that the vector is using. vec.begin() is an iterator that starts at the beginning of the vector. While both of these give you a way to access the elements in the sequence these are 2 distinct concepts. Search up iterators to get a better idea of how this works.
If your code supports iterators its often easiest to use the iterators to iterate over the data. Part of the reasons for this is that iterators are not pointers, they let you iterate over the elements of the data structure without needing to know as much about the implementation details of the datastructure you are iterating over.
However sometimes you need the raw array of items, for example in some legacy API's or calls to C code you might need to pass a pointer to the array. In this case you have no choice but to extract the raw array from the vector, you can do this using something such as &vec[0]. Note that if you have c++11 support there's an explicit way to do this with std::vector::data which will give you access to the underlying storage array. The c++11 way has the additional benefit of also more clearly stating your intent to the people reading your code.

Formally, one produces an iterator, and the other a pointer, but I think the major difference is that vec[0] will do bad stuff if the vector is empty, while vec.begin() will not.

vec.begin() has type std::vector<int>::iterator. &vec[0] has type pointer to std::vector<int>::value_type. These are not necessarily the same type.
It is possible that a given implementation uses pointers as the iterator implementation for a vector, but this is not guaranteed, and thus you should not rely on that assumption. In fact most implementations do provide a wrapping iterator type.
Regarding your question about pointers being iterators, this is partly true. Pointers do meet the criteria of a random access iterator, but not all iterators are pointers. And there are iterators that do not support the random access behavior of pointers.

Related

What is std::contiguous_iterator useful for?

For what purposes I can use it?
Why is it better than random_access_iterator?
Is there some advantage if I use it?
For a contiguous iterator you can get a pointer to the element the iterator is "pointing" to, and use it like a pointer to a contiguous array.
That can't be guaranteed with a random access iterator.
Remember that e.g. std::deque is a random-access container, but it's typically not a contiguous container (as opposed to std::vector which is both random access and contiguous).
In C++17, there is no such thing as a std::contiguous_iterator. There is the ContiguousIterator named requirement however. This represents a random access iterator over a sequence of elements where each element is stored contiguously, in exactly the same way as an array. Which means that it is possible, given a pointer to an value_type from an iterator, to perform pointer arithmetic on that pointer, which shall work in exactly the same way as performing the same arithmetic on the corresponding iterators.
The purpose of this is to allow for more efficient implementations of algorithms on iterators that are contiguous. Or to forbid algorithms from being used on iterators that aren't contiguous. One example of where this matters is if you're trying to pass C++ iterators into a C interface which is based on pointers to arrays. You can wrap such interfaces behind generic algorithms, verifying the contiguity of the iterator in the template.
Or at least, you could in theory; in C++17, that wasn't really possible.. The reason being that there was not actually a way to test if an iterator was a ContiguousIterator. There's no way to ask a pointer if doing pointer arithmetic on a pointer to an element from the iterator is legal. And there was no std::contiguous_iterator_category one could use for such iterators (as this could cause compatibility problems). So you couldn't use SFINAE tools to verify that an iterator was contiguous.
C++20's std::contiguous_iterator concept resolves this problem. It also resolves the other problem with contiguous iterators. See, the above explanation for ContiguousIterator's behavior starts with us having a pointer to an element from the range. Well, how did you get that? The obvious method would be to do something like std::addressof(*it), but what if it is the end iterator? The end iterator is not dereference-able, so you can't do that. Basically, even if you know that an iterator is contiguous, how do you go about converting it to the equivalent pointer?
The std::contiguous_iterator concept solves both of these problems. std::to_address is available, which will convert any contiguous iterator into its equivalent pointer value. And there is a traits tag that an iterator must provide to denote that it is in fact a contiguous iterator, just in case the default to_address implementation happens to be valid for a non-contiguous iterator.
A random access iterator only requires a constant time (iterator) + (offset), whereas contiguous iterators have the stronger guarantee that std::addressof(*((iterator) + (offset))) == std::addressof(*(iterator)) + (offset) (disregarding overloaded operator&s).
This basically means that the iterator is a pointer or a light wrapper around a pointer, so it is equivalent to a pointer to its elements, whereas a random access iterator can do more, at the cost of possibly being bulkier and being unable to turn it into a simple pointer.
As a C++20 Concept, I would expect you can use it to specify a different algorithm if the container is contiguous. Perhaps exploiting cache locality.

Why is vector::iterator invalidated upon reallocation?

I don't understand why a vector's iterator should be invalidated when a reallocation happens.
Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?
Why was vector not designed this way?
Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:
One reason for the emphasis on run-time efficiency...was that I wanted
templates to be efficient enough in time and space to be used for
low-level types such as arrays and lists.
...
Higher-level alternatives -- say, a range-checked array with a size()
operation, a multidimensional array, a vector type with proper numeric
vector operations and copy semantics, etc. -- would be accepted by
users only if their run-time, space, and notational convenience
approached those of built-in arrays.
In other words, the language mechanism supplying parameterized types
should be such that a concerned user should be able to afford to
eliminate the use of arrays in favor of a standard library class.
Bjarne Stroustrup, Design and Evolution of C++, p.342.
Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.
Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().
You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping a extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.
There are many reasons for these decisions. As others pointed out, the most basic implementation of iterator for a vector is a plain pointer to the element. To be able to handle push_back iterators would have to be modified to handle a pointer into the vector and a position, on access through the operator, the vector pointer would ave to be dereferenced, the pointer to the data obtained and the position added, with an extra dereference.
While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in VS/Dinkumware libraries (even in release) are checked iterators, that manage an equivalent amount of information.
The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).
You would need to store both the offset and a pointer to the vector object itself.
As specified, the iterator can just be a pointer, which takes less space.
TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.
Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity, if a pointer (or reference) to the vector is stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors anyone?) would invalidate iterators.
You trade is "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".
The performance arguments don't much matter, because the proposed alternate implementation is not even correct.
I an iterator wasn't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.

Sharing an array with STL vectors

I would like to share the contents of an array of doubles a of size k with one or more STL vectors v1, v2...vn.
The effect that I want from this shared storage is that if the underlying array gets modified the change can be observed from all the vectors that share its contents with the array.
I can do that by defining the vectors v1...vn as vectors of pointers
vector<double*> v1;
and copy the pointers a to a + k into this vector. However, I do not like that solution. I want the vectors to be a vector of doubles.
Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared. Would appreciate help about how to do this.
Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared.
No, you can't do this. The Standard Library containers always manage their own memory.
Your best option is to create the std::vector<double> and then use it as an array where you need to do so (via &v[0], assuming the vector is non-empty).
If you just want to have the container interface, consider using std::array (or boost::array or std::tr1::array) or writing your own container interface to encapsulate the array.
This sounds to me like you want to alias the array with a vector. So logically you want a vector of references (which doesn't work for syntactical reasons). If you really really need this feature, you can write your own ref wrapper class, that behaves exactly like an actual C++ reference, so the users of your vn vectors wont be able to distinguish between vector<T> and vector<ref<T> > (e.g. with T = double). But internally, you could link the items in the vectors to the items in your "master" array.
But you should have darned good reasons to do this overhead circus :)
OK, Standard Library containers are both holders of information, and enumerators for those elements. That is, roughly any container can be used in almost any algorithm, and at least, you can go through them using begin() and end().
When you separate both (element holding and element enumeration), as in your case, you may consider boost.range. boost.range gives you a pair of iterators that delimit the extent to which algorithms will be applied, and you have the actual memory store in your array. This works mostly to read-access them, because normally, modifying the structure of the vector will invalidate the iterators. You can recreate them, though.
To answer your question, as far as I know std::vector can not be given an already constructed array to use. I can not even think how that could be done since there are also the size/capacity related variables. You can possibly try to hack a way to do it using a custom allocator but I feel it will be ugly, error prone and not intuitive for future maintenance.
That said, if I may rephrase your words a bit, you are asking for multiple references to the same std::vector. I would either do just that or maybe consider using a shared_ptr to a vector.

Should I return an iterator or a pointer to an element in a STL container?

I am developing an engine for porting existing code to a different platform. The existing code has been developed using a third party API, and my engine will redefine those third party API functions in terms of my new platform.
The following definitions come from the API:
typedef unsigned long shape_handle;
shape_handle make_new_shape( int type );
I need to redefine make_new_shape and I have the option to redefine shape_handle.
I have defined this structure ( simplified ):
struct Shape
{
int type
};
The Caller of make_new_shape doesn't care about the underlying structure of Shape, it just needs a "handle" to it so that it can call functions like:
void `set_shape_color( myshape, RED );`
where myshape is the handle to the shape.
My engine will manage the memory for the Shape objects and other requirements dictate that the engine should be storing Shape objects in a list or other iterable container.
My question is, what is the safest way to represent this handle - if the Shape itself is going to be stored in a std::list - an iterator, a pointer, an index?
Both an iterators or a pointers will do bad stuff if you try to access them after the object has been deleted so neither is intrinsically safer. The advantage of an iterator is that it can be used to access other members of your collection.
So, if you just want to access your Shape then a pointer will be simplest. If you want to iterate through your list then use an iterator.
An index is useless in a list since std::list does not overload the [] operator.
The answer depends on your representation:
for std::list, use an iterator (not a pointer), because an iterator allows you to remove the element without walking the whole list.
for std::map or boost::unordered_map, use the Key (of course)
Your design would be much strong if you used an associative container, because associative containers give you the ability to query for the presence of the object, rather than invoking Undefined Behavior.
Try benchmarking both map and unordered_map to see which one is faster in your case :)
IIF the internal representation will be a list of Shapes, then pointers and iterators are safe. Once an element is allocated, no relocation will ever occur. I wouldn't recommend an index for obvious access performance reasons. O(n) in case of lists.
If you were using a vector, then don't use iterators or pointers, because elements can be relocated when you exceed the vectors capacity, and your pointers/iterators would become invalid.
If you want a representation that is safe regardless of the internal container, then create a container (list/vector) of pointers to your shapes, and return the shape pointer to your client. Even if the container is moved around in memory, the Shape objects will stay in the same location.
Iterators aren't safer than pointers, but they have much better diagnostics than raw pointers if you're using a checked STL implementation!
For example, in a debug build, if you return a pointer to a list element, then erase that list element, you have a dangling pointer. If you access it you get a crash and all you can see is junk data. That can make it difficult to work out what went wrong.
If you use an iterator and you have a checked STL implementation, as soon as you access the iterator to an erased element, you get a message something like "iterator was invalidated". That's because you erased the element it points to. Boom, you just saved yourself potentially a whole lot of debugging effort.
So, not indices for O(n) performance. Between pointers and iterators - always iterators!

Use of iterators over array indices

I just wanted to know what is the main advantage of using the iterators over the array indices. I have googled but i am not getting the right answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that let you walk on any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
One other slight difference is that you can't use erase() on an element in a vector by index, you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = //...
const Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers or a model of RandomAccessIterator.