Not using iterators into a resized vector - c++

I read in The C++ Programming Language : Special Edition
Don't use iterators into a resized vector
Consider this example.
vector< int >::iterator it = foo.begin();
while ( it != foo.end() ) {
if ( /* something */ ) {
foo.push_back( /* some num */ );
}
++it;
}
Is there a problem with this? After the vector was resized, would the foo.end() in the loop condition be pushed forward 1?
P.S. In addition, what if the vector had reserved space for x ints? If push_back didn't exceed this space, would it still be an issue? (I would assume so, if foo.end() points to one past the last element in the vector that contains something.)

Yes, there is a problem with it.
Any call to push_back has the potential to invalidate all iterators into a vector.
foo.end() will always retrieve the valid end iterator (which may differ from the value last returned by foo.end()), but the iterator it may have been invalidated. This means that incrementing it or comparing it may cause undefined behaviour.

Yes, there's a problem. Regardless of foo.end(), it may be invalidated by the push_back(). Edit: (i.e. it's not just that the end could change; it's possible that the whole buffer for the vector may be reallocated, so all iterators become invalid).

Yes, there's a problem with that. push_back can invalidate any iterator into the vector you called it on: the end iterator always, and every other iterator whenever the vector reallocates. So after calling push_back, it is not safe to execute ++it unless you know that no reallocation took place. Otherwise this is a case of undefined behavior, so it may sometimes work and it may sometimes fail, but you should never rely on it working.

As others have said, push_back() may invalidate all iterators to the vector. The reason is that the data in a vector is stored in one region of contiguous memory. If push_back(), or any other operation that grows the vector, takes its size over the capacity of the allocated region, the vector allocates a new region at a different place in memory and moves the elements there, while all existing iterators still reference the old region.
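One way to sidestep the invalidation problem in the question's loop is to walk the vector by index rather than by iterator. The sketch below assumes the intent is to visit only the elements that were present when the loop started; the function name and the "double the even values" rule are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for every even value that was in the vector when the
// call started, append its double. Elements appended during the loop are
// not visited.
void append_doubles_of_evens(std::vector<int>& foo) {
    const std::size_t original_size = foo.size();  // fixed before any push_back
    for (std::size_t i = 0; i < original_size; ++i) {
        if (foo[i] % 2 == 0) {
            const int doubled = foo[i] * 2;  // copy the value before growing
            foo.push_back(doubled);          // may reallocate; i stays meaningful
        }
    }
    assert(foo.size() >= original_size);
}
```

An index is just a number, so a reallocation cannot invalidate it; foo[i] always looks the element up in whatever storage the vector currently owns.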

Related

invalid iterator with random access iterators with deque

I am reading Effective STL by Scott Meyers. In Item 1 the author discusses how to choose among the various containers, and below is a text snippet that I am having difficulty understanding.
Would it be helpful to have a sequence container with random access
iterators where pointers and references to the data are not
invalidated as long as nothing is erased and insertions take place
only at the ends of the container? This is a very special case, but if
it’s your case, deque is the container of your dreams.
(Interestingly, deque’s iterators may be invalidated when insertions
are made only at the ends of the container. deque is the only standard
STL container whose iterators may be invalidated without also
invalidating its pointers and references.)
My questions on the above text:
What does the author mean by pointers and references in this context, and how are they different from iterators?
How can deque's iterators be invalidated when insertions are made only at the ends, while pointers and references stay valid?
Please answer both questions with a simple example.
Thanks for your time and help.
For the first part, what's meant is this:
deque<int> foo(10, 1); // a deque with ten elements with value of 1
int& bar = foo.front(); // reference
int* baz = &foo.front(); // pointer
deque<int>::iterator buz = foo.begin(); // iterator
foo.push_front(0);
// At this point bar and baz are still valid, but buz may have been invalidated
For the second part it's been covered in the detail here:
Why does push_back or push_front invalidate a deque's iterators?
An iterator is often used to "cycle through" the elements of a standard-library container, much like you would do with an array index, e.g. in a for loop.
Iterators can be invalid for many reasons. One common case where this happens is when you use a for loop such as the following:
std::deque<int> c;
for(std::deque<int>::iterator i = c.begin(); i != c.end(); ++i) {
// do some stuff to the deque's elements here
}
At the end of the above loop, the iterator i will point to an "element" one position past the last real element in the deque. If you tried to do something like
*i = 88;
right after the end of the above for loop that would be a problem because the container does not "own" the memory i "points" to.
But what Meyers is likely talking about is that the Standard leaves much of the implementation of a deque open to the designer. Deques are usually implemented as a set of fixed-size blocks of memory, each holding several elements, tracked through a central table of block pointers, so unlike vectors there is no guarantee that all elements will be next to each other in memory. Furthermore, iterators necessarily contain information about these "blocks" so that they can traverse them smoothly (i.e. iterators are not simply pointers).
For example, if I push_back() a new element, but there is no more room in the "last" chunk of memory, then deque will need to allocate a new block of memory for the new element (and future elements added to the end), and it may have to enlarge or move its table of blocks to record it. Since an iterator I was using previously relies on that bookkeeping, it could be invalid. The existing elements themselves are not moved, which is why pointers and references to them survive.
References and actual pointers, on the other hand, would be used in this context to refer/point to individual objects in the container. If I write
int& j = *c.begin();
then j is a reference to the first element of c. If I then do
c.push_front(74);
j still references that previous first element, even though it is no longer at the front of the deque.
However, if you insert something in the middle of the deque, then chances are you are effectively splitting one of those contiguous chunks of memory and trying to squeeze your new element in there. To make room, elements on one side or the other must be shuffled around in memory (and possibly new memory needs to be allocated). This would by necessity invalidate pointers/references to elements on that "side" of the insertion. Since it is up to the implementer how exactly room is made for an inserted element, all bets are off with respect to any pointer/reference, no matter where it is with respect to the insertion.
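The difference between pointers/references and iterators can be checked with a small experiment. This is only a sketch: the function name and the count of 1000 insertions are arbitrary choices made here, not something taken from Meyers or from the answers.

```cpp
#include <cassert>
#include <deque>

// Returns true if a pointer taken to a deque element before many insertions
// at both ends still addresses that same element afterwards.
bool pointer_survives_end_insertions() {
    std::deque<int> c(10, 1);
    int* first = &c.front();   // pointer to the element that is currently first
    *first = 42;               // mark it so it can be recognised later

    const int extra = 1000;    // arbitrary, large enough to need new blocks
    for (int n = 0; n < extra; ++n) {
        c.push_front(0);
        c.push_back(0);
    }
    // Iterators obtained before the loop must not be used here, but the
    // pointer is still good: the marked element now sits at index `extra`.
    return *first == 42 && first == &c[extra];
}
```

Had the code kept c.begin() from before the loop instead of a pointer, dereferencing it afterwards would be undefined behavior, even though the element it used to refer to never moved.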

Seg fault with iterators and vectors

I seem to be having a problem displaying an item in a vector with an iterator. Possibly, I just need another set of eyes to look at it.
vector<string> tempVector;
vector<string>::iterator it;
it = tempVector.begin();
tempVector.push_back("1");
cout << *it;
I know this isn't the full code, but it's the only portion running. The output is a segfault. Doesn't the iterator point to the beginning of the vector? I was expecting cout to print "1".
The call to vector::reserve() invalidates all existing iterators if it happens to require reallocation.
To quote the C++ standard, 23.3.6.3 [vector.capacity]:
Reallocation happens at this point if and only if the current capacity is less than the argument of reserve().
[...]
Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence.
EDIT: After the edit, you have a call to vector::push_back(), which also invalidates all iterators if it requires reallocation. Iterator invalidation rules may be helpful.
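For the snippet in the question, the simplest repair is to obtain the iterator after the push_back rather than before it. Note also that begin() on an empty vector equals end(), so the original iterator was never dereferenceable in the first place. The function wrapper below is a hypothetical addition so the snippet can be called and checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rewrite of the question's snippet with the two statements
// swapped, so the iterator is obtained after the element exists.
std::string first_element_after_push() {
    std::vector<std::string> tempVector;
    tempVector.push_back("1");
    std::vector<std::string>::iterator it = tempVector.begin();
    assert(it != tempVector.end());  // there is an element to dereference
    return *it;
}
```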

vector pointer locations guaranteed?

Suppose I have a vector of ints,
std::vector<int> numbers;
that is populated with a bunch of values, then I say do this (where an entry exists at 43)
int *oneNumber = &numbers[43];
Is oneNumber guaranteed to always be pointing at the int at index 43, even if say I resize numbers to something like numbers.resize(46)?
I'm not 100% sure what expected behaviour is here, I know vectors are guaranteed to be contiguous but not sure if that continuity would also mean all the indices in the vector will remain in the same place throughout its life.
Is oneNumber guaranteed to always be pointing at the int at index 43
Yes, this is guaranteed by the standard.
even if say I resize numbers to something like numbers.resize(46)?
No. Once you resize the vector, or add or remove anything, addresses and iterators into it may be invalidated. This is because the vector may need to be reallocated at a new memory location.
Your paranoia is correct. Resizing a std::vector can cause its memory location to change. If that happens, your oneNumber is pointing to an old memory location that has been freed, and so accessing it is undefined behavior.
Pointers, references, and iterators to std::vector elements are guaranteed to stay put as long as you only append to the std::vector and the size of the std::vector doesn't grow beyond its capacity() at the time the pointer, reference, or iterator was obtained. Once it gets resized beyond the capacity() all pointers, references, and iterators to this std::vector become invalidated. Note that things are invalidated as well when inserting somewhere else than the end of the std::vector.
If you want to have your objects stay put and you only insert new elements at the end or the beginning, you can use std::deque. Pointers and references to elements in the std::deque are invalidated only when you insert into the middle of the std::deque, remove from the middle, or remove the referenced object. Note that iterators to elements in the std::deque are invalidated every time you insert an element into the std::deque, and by any removal from the middle; removing from either end invalidates only iterators to the removed elements (and the past-the-end iterator when the last element is removed).
As all the others have said, when you call .resize() on a vector your pointers may become invalidated, because the old array may be completely deallocated and an entirely new one allocated with your data copied into it.
One workaround for this is to not store pointers into an STL vector. Instead, store integer indices.
So in your example,
std::vector<int> numbers;
int *oneNumber = &numbers[43]; // no. pointers invalidated after .resize or possibly .push_back.
int oneNumberIndex = 43 ; // yes. indices remain valid through .resize/.push_back
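No - the vector can be reallocated when it grows. This happens whenever the size would exceed the current capacity, and implementations usually grow the capacity geometrically (often doubling it).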
From the C++11 standard
1 Remarks: Causes reallocation if the new size is greater than the old capacity. If no
reallocation happens, all the iterators and references before the insertion point
remain valid. If an exception is thrown other than by the copy constructor, move
constructor, assignment operator, or move assignment operator of T or by any
InputIterator operation there are no effects. If an exception is thrown by the move
constructor of a non-CopyInsertable T, the effects are unspecified.
When you use a vector's resize() or reserve() function to grow the vector, it may need to reallocate memory for its backing array. If it does reallocate, the new memory will not be located at the same address, so the address stored in oneNumber will no longer point to the right place.
Again, this depends on how many elements the vector is currently being used to store and the requested size. Depending on the specifics, the vector may be able to resize without reallocating, but you should definitely not assume that this will be the case.
Once the capacity of the vector changes, the data has been copied to another memory block and the original block has been freed.
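The capacity rule described in these answers can be checked directly. The following is a minimal sketch, assuming a reserve(100) call up front (the value 100 and the function name are illustrative choices): as long as the size never exceeds the capacity, a pointer to element 43 keeps pointing at element 43.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if a pointer to element 43 keeps addressing element 43 while
// the vector grows without exceeding the capacity reserved up front.
bool pointer_stable_within_capacity() {
    std::vector<int> numbers;
    numbers.reserve(100);                  // capacity() is now at least 100
    for (int i = 0; i < 44; ++i) numbers.push_back(i);

    const int* oneNumber = &numbers[43];
    const std::size_t cap = numbers.capacity();

    numbers.resize(46);                    // 46 <= capacity: no reallocation
    while (numbers.size() < cap) numbers.push_back(0);  // fill up to capacity

    assert(numbers.capacity() == cap);     // storage was never replaced
    return oneNumber == &numbers[43] && *oneNumber == 43;
}
```

One more push_back after the loop would take the size past the capacity, and from that point on oneNumber must not be used.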

C++ and iterator invalidation

So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe the problem is that it is never explained how these iterators are constructed.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call an erase on {2} my iterator is invalid. Why? In my head, {3} is shifted down to the memory location where {2} was, so the iterator is still pointing to a valid element. The only way I would see this as being not true is if iterators were made beforehand for each element and each iterator had some type of field containing the address of the following element in that container.
My other question has to do with statements such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
In my head, {3} is shifted down to the memory location where {2} was, so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way.
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Never mind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I know of, the status of i1 and i2 is the same: both of them will be iterators that point to some elements of the vector (albeit not the same elements they did before). You won't get a crash if you use them. Depending on exactly what you mean by "invalidate", you might say that neither one has been invalidated because they still point to something, or you might say that both have been invalidated because in both cases the elements they point to have been changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.
An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "invalidates all other iterators" language is their way of saying that the implementation is allowed the freedom to do anything its coders' skeevy hearts feel like to the container when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.
From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
So you can erase starting from end
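std::vector<int> v;
// Walk from the last valid index down to 0, so that erasing an element
// never shifts elements that have not been visited yet.
for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i)
{
    if (v[i])
        v.erase(v.begin() + i);
}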
OR use iterator returned from vector erase()
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);
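The loop just above erases every element. When only some elements should go, the same idea needs an else branch that advances the iterator, or the work can be handed to std::remove_if. Both variants are sketched below; the is_odd predicate and the function names are placeholders chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

bool is_odd(int x) { return x % 2 != 0; }

// Loop form: erase() returns an iterator to the element after the erased one.
void erase_odds_loop(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ) {
        if (is_odd(*it))
            it = v.erase(it);   // continue from the returned iterator
        else
            ++it;               // advance only when nothing was erased
    }
    assert(std::none_of(v.begin(), v.end(), is_odd));
}

// Erase-remove form: one pass to compact the keepers, then one erase of the tail.
void erase_odds_remove(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(), is_odd), v.end());
}
```

Note that the loop calls v.end() on every iteration instead of caching it, because each erase invalidates the previous end iterator.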

Does std::vector change its address? How to avoid

Since vector elements are stored contiguously, I guess it may not have the same address after some push_back's , because the initial allocated space could not suffice.
I'm working on a code where I need a reference to an element in a vector, like:
int main(){
vector<int> v;
v.push_back(1);
int *ptr = &v[0];
for(int i=2; i<100; i++)
v.push_back(i);
cout << *ptr << endl; //?
return 0;
}
But it's not necessarily true that ptr contains a reference to v[0], right? How would be a good way to guarantee it?
My first idea would be to use a vector of pointers and dynamic allocation. I'm wondering if there's an easier way to do that?
PS.: Actually I'm using a vector of a class instead of int, but I think the issues are the same.
Don't use reserve to postpone this dangling pointer bug - as someone who got this same problem, shrugged, reserved 1000, then a few months later spent ages trying to figure out some weird memory bug (the vector capacity exceeded 1000), I can tell you this is not a robust solution.
You want to avoid taking the address of elements in a vector if at all possible precisely because of the unpredictable nature of reallocations. If you have to, use iterators instead of raw addresses, since checked STL implementations will tell you when they have become invalid, instead of randomly crashing.
The best solution is to change your container:
You could use std::list - it does not invalidate existing iterators when adding elements, and only the iterator to an erased element is invalidated when erasing
If you're using C++0x, std::vector<std::unique_ptr<T>> is an interesting solution
Alternatively, using pointers and new/delete isn't too bad - just don't forget to delete pointers before erasing them. It's not hard to get right this way, but you have to be pretty careful to not cause a memory leak by forgetting a delete. (Mark Ransom also points out: this is not exception safe, the entire vector contents is leaked if an exception causes the vector to be destroyed.)
Note that boost's ptr_vector cannot be used safely with some of the STL algorithms, which may be a problem for you.
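To illustrate the std::vector<std::unique_ptr<T>> option from the list above: the vector may still reallocate and move its unique_ptr objects around, but each owned object stays at its own heap address. A minimal sketch, using int in place of the question's class and a made-up function name:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Returns true if the address of the first object is unchanged after the
// vector of owning pointers has grown (and very likely reallocated).
bool pointee_address_is_stable() {
    std::vector<std::unique_ptr<int> > v;
    v.push_back(std::unique_ptr<int>(new int(1)));
    int* ptr = v[0].get();           // address of the int, not of the vector slot

    for (int i = 2; i < 100; ++i)
        v.push_back(std::unique_ptr<int>(new int(i)));

    // The unique_ptr objects may have moved; the int they own did not.
    return ptr == v[0].get() && *ptr == 1;
}
```

The raw pointer is valid only while the owning unique_ptr is alive, so erasing v[0] or destroying the vector still leaves ptr dangling.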
You can increase the capacity of the underlying array used by the vector by calling its reserve member function:
v.reserve(100);
So long as you do not put more than 100 elements into the vector, ptr will point to the first element.
How would be a good way to guarantee it?
std::vector<T> is guaranteed to be contiguous, but the implementation is free to reallocate or free storage on operations altering the vector contents (vector iterators, pointers or references to elements become invalid as well).
You can achieve your desired result, however, by calling reserve. IIRC, the standard guarantees that no reallocations are done until the size of the vector is larger than its reserved capacity.
Generally, I'd be careful with it (you can quickly get trapped…). It is better not to rely on std::vector<>::reserve and iterator persistence unless you really have to.
If you don't need your values stored contiguously, you can use std::deque instead of std::vector. It doesn't reallocate, but holds elements in several chunks of memory.
Another possibility would be a purpose-built smart pointer that, instead of storing an address, would store the address of the vector itself along with the index of the item you care about. It would then put those together and get the address of the element only when you dereference it, something like this:
template <class T>
class vec_ptr {
std::vector<T> &v;
size_t index;
public:
vec_ptr(std::vector<T> &v, size_t index) : v(v), index(index) {}
T &operator*() { return v[index]; }
};
Then your int *ptr=&v[0]; would be replaced with something like: vec_ptr<int> ptr(v,0);
A couple of points: first of all, if you rearrange the items in your vector between the time you create the "pointer" and the time you dereference it, it will no longer refer to the original element, but to whatever element happens to be at the specified position. Second, this does no range checking, so (for example) attempting to use the 100th item in a vector that only contains 50 items will give undefined behavior.
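Applied to the main() from the question, the vec_ptr idea might look like the sketch below. The class is repeated from the answer (with std::size_t spelled out) so the example compiles on its own; the wrapper function read_first_after_growth is a hypothetical name added here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The vec_ptr class from the answer above, repeated so the example compiles.
template <class T>
class vec_ptr {
    std::vector<T>& v;
    std::size_t index;
public:
    vec_ptr(std::vector<T>& v, std::size_t index) : v(v), index(index) {}
    T& operator*() { return v[index]; }
};

// Mirrors the question's main(): returns the value seen through the handle
// after 98 further push_back calls.
int read_first_after_growth() {
    std::vector<int> v;
    v.push_back(1);
    vec_ptr<int> ptr(v, 0);          // replaces int *ptr = &v[0];
    for (int i = 2; i < 100; ++i)
        v.push_back(i);
    return *ptr;                     // looks up v[0] in the current storage
}
```

Because the handle stores a reference to the vector, it must not outlive the vector it was created from.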
As James McNellis and Alexander Gessler stated, reserve is a good way of pre-allocating memory. However, for completeness' sake, I'd like to add that for the pointers to remain valid, all insertion/removal operations must occur from the tail of the vector, otherwise item shifting will again invalidate your pointers.
Depending on your requirements and use case, you might want to take a look at Boost's Pointer Container Library.
In your case you could use boost::ptr_vector<yourClass>.
I came across this problem too and spent a whole day just to realize vector's address changed and the saved addresses became invalid. For my problem, my solution was that
save raw data in the vector and get relative indices
after the vector stopped growing, convert the indices to pointer addresses
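I found that the following work:
(1) pointers[i]=indices[i]+(size_t)&vector[0];
(2) pointers[i]=&vector[ (size_t)indices[i] ];
However, I haven't figured out how to use vector.front(), and I am not sure whether (1) should instead be
pointers[i]=indices[i]*sizeof(vector[0])+(size_t)&vector[0]: (1) as written is only correct if indices[i] is already a byte offset, and the scale factor has to be the element size, not sizeof(vector). I think the reference way (2) should be very safe.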
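The "store indices first, convert to pointers once the vector has stopped growing" approach can be written without any integer casts. This is a sketch with assumed names (to_pointers, data, indices) and int elements standing in for the real element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn element indices into pointers. Call it only after
// `data` has stopped growing; any later reallocation invalidates the result.
std::vector<int*> to_pointers(std::vector<int>& data,
                              const std::vector<std::size_t>& indices) {
    std::vector<int*> pointers;
    pointers.reserve(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] < data.size());
        // &data[k] is the same address as data.data() + k: pointer arithmetic
        // already scales by the element size, so no sizeof is needed.
        pointers.push_back(&data[indices[i]]);
    }
    return pointers;
}
```

The same address can also be formed from front(), as &data.front() + indices[i], since the elements are contiguous.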