handling for references affected by adding elements to std::vector - c++

As pointed out in answer to another question, all pointers to a vector's elements may become invalid after new elements have been added to that vector, due to reallocation of the underlying contiguous buffer.
Is there a safe way to handle this issue at compile-time?
Are there best-practices to deal with or to avoid a situation, where references may become invalid after altering the data-structure?

Are there best-practices to deal with or to avoid a situation, where references may become invalid after altering the data-structure?
preallocate enough space for vector in advance
keep index of object in array instead of reference/pointer to object itself
use different container, for example std::list
which way will work for you the best way depends on your situation

A pointer or reference to the std::vector itself won't change. What could change is the address of a specific element inside the std::vector due to reallocation policy which is implementation dependent.
Preallocating enough space is a risk because you shouldn't relay on a particular implementation policy.
Storing the address of an element in a std::vector is a bad idea and can be easily avoided because std::vector [] operator is very fast so store the index instead the address of the element. This the right way.

Related

std::vector and pointer predictability

when you push_back() items into an std::vector, and retain the pointers to the objects in the vector via the back() reference -- are you guaranteed (assuming no deletions occur) that the address of the objects in the vector will remain the same?
It seems like my vector changes the pointers of the objects I use, such that if I push 10 items into it, and retain the pointers to those 10 items by remembering the back() reference after each push_back.
if your vector is to store objects, not pointers to objects, are the addresses of those objects subject to constant change upon pushing more items?
Any method that causes the vector to resize itself will invalidate all iterators, pointers, and references to the elements contained within. This can be avoided by reserving mememory, or using boost::stable_vector.
23.3.6.5/1:
Remarks: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens,
all the iterators and references before the insertion point remain valid.
No, std::vector is not a stable container, i.e. pointers and iterators may get invalidated by resizing the vector (or, better, by the corresponding reallocation). If you want to avoid this behaviour, use boost::stable_vector or std::list or std::deque instead (I would prefer the last). Or, more easily, you can simply store your locations by indices.
For more information, consider also the answer to this question here.
It's not guaranteed. If you push_back enough items to exceed the size of the memory buffer that's the backing store of the vector, a new buffer will be created, all the contents will be copied to the new location, and the old buffer will be deleted. At that point, old pointers (as well as iterators!) will be invalid.
If you know exactly how much maximum space you will ever need, you could set the size of the vector's buffer to that size when you create it, to avoid reallocation. However, I prefer to store "references" to elements of a vector as a reference to the vector and a size_t index into the vector, instead of using pointers. It's not necessarily slower than pointers (depending on the CPU type) but, even if it is, it won't be much slower and in my opinion it's worth it for the peace of mind knowing that no matter how the vector is used in the future or reallocated, it'll still refer to the proper element.

Why is vector::iterator invalidated upon reallocation?

I don't understand why a vector's iterator should be invalidated when a reallocation happens.
Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?
Why was vector not designed this way?
Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:
One reason for the emphasis on run-time efficiency...was that I wanted
templates to be efficient enough in time and space to be used for
low-level types such as arrays and lists.
...
Higher-level alternatives -- say, a range-checked array with a size()
operation, a multidimensional array, a vector type with proper numeric
vector operations and copy semantics, etc. -- would be accepted by
users only if their run-time, space, and notational convenience
approached those of built-in arrays.
In other words, the language mechanism supplying parameterized types
should be such that a concerned user should be able to afford to
eliminate the use of arrays in favor of a standard library class.
Bjarne Stroustrup, Design and Evolution of C++, p.342.
Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.
Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().
You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping a extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.
There are many reasons for these decisions. As others pointed out, the most basic implementation of iterator for a vector is a plain pointer to the element. To be able to handle push_back iterators would have to be modified to handle a pointer into the vector and a position, on access through the operator, the vector pointer would ave to be dereferenced, the pointer to the data obtained and the position added, with an extra dereference.
While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in VS/Dinkumware libraries (even in release) are checked iterators, that manage an equivalent amount of information.
The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).
You would need to store both the offset and a pointer to the vector object itself.
As specified, the iterator can just be a pointer, which takes less space.
TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.
Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity, if a pointer (or reference) to the vector is stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors anyone?) would invalidate iterators.
You trade is "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".
The performance arguments don't much matter, because the proposed alternate implementation is not even correct.
I an iterator wasn't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.

why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]?

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? So in a vector containing an array foo, foo is a pointer to foo[0], so pop_front() would make the pointer foo = foo[1] and the bracket operator would just do the normal pointer math. Is this something to do with how C++ keeps track of the memory you're using for what when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I will admit, but i haven't anyone asking about why you can't shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you require adds another 4 bytes to every vector (and there are a lot of those in a c++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements, removing and inserting are "unsual" operations and their performance matter less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front otherwise the container is not being consistent. push_front is definitely expensive for a vector container (unless you match your pushes with your pops which is not a good design). Without push_front the vector is really wasting memory if one does lots of pop_front operations with no push_front functionality.
Now the need for pop_front and push_front is there for a container that is similar to a vector (constant time random access) which is why there is deque.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
You can use std::deque instead of std::vector. It's a double-ended-queue with also the vector-like access members. It implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
Another shortcoming of your suggestion is that you'll waste memory spaces as you can't make use of those on the left of the array after shifting. The more you execute pop_front(), the more you'll waste until the vector is destructed.

Something like a deque on large numbers of items, but small memory usage on small numbers?

I have a whole bunch of objects of a certain type, each of which may allocate a deque to hold other objects of that same type. I am using a deque because I need fast access at both ends, and because any particular object could possibly refer to many other objects.
However, it's likely the case that many or even most of the objects refer to very few other objects. In this case, the memory usage of deque is pretty big. The implementation I'm using is allocating 4096 bytes at a shot, as soon as I do the very first push_back(). Each element in the deque is only 8 bytes. That's a whole lot of wasted space, especially because I'm making many of these objects, and hence many of these deques.
At the same time, I pretty much need a deque (or something like it), because like I said, any particular object can actually refer to many other objects, despite the fact that most objects refer to very few other objects.
My first thought was using capacity() and reserve() to grow the deque myself, but my compiler informed me that there are no such functions on deque.
So, I was thinking perhaps to write a class with a deque-like interface, underlying which is a vector and a deque, with the vector used until (say) sixteen elements exist, after which the vector is thrown away and the deque is used from there on out.
Since the vector is only used when there are only a small number of elements, it shouldn't really matter too much that push_front() and pop_front() are going to be inefficient in terms of speed, and since I can control the vector with capacity() and reserve(), it shouldn't really matter too much that deque uses a lot of memory when more elements exist.
But, before rolling my own class like this, I wanted to check to see if something like this already exists. Also, if anybody knows of any reason I haven't thought of why something like this is a bad idea, or if anybody has any related suggestions, I'd love to hear it.
Thanks in advance.
You don't mention if you need other capabilities of vector or deque like random access iterators. If you don't this actually sounds like a good candidate to use list. It has good performance inserting and removing from both ends.
You could use an (intrusive) list if you don't need random access by index. Lists allow quick O(1) push_front/push_back() and pop_front/pop_back().
If objects are not shared, that is, an object is only ever owned by at most one other object, than an intrusive list would be the best. And since your objects are of the same type, they can be allocated from one memory pool (big array) to avoid any memory overhead.

Does std::vector change its address? How to avoid

Since vector elements are stored contiguously, I guess it may not have the same address after some push_back's , because the initial allocated space could not suffice.
I'm working on a code where I need a reference to an element in a vector, like:
int main(){
vector<int> v;
v.push_back(1);
int *ptr = &v[0];
for(int i=2; i<100; i++)
v.push_back(i);
cout << *ptr << endl; //?
return 0;
}
But it's not necessarily true that ptr contains a reference to v[0], right? How would be a good way to guarantee it?
My first idea would be to use a vector of pointers and dynamic allocation. I'm wondering if there's an easier way to do that?
PS.: Actually I'm using a vector of a class instead of int, but I think the issues are the same.
Don't use reserve to postpone this dangling pointer bug - as someone who got this same problem, shrugged, reserved 1000, then a few months later spent ages trying to figure out some weird memory bug (the vector capacity exceeded 1000), I can tell you this is not a robust solution.
You want to avoid taking the address of elements in a vector if at all possible precisely because of the unpredictable nature of reallocations. If you have to, use iterators instead of raw addresses, since checked STL implementations will tell you when they have become invalid, instead of randomly crashing.
The best solution is to change your container:
You could use std::list - it does not invalidate existing iterators when adding elements, and only the iterator to an erased element is invalidated when erasing
If you're using C++0x, std::vector<std::unique_ptr<T>> is an interesting solution
Alternatively, using pointers and new/delete isn't too bad - just don't forget to delete pointers before erasing them. It's not hard to get right this way, but you have to be pretty careful to not cause a memory leak by forgetting a delete. (Mark Ransom also points out: this is not exception safe, the entire vector contents is leaked if an exception causes the vector to be destroyed.)
Note that boost's ptr_vector cannot be used safely with some of the STL algorithms, which may be a problem for you.
You can increase the capacity of the underlying array used by the vector by calling its reserve member function:
v.reserve(100);
So long as you do not put more than 100 elements into the vector, ptr will point to the first element.
How would be a good way to guarantee it?
std::vector<T> is guaranteed to be continous, but the implementation is free to reallocate or free storage on operations altering the vector contents (vector iterators, pointers or references to elements become undefined as well).
You can achieve your desired result, however, by calling reserve. IIRC, the standard guarantees that no reallocations are done until the size of the vector is larger than its reserved capacity.
Generally, I'd be careful with it (you can quickly get trapped…). Better don't rely on std::vector<>::reserve and iterator persistence unless you really have to.
If you don't need your values stored contiguously, you can use std::deque instead of std::vector. It doesn't reallocate, but holds elements in several chunks of memory.
Another possibility possibility would be a purpose-built smart pointer that, instead of storing an address would store the address of the vector itself along with the the index of the item you care about. It would then put those together and get the address of the element only when you dereference it, something like this:
template <class T>
class vec_ptr {
std::vector<T> &v;
size_t index;
public:
vec_ptr(std::vector<T> &v, size_t index) : v(v), index(index) {}
T &operator*() { return v[index]; }
};
Then your int *ptr=&v[0]; would be replaced with something like: vec_ptr<int> ptr(v,0);
A couple of points: first of all, if you rearrange the items in your vector between the time you create the "pointer" and the time you dereference it, it will no longer refer to the original element, but to whatever element happens to be at the specified position. Second, this does no range checking, so (for example) attempting to use the 100th item in a vector that only contains 50 items will give undefined behavior.
As James McNellis and Alexander Gessler stated, reserve is a good way of pre-allocating memory. However, for completeness' sake, I'd like to add that for the pointers to remain valid, all insertion/removal operations must occur from the tail of the vector, otherwise item shifting will again invalidate your pointers.
Depending on your requirements and use case, you might want to take a look at Boost's Pointer Container Library.
In your case you could use boost::ptr_vector<yourClass>.
I came across this problem too and spent a whole day just to realize vector's address changed and the saved addresses became invalid. For my problem, my solution was that
save raw data in the vector and get relative indices
after the vector stopped growing, convert the indices to pointer addresses
I found the following works
pointers[i]=indices[i]+(size_t)&vector[0];
pointers[i]=&vector[ (size_t)indices[i] ];
However, I haven't figured out how to use vector.front() and I am not sure whether I should use
pointers[i]=indices[i]*sizeof(vector)+(size_t)&vector[0] . I think the reference way(2) should be very safe.