How to access element(having a unique identifier) in a vector using a map in constant time? [duplicate] - c++

When vector needs more memory it will reallocate memory somewhere, I don't know where yet! and then pointers become invalid, is there any good explanation on this?
I mean where they go, what happen to my containers? ( not linked list ones )

Short answer: Everything will be fine. Don't worry about this and get back to work.
Medium answer: Adding elements to or removing them from a vector invalidates all iterators and references/pointers (possibly with the exception of removing from the back). Simple as that. Don't refer to any old iterators and obtain new ones after such an operation. Example:
std::vector<int> v = get_vector();
int & a = v[6];
int * b = &v[7];
std::vector<int>::iterator c = v.begin();
std::advance(it, 8);
v.resize(100);
Now a, b and c are all invalid: You cannot use a, and you cannot dereference b or c.
Long answer: The vector keeps track of dynamic memory. When the memory is exhausted, it allocates a new, larger chunk elsewhere and copies (or moves) all the old elements over (and then frees up the old memory, destroying the old objects). Memory allocation and deallocation is done by the allocator (typically std::allocator<T>), which in turn usually invokes ::operator new() to fetch memory, which in turn usually calls malloc(). Details may vary and depend on your platform. In any event, any previously held references, pointers or iterators are no longer valid (presumably because they refer to the now-freed memory, though it's not specified in the standard why they're invalid).

When you add or remove items from a vector, all iterators (and pointers) to items within it are invalidated. If you need to store a pointer to an item in a vector, then make the vector const, or use a different container.
It shouldn't matter to you where the vector stores things. You don't need to do anything, just let it do its job.

When you use std::vector, the class takes care of all the details regarding memory allocation, pointers, resizing and so on.
The vector class exposes its contents through iterators and references. Mutations of the vector will potentially invalidate iterators and references because reallocation may be necessary.
It is valid to access the contents using pointers because the vector class guarantees to store its elements at contiguous memory locations. Clearly any mutation of the list will potentially invalidate any pointers to its contents, because of potential reallocation. Therefore, if you ever access an element using pointers, you must regard those pointers as invalid once you mutate the vector. In short the same rules apply to pointers to the contents as do to references.
If you want to maintain a reference to an item in the vector, and have this reference be valid even after mutation, then you should remember the index rather than a pointer or reference to the item. In that case it is perfectly safe to add to the end of the vector and your index value still refers to the same element.

Related

How does deque manage memory?

There are some confusion for me in using deque container.
I compared a vector with a deque, I entered Integer values dynamically and observed that after a few insertions vector's objects start moving around and addresses have been changed which seemed logical. However deque's objects stayed on the same place in the memory even after I entered a few hundred integers.
This observation gives me the idea that deque reserves a much larger memory than vector but if it is true then what is the point of having dynamic memory instead of static? Even if it does, it will run out of memory at somewhere and need to change the place on the memory, So the next question is does it move every object or just start using memory somewhere else and links it with previous location?
deque container supports iterator arithmetic but is it safe to use it? i want to know how deque manage the memory not how one might prefer to use it.
A deque is a double linked list of mini vectors. That means addresses are stable.
Iterator arithmetic is valid unless operation which invalidates iterators is performed.
This is true for vectors too
From this std::deque reference:
... typical implementations use a sequence of individually allocated fixed-size arrays
You could think of it like a list of arrays.
A deque is typically implemented as a sequence of fixed-length pages of elements. If you append elements, when a page is full a new one is allocated and added at the end of the pages index. This guarantees that, if you add or remove elements only at the end or at the beginning, the ones that have already been stored aren't moved around in memory (in standardese speak, references to existing elements aren't invalidated by push_back, pop_back, push_front and pop_front).

Memory Allocation in STL C++

I am a little confused about memory reallocation in STL C++. For example, I know if I declare a vector, and keep pushing back elements into it, the vector will at some point need a reallocation of memory space and copy all the existing elements into it. For linked lists no reallocation is needed, because the elements are not stored consecutively in the stack and each element uses a pointer to point to the next element.
My question is, what's the situation for other STL in C++ ? for example, string, map,unordered_map? Do they need reallocation ?
(disclaimer: all the concrete data structures specified here may not be required by the standard, but they are useful to remember to help link the rules to something concrete)
std::string ~= std::vector; it's a dynamic array, if you keep pushing elements at its end, at some time it will reallocate your elements.
std::list: each element is a new linked list node, so no reallocation ever takes place, wherever you may insert the new element.
std::deque: it's typically made of several pages of elements; when a page is full, a new one is allocated and added to the list of pages; for this reason, no reallocation of your elements ever takes place, and your elements never get moved if you keep pushing stuff at the beginning or at the end.
std::map/std::multimap/std::set/std::multiset: typically a binary tree, each element is allocated on its own; no reallocations are ever performed.
std::unordered_map/std::unordered_multimap/std::unordered_set/std::unordered_multiset: a hash table; when the table is full enough, rehashing and reallocation occurs.
Almost all STL containers memory is allocated on heap, even for a vector. A plain array and std::array template are the only once probably that have memory on stack.
If your questions is about continuous memory allocation (whether on stack or heap), then, probably, plain array, std::array, std::vector have got continuous memory. Almost all other containers such as list, deque, map, set etc. do not have memory allocated continuously.

std::vector and pointer predictability

when you push_back() items into an std::vector, and retain the pointers to the objects in the vector via the back() reference -- are you guaranteed (assuming no deletions occur) that the address of the objects in the vector will remain the same?
It seems like my vector changes the pointers of the objects I use, such that if I push 10 items into it, and retain the pointers to those 10 items by remembering the back() reference after each push_back.
if your vector is to store objects, not pointers to objects, are the addresses of those objects subject to constant change upon pushing more items?
Any method that causes the vector to resize itself will invalidate all iterators, pointers, and references to the elements contained within. This can be avoided by reserving mememory, or using boost::stable_vector.
23.3.6.5/1:
Remarks: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens,
all the iterators and references before the insertion point remain valid.
No, std::vector is not a stable container, i.e. pointers and iterators may get invalidated by resizing the vector (or, better, by the corresponding reallocation). If you want to avoid this behaviour, use boost::stable_vector or std::list or std::deque instead (I would prefer the last). Or, more easily, you can simply store your locations by indices.
For more information, consider also the answer to this question here.
It's not guaranteed. If you push_back enough items to exceed the size of the memory buffer that's the backing store of the vector, a new buffer will be created, all the contents will be copied to the new location, and the old buffer will be deleted. At that point, old pointers (as well as iterators!) will be invalid.
If you know exactly how much maximum space you will ever need, you could set the size of the vector's buffer to that size when you create it, to avoid reallocation. However, I prefer to store "references" to elements of a vector as a reference to the vector and a size_t index into the vector, instead of using pointers. It's not necessarily slower than pointers (depending on the CPU type) but, even if it is, it won't be much slower and in my opinion it's worth it for the peace of mind knowing that no matter how the vector is used in the future or reallocated, it'll still refer to the proper element.

vector pointer locations guaranteed?

Suppose I have a vector of ints,
std::vector<int> numbers;
that is populated with a bunch of values, then I say do this (where an entry exists at 43)
int *oneNumber = &numbers[43];
Is oneNumber guaranteed to always be pointing at the int at index 43, even if say I resize numbers to something like numbers.resize(46)?
I'm not 100% sure what expected behaviour is here, I know vectors are guaranteed to be contiguous but not sure if that continuity would also mean all the indices in the vector will remain in the same place throughout its life.
Is oneNumber guaranteed to always be pointing at the int at index 43
Yes, this is guaranteed by the standard.
even if say I resize numbers to something like numbers.resize(46)?
No. Once you resize, add, or remove anything to the vector, all addresses and iterators to it are invalidated. This is because the vector may need to be reallocated with new memory locations.
Your paranoia is correct. Resizing a std::vector can cause its memory location to change. That means your oneNumber is now pointing to an old memory location that has been freed, and so accessing it is undefined behavior.
Pointers, references, and iterators to std::vector elements are guaranteed to stay put as long as you only append to the std::vector and the size of the std::vector doesn't grow beyond its capacity() at the time the pointer, reference, or iterator was obtained. Once it gets resized beyond the capacity() all pointers, references, and iterators to this std::vector become invalidated. Note that things are invalidated as well when inserting somewhere else than the end of the std::vector.
If you want to have your objects stay put and you only insert new elements at the end or the beginning, you can use std::deque. Pointers and references to elements in the std::deque get only invalidated when you insert into the middle of the std::deque or when removing from the middle or when removing the referenced object. Note that iterators to elements in the std::deque get invalidated every time you insert an element into the std::deque or remove any element from it.
As all the others have said, when you call .resize() on a vector your pointers become invalidated because the (old array) may be completely deallocated, and an entirely new one may be re-allocated and your data copied into it.
One workaround for this is don't store pointers into an STL vector. Instead, store integer indices.
So in your example,
std::vector<int> numbers;
int *oneNumber = &numbers[43]; // no. pointers invalidated after .resize or possibly .push_back.
int oneNumberIndex = 43 ; // yes. indices remain valid through .resize/.push_back
No - the vector can be reallocated when it grows. Usually once the vector doubles in size.
From the C++11 standard
1 Remarks: Causes reallocation if the new size is greater than the old capacity. If no
reallocation happens, all the iterators and references before the insertion point
remain valid. If an exception is thrown other than by the copy constructor, move
constructor, assignment operator, or move assignment operator of T or by any
InputIterator operation there are no effects. If an exception is thrown by the move
constructor of a non-CopyInsertable T, the effects are unspecified.
When you use a vector's resize() or reserve() function to increase the capacity of the vector, it may need to reallocate memory for the array-backing. If it does reallocate, the new memory will not be located at the same address, so the address stored in oneNumber will no longer point to the right place.
Again, this depends on how many elements the vector is currently being used to store and the requested size. Depending on the specifics, the vector may be able to resize without reallocating, but you should definitely not assume that this will be the case.
Once you changed the capacity of the vector, the data was copied to another memory block, and the origin data is deleted.

Does std::vector change its address? How to avoid

Since vector elements are stored contiguously, I guess it may not have the same address after some push_back's , because the initial allocated space could not suffice.
I'm working on a code where I need a reference to an element in a vector, like:
int main(){
vector<int> v;
v.push_back(1);
int *ptr = &v[0];
for(int i=2; i<100; i++)
v.push_back(i);
cout << *ptr << endl; //?
return 0;
}
But it's not necessarily true that ptr contains a reference to v[0], right? How would be a good way to guarantee it?
My first idea would be to use a vector of pointers and dynamic allocation. I'm wondering if there's an easier way to do that?
PS.: Actually I'm using a vector of a class instead of int, but I think the issues are the same.
Don't use reserve to postpone this dangling pointer bug - as someone who got this same problem, shrugged, reserved 1000, then a few months later spent ages trying to figure out some weird memory bug (the vector capacity exceeded 1000), I can tell you this is not a robust solution.
You want to avoid taking the address of elements in a vector if at all possible precisely because of the unpredictable nature of reallocations. If you have to, use iterators instead of raw addresses, since checked STL implementations will tell you when they have become invalid, instead of randomly crashing.
The best solution is to change your container:
You could use std::list - it does not invalidate existing iterators when adding elements, and only the iterator to an erased element is invalidated when erasing
If you're using C++0x, std::vector<std::unique_ptr<T>> is an interesting solution
Alternatively, using pointers and new/delete isn't too bad - just don't forget to delete pointers before erasing them. It's not hard to get right this way, but you have to be pretty careful to not cause a memory leak by forgetting a delete. (Mark Ransom also points out: this is not exception safe, the entire vector contents is leaked if an exception causes the vector to be destroyed.)
Note that boost's ptr_vector cannot be used safely with some of the STL algorithms, which may be a problem for you.
You can increase the capacity of the underlying array used by the vector by calling its reserve member function:
v.reserve(100);
So long as you do not put more than 100 elements into the vector, ptr will point to the first element.
How would be a good way to guarantee it?
std::vector<T> is guaranteed to be continous, but the implementation is free to reallocate or free storage on operations altering the vector contents (vector iterators, pointers or references to elements become undefined as well).
You can achieve your desired result, however, by calling reserve. IIRC, the standard guarantees that no reallocations are done until the size of the vector is larger than its reserved capacity.
Generally, I'd be careful with it (you can quickly get trapped…). Better don't rely on std::vector<>::reserve and iterator persistence unless you really have to.
If you don't need your values stored contiguously, you can use std::deque instead of std::vector. It doesn't reallocate, but holds elements in several chunks of memory.
Another possibility possibility would be a purpose-built smart pointer that, instead of storing an address would store the address of the vector itself along with the the index of the item you care about. It would then put those together and get the address of the element only when you dereference it, something like this:
template <class T>
class vec_ptr {
std::vector<T> &v;
size_t index;
public:
vec_ptr(std::vector<T> &v, size_t index) : v(v), index(index) {}
T &operator*() { return v[index]; }
};
Then your int *ptr=&v[0]; would be replaced with something like: vec_ptr<int> ptr(v,0);
A couple of points: first of all, if you rearrange the items in your vector between the time you create the "pointer" and the time you dereference it, it will no longer refer to the original element, but to whatever element happens to be at the specified position. Second, this does no range checking, so (for example) attempting to use the 100th item in a vector that only contains 50 items will give undefined behavior.
As James McNellis and Alexander Gessler stated, reserve is a good way of pre-allocating memory. However, for completeness' sake, I'd like to add that for the pointers to remain valid, all insertion/removal operations must occur from the tail of the vector, otherwise item shifting will again invalidate your pointers.
Depending on your requirements and use case, you might want to take a look at Boost's Pointer Container Library.
In your case you could use boost::ptr_vector<yourClass>.
I came across this problem too and spent a whole day just to realize vector's address changed and the saved addresses became invalid. For my problem, my solution was that
save raw data in the vector and get relative indices
after the vector stopped growing, convert the indices to pointer addresses
I found the following works
pointers[i]=indices[i]+(size_t)&vector[0];
pointers[i]=&vector[ (size_t)indices[i] ];
However, I haven't figured out how to use vector.front() and I am not sure whether I should use
pointers[i]=indices[i]*sizeof(vector)+(size_t)&vector[0] . I think the reference way(2) should be very safe.