Vectors, when to re-reference elements - c++

So far I have only worked with lists in C++ (queues, stacks, trees, etc. in Java). I have done some reading and have tried to learn about vectors, as they are good for traversal compared to lists and don't have the housekeeping complexity of arrays.
I am aware that pointers into a vector can be invalidated when the vector needs to reallocate. The pickle is that (as far as I know) there is no real way to determine whether adding an element to the vector will trigger reallocation.
One answer I can think of is to re-assign the pointers to each element every time an element is added or removed.
This seems like a decent amount of overhead on the chance reallocation is done. Is there a better way perhaps?

One way to approach this is just to have a vector of pointers (preferably smart pointers, so you don't need to worry about manual deallocation).
E.g. instead of
std::vector<MyObject> vec;
vec.push_back(MyObject());
MyObject* ptr = &vec[0];
Do something like this:
std::vector<std::unique_ptr<MyObject>> vec;
vec.push_back(std::unique_ptr<MyObject>(new MyObject())); // *
MyObject* ptr = vec[0].get();
(* or use vec.push_back(std::make_unique<MyObject>()) in C++14)
If done in the second way, ptr will always be valid across internal reallocations of the vector, because it's not pointing to memory that is managed by the vector, it is pointing to memory that is managed by the unique_ptr, which will not change until the object is explicitly released or destroyed.
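To make that concrete, here is a minimal sketch. MyObject is a hypothetical stand-in with a single int member, and pointerSurvivesGrowth is an illustrative name, not anything from a library. It takes a raw pointer to the first object, forces the vector to grow, and checks that the pointer still refers to the same object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type, standing in for MyObject above.
struct MyObject {
    int value = 0;
};

// Returns true if a raw pointer taken before growth still refers to the
// object owned by vec[0] after `extra` more elements have been appended.
bool pointerSurvivesGrowth(std::size_t extra) {
    std::vector<std::unique_ptr<MyObject>> vec;
    vec.push_back(std::unique_ptr<MyObject>(new MyObject()));
    MyObject* ptr = vec[0].get();
    ptr->value = 42;

    for (std::size_t i = 0; i < extra; ++i) {
        vec.push_back(std::unique_ptr<MyObject>(new MyObject()));
    }

    // Reallocation moved the unique_ptrs, not the MyObject they own.
    return ptr == vec[0].get() && vec[0]->value == 42;
}
```

The price is one heap allocation per element and an extra indirection on every access, so this is a trade of locality for address stability.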

Related

What's wrong with this method of deletion in a vector?

I have a deletion method that works and is as follows:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
for(int i = 0; i < userList.size(); i++) {
if (userList.at(i)->getID() == id) {
userList.erase(userList.begin() + i);
}
}
}
However, I was trying the following before the above and couldn't understand why it wasn't working.
Instead of using userList.erase(userList.begin() + i);, I was using delete userList.at(i)
I'm somewhat new to C++, and have been instructed to delete heap allocated memory with the "delete" keyword. I felt that should have removed it from the Vector, but was wrong.
Why doesn't the delete userList.at(i) work? I'm curious. Any info would be helpful.
There are two separate concepts at play here. First, there's the maintenance of the std::vector that you're using. The vector's job is to hold a sequence of elements, and in many ways it doesn't really care what those elements actually are. From the vector's perspective, its elements will stick around until something explicitly comes along and says to get rid of them. The call to erase tells the vector "Hey, you know that element you've got at that one position? Please get rid of it." So when you make the call to erase, you're telling the vector to get rid of one of its elements.
Independently, there's the objects that are being stored in the vector. You're storing Person *s, which are pointers to Person objects. Those objects (I'm assuming) were allocated with new, so each Person essentially thinks "I'm going to live forever, or at least until someone comes around and calls delete on me." If you delete one of the Person objects, that object ceases to exist. However, the Person objects have absolutely no idea that there's a vector somewhere with pointers to people.
In order to get everything to work the way you want it to, you actually need to use a combination of both erase and delete (with a caveat that I'll mention later). If you just erase the pointers from the vector, then from the vector's perspective everything is cleaned up (it no longer holds pointers to the Person object in question), but from the Person's perspective the Person object is still very much alive and well because you never said to delete it. If you just delete the pointers, then from the Person's perspective everything is cleaned up (you've told the Person that it's time to go to the giant playground in the sky), but from the vector's perspective nothing was added or removed, so you now have a dangling pointer in your vector. In other words, the first option results in a memory leak - there's a Person object that was never told to clean itself up - and the second option results in a dangling pointer - there's a pointer to what used to be a person, but which is now a bunch of bits that can be recycled however the program wishes.
Using the setup you have right now, the "best" way to handle this would be to use a combined approach. When you find an item to remove, first delete the pointer, then call erase. That ensures that the Person gets cleaned up and that the vector no longer has a dangling pointer in it.
But as some of the commenters have noted, there's a much better way to do this. Rather than storing Person *s and using raw pointers to reference the Person objects, use the std::shared_ptr type and manage your Person objects through std::shared_ptr<Person>. Unlike a regular pointer, which just says "yeah, there's a thing over there" and won't do any memory management on its own, the std::shared_ptr type actually owns the resource that it points at. If you erase a std::shared_ptr from a vector, the std::shared_ptr then says "okay, I just got kicked out of the vector, and if I'm the last pointer to the Person, I'll go and delete it for you." That means that you don't need to do any of your own memory management to clean things up.
In summary:
Just calling erase gets rid of an element from the vector, but leaves a Person adrift in the heap, wondering why no one loves it anymore.
Just calling delete sets the Person object free, but leaves a ghostly pointer to it in the vector that's a major hazard.
Calling both delete and erase in the proper order will solve this problem, but isn't the ideal solution.
Using std::shared_ptr instead of raw pointers is probably the best option, since it ensures that all the right deletes happen automatically.
Hope this helps!
And a quick addendum - are you sure that your code correctly visits all the elements of the vector? For example, if you erase the item at index 0, all the other elements of the vector will shift back one position. But then your implementation increments i to 1, at which point you've skipped over the item that just got shifted back to the first position.
I'll let you think about how to resolve this. Another answer has offered a good suggestion of using remove_if, which is one good solution, though if for your own edification you want to roll your own version, you might want to think over how you'd address the above issue.
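For readers who would rather see one possible resolution than work it out, here is a sketch of a hand-rolled loop that deletes before erasing and does not skip elements. The Person class below is a hypothetical stand-in with only a getID() member; the key detail is that vector::erase returns an iterator to the element after the erased one, so the loop only advances when nothing was erased.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the asker's Person class.
class Person {
public:
    explicit Person(int id) : id_(id) {}
    int getID() const { return id_; }
private:
    int id_;
};

void deleteUserByID(int id, std::vector<Person*>& userList) {
    for (auto it = userList.begin(); it != userList.end(); ) {
        if ((*it)->getID() == id) {
            delete *it;               // destroy the Person first
            it = userList.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;                     // advance only when nothing was erased
        }
    }
}
```

Each erase still shifts the tail of the vector, so this is quadratic in the worst case; it is meant to show the control flow, not to be the fastest option.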
This is one of those places a picture is almost certainly worth at least a thousand words. The vector is storing pointers, which point to (presumably) dynamically allocated objects, something like this:
So, the green boxes represent the elements in the vector itself. The blue boxes represent your data objects. I've separated the third one to signify the fact that it's the one we're going to (eventually) remove.
As it stands right now, your code is deleting some of the green boxes. It leaves the blue box (your data) in memory, but you no longer have a pointer to it:
At this point, you're right that the data no longer appears in the vector, so your routine has "worked" to that extent. The problem is that you no longer have access to that data, so you've leaked its memory.
What's (apparently) being suggested is that when you find the object you want to remove from the list, you should first use delete to destroy the data (the blue box):
...then use erase to remove that element from the vector:
Alternatives
I would not use a std::shared_ptr for a case like this. A shared_ptr is intended to manage objects that have shared ownership, and nothing you've said indicates that you're dealing with shared ownership. If you must use dynamically allocated objects, and don't want to manage things manually (which I agree is a good thing to avoid), you might consider using std::unique_ptr, or you might want to consider using a Boost ptr_vector instead.
Alternatively, consider changing it to a std::vector<Person> (i.e., store the objects directly in the vector instead of storing pointers to dynamically allocated objects). At least in my experience, this is really the right answer the vast majority of the time. If you really need to ensure against moving the Person objects around when the vector resizes, consider using an std::deque<Person> instead. A std::deque<Person> is fairly close to what you've created, but with at least some potential for the compiler to optimize allocation by putting a number of data objects (Persons, in your case) into a single block of memory, instead of allocating each one individually.
Conclusion
Until or unless evidence to the contrary is found, the right answer is most likely std::vector<Person> with std::deque<Person> in second place. Direct dynamic allocation of the Person objects, with something to automate their deletion runs a distant third place (at best).
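As a rough illustration of the first-place option, this is what the deletion routine might look like when the vector stores Person objects by value. The Person class here is a hypothetical stand-in with only an ID; with no pointers involved there is nothing to delete, so the erase-remove idiom is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the asker's Person class.
class Person {
public:
    explicit Person(int id) : id_(id) {}
    int getID() const { return id_; }
private:
    int id_;
};

void deleteUserByID(int id, std::vector<Person>& userList) {
    // remove_if compacts the kept elements to the front (preserving their
    // order); erase then drops the leftover tail. The vector destroys the
    // removed Person objects itself.
    userList.erase(
        std::remove_if(userList.begin(), userList.end(),
                       [id](const Person& p) { return p.getID() == id; }),
        userList.end());
}
```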
The other answers given summarize what you really should do in terms of design, and that is to use smart pointers.
However, if you really did use raw pointers, and allocated those entries with new, the way you can delete and erase without writing any loops is to
Partition the elements to delete
delete the elements
Erase the partitioned elements from the vector using vector<T>::erase.
Here is an example:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
// partition the about-to-be deleted elements to the right of the partition
// and all good items to the left of the partition
auto iter = std::partition(userList.begin(), userList.end(), [&](Person *p)
{ return p->getID() != id; });
// issue a delete on those elements on right of partition
std::for_each(iter, userList.end(), [](Person *p) { delete p; });
// now erase those elements from the vector.
userList.erase(iter, userList.end());
}
The std::partition simply places all elements you wish to delete on the right of the partition (which is returned by iter). Then it's just a matter of calling delete on those elements on the right of the partition, and finally erase those elements.
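The reason why this 3-step process was done instead of directly using std::remove_if is that std::remove_if leaves the elements in the "removed" tail of the range in a valid but unspecified state. With raw pointers, the tail may hold leftover copies of pointers that are still in the kept part of the vector, while the pointers you actually wanted to delete may have been overwritten. Issuing delete on the tail can therefore destroy objects that are still in use and leak the ones you meant to remove, and any later use of the destroyed objects is undefined behavior.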
For example, this code, even though it looks like it would work, actually results in undefined behavior:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
// move items to be removed to the end of the vector
auto iter = std::remove_if(userList.begin(), userList.end(), [&](Person *p)
{ return p->getID() == id; });
// issue a delete on those elements (this actually invokes undefined behavior)
std::for_each(iter, userList.end(), [](Person *p) { delete p; });
// now erase those elements from the vector (if your program even gets this far)
userList.erase(iter, userList.end());
}
Basically, you can't do anything "special" to the items in the removed range (for example, call delete), as those items are indeterminate garbage. The only thing you can safely do is to erase them.
So the trick is to partition the elements (which doesn't invalidate those items), delete the partitioned elements, and then remove them using erase.
*Note that if you want to keep the order of the elements that will not be deleted, then use std::stable_partition instead of std::partition.
The proper way to do it is to use smart pointers and an algorithm from the standard library (the generic lambda below requires C++14).
void deleteUserByID(int id, std::vector<std::unique_ptr<Person>>& userList)
{
auto endIt = std::remove_if(userList.begin(), userList.end(),
[id](const auto &person) {
return person->getID() == id;
});
userList.erase(endIt, userList.end());
}
These are two different and complementary things. For your vector
userList.erase(userList.begin() + i);
will remove the ith pointer from your vector, but will not affect the pointed at Person object in any way
delete userList.at(i);
will delete (free) the Person object pointed at by the ith pointer in your vector, but will not affect the vector in any way.
Depending on where these Person objects are coming from and what you are trying to do, you might need to do both.

No advantage to contiguous memory vector when elements are pointers or references?

I've been reading about the different containers in the C++ standard library, and I keep on hearing about how the simple vector in practice will often outperform most of the other containers when iterating over the elements. This is said to be due to the cache coherency (all stored in contiguous memory), instead of jumping around from place to place in a binary tree or linked list. But I was thinking, if we're talking about a vector of pointers or references to objects as opposed to the objects themselves, iterating over the vector will involve a dereference on each iteration, where the object is located in a separate area of memory. In this case I can't see it being any better than jumping from link to link in a list or a tree. The way I see it is like the following, and each one pretty much does the same thing as far as I can see.
If this is true, then can I assume that whenever people claim that the vector is more cache-friendly that it's ONLY the case when storing objects, and not pointers or references to objects? Also, I don't suppose if the pointers would be to a polymorphic type would make a difference between the two?
|ptr1|ptr2|ptr3|ptr4| //vector
and
|ptr1|--->|ptr2|--->|ptr3|--->|ptr4| //list
Now consider accessing the third object via ptr3.
Time taken by the vector:
O(1) time to reach ptr3 + time to dereference ptr3
Time taken by the list:
two link traversals to reach ptr3 (O(n) in general) + time to dereference ptr3
So the difference is in reaching the pointer to be dereferenced.
In general, accessing an element by position is O(1) in a vector, while in a list it is O(n).
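The two access paths can be sketched as follows. The helper names nthViaVector and nthViaList are made up for this illustration. Both end with the same dereference of the stored pointer; they differ only in how the stored pointer is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helpers: fetch the object behind the n-th stored pointer.

// Vector: one indexed load finds the pointer, then one dereference.
int nthViaVector(const std::vector<int*>& v, std::size_t n) {
    return *v[n];
}

// List: std::next follows n links to find the pointer, then one dereference.
int nthViaList(const std::list<int*>& l, std::size_t n) {
    return **std::next(l.begin(), static_cast<std::ptrdiff_t>(n));
}
```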

Is there an equivalent of vector::reserve() for an std::list?

I have a class that looks like this:
typedef std::list<char*> PtrList;
class Foo
{
public:
void DoStuff();
private:
PtrList m_list;
PtrList::iterator m_it;
};
The function DoStuff() basically adds elements to m_list or erases elements from it, finds an iterator to some special element in it and stores it in m_it. It is important to note that each value of m_it is used in every following call of DoStuff().
So what's the problem?
Everything works, except that profiling shows that the operator new is invoked too much due to list::push_back() called from DoStuff().
To increase performance I want to preallocate memory for m_list in the initialization of Foo as I would do if it were an std::vector. The problem is that this would introduce new problems such as:
Less efficient insert and erase of elements.
m_it becomes invalid as soon as the vector is changed from one call to DoStuff() to the next. EDIT: Alan Stokes suggested to use an index instead of an iterator, solving this issue.
My solution: the simplest solution I could think of is to implement a pool of objects that also has a linked-list functionality. This way I get a linked list and can preallocate memory for it.
Am I missing something or is it really the simplest solution? I'd rather not "re-invent the wheel", and use a standard solution instead, if it exists.
Any thoughts, workarounds or enlightening comments would be appreciated!
I think you are using the wrong container.
If you want fast push_back, don't automatically assume that you need a linked list; a linked list is a slow container that is mainly suitable for reordering.
A better container is a std::deque. A deque is basically an array of arrays. It allocates a block of memory and fills it as you push back; when it runs out, it allocates another block. This means that it allocates only infrequently, and you do not need to know the size of the container ahead of time for efficiency, as you do with std::vector and reserve.
You can use the splice function in std::list to implement a pool. Add a new member variable PtrList m_Pool. When you want to add a new object and the pool is not empty, assign the value to the first element in the pool and then splice it into the list. To erase an element, splice it from the list to the pool.
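A minimal sketch of that splice-based pool, assuming the PtrList typedef from the question. The wrapper class PooledPtrList and its member names are invented for this illustration. Splicing a node between two lists relinks it without allocating or freeing, which is the whole point.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

typedef std::list<char*> PtrList;

// Hypothetical wrapper: a list plus a pool of spare, preallocated nodes.
class PooledPtrList {
public:
    // Allocates `preallocated` spare nodes up front (each holds a null pointer).
    explicit PooledPtrList(std::size_t preallocated) : m_pool(preallocated) {}

    void push_back(char* value) {
        if (m_pool.empty()) {
            m_list.push_back(value);  // pool exhausted: fall back to a normal allocation
            return;
        }
        m_pool.front() = value;
        // Relink the first spare node to the end of the live list; no allocation.
        m_list.splice(m_list.end(), m_pool, m_pool.begin());
    }

    // Returns the node to the pool instead of freeing it.
    PtrList::iterator erase(PtrList::iterator it) {
        PtrList::iterator next = std::next(it);
        m_pool.splice(m_pool.begin(), m_list, it);
        return next;
    }

    PtrList& items() { return m_list; }
    std::size_t spare() const { return m_pool.size(); }

private:
    PtrList m_list;
    PtrList m_pool;
};
```

Iterators to elements that stay in the live list remain valid across both operations, so a stored iterator like m_it keeps working as long as the element it refers to is not the one being erased.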
But if you don't care about the order of the elements, then a deque can be much faster. If you want to erase an element in the middle, copy the last element onto the element you want to delete, then erase the last element.
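The "copy the last element over it, then drop the last element" trick might look like this. The function name unorderedErase is made up for the sketch, and it assumes index is in range.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Removes the element at `index` in constant time by overwriting it with the
// last element and then dropping the tail. Element order is NOT preserved.
// Precondition: index < d.size().
template <class T>
void unorderedErase(std::deque<T>& d, std::size_t index) {
    if (index + 1 != d.size()) {
        d[index] = std::move(d.back());  // skip the self-move when erasing the last element
    }
    d.pop_back();
}
```

Note that a stored index or iterator to the former last element no longer refers to it after this call, so it only fits when nothing depends on element positions.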
My advice is the same as 111111's, try switching to deque before you write any significant code.
However, to directly answer your question: you could use std::list with a custom allocator. It's a bit fiddly, and I'm not going to work through all the details here, but the gist of it is that you write a class that represents the memory allocation strategy for list nodes. The nodes allocated by list will be a small implementation-defined amount larger than char*, but they will all be the same size, which means you can write an optimized allocator just for that size (a pool of memory blocks rather than a pool of objects), and you can add functions to it that let you reserve whatever space you want in the allocator, at the time you want. Then the list can allocate/free quickly. This saves you needing to re-implement any of the actual list functionality.
If you were (for some reason) going to implement a pool of objects with list functionality, then you could start with boost::intrusive. That might also be useful when writing your own allocator, for keeping track of your list of free blocks.
List and vector are completely different in the way they manage objects.
Vector constructs elements in place in an allocated buffer of a given capacity. A new allocation happens when the capacity is exhausted.
List allocates elements one by one, each in an individually allocated space.
Vector elements shift when something is inserted or removed; hence, vector indexes and element addresses are not stable.
List elements are re-linked when something is inserted or removed; hence, list iterators and element addresses are stable.
A way to make a list behave similarly to a vector is to replace the default allocator (which allocates from the system every time it is invoked) with another one that allocates objects in larger chunks, dispatching sub-chunks to the list when the list invokes it.
This is not something the standard library provides by default.
Could potentially use list::get_allocator().allocate(). Afaik, default behaviour would be to acquire memory as and when due to the non-contiguous nature of lists - hence the lack of reserve() - but no major drawbacks with using the allocator method occur to me immediately. Provided you have a non-critical section in your program, at the start or whatever, you can at least choose to take the damage at that point.

why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]?

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? So in a vector containing an array foo, foo is a pointer to foo[0], so pop_front() would make the pointer foo = &foo[1] and the bracket operator would just do the normal pointer math. Is this something to do with how C++ keeps track of which memory you're using when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I will admit, but I haven't seen anyone asking why you can't just shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
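To see where the extra member comes from, here is a toy sketch of a fixed-capacity int buffer that does support pop_front by shifting a pointer. It is not how any real std::vector is implemented, and the class and member names are invented. Note the fourth pointer, alloc_, which exists only so the destructor can still find the start of the allocation.

```cpp
#include <cassert>
#include <cstddef>

// Toy illustration only: fixed capacity, ints only, no bounds checking.
class FrontPoppableBuffer {
public:
    explicit FrontPoppableBuffer(std::size_t capacity)
        : alloc_(new int[capacity]), first_(alloc_), last_(alloc_),
          cap_end_(alloc_ + capacity) {}
    ~FrontPoppableBuffer() { delete[] alloc_; }  // must free the ORIGINAL address
    FrontPoppableBuffer(const FrontPoppableBuffer&) = delete;
    FrontPoppableBuffer& operator=(const FrontPoppableBuffer&) = delete;

    bool push_back(int v) {
        if (last_ == cap_end_) return false;  // full; a real container would regrow
        *last_++ = v;
        return true;
    }
    void pop_front() { ++first_; }            // precondition: size() > 0
    int& operator[](std::size_t i) { return first_[i]; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    // Slots in front of first_ that can no longer be used.
    std::size_t wastedFrontSlots() const { return static_cast<std::size_t>(first_ - alloc_); }

private:
    int* alloc_;    // the extra member: start of the allocation
    int* first_;    // current logical beginning
    int* last_;     // one past the last element
    int* cap_end_;  // one past the end of the allocation
};
```

Dropping alloc_ and freeing first_ instead would pass delete[] an address that new[] never returned, which is exactly the problem described above.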
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you require adds another pointer's worth of storage (4 bytes on a 32-bit platform, 8 on a 64-bit one) to every vector (and there are a lot of those in a C++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements; removing and inserting elsewhere are "unusual" operations, and their performance matters less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front otherwise the container is not being consistent. push_front is definitely expensive for a vector container (unless you match your pushes with your pops which is not a good design). Without push_front the vector is really wasting memory if one does lots of pop_front operations with no push_front functionality.
Now the need for pop_front and push_front is there for a container that is similar to a vector (constant time random access) which is why there is deque.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
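For comparison, a small sketch of the same usage pattern with std::deque, which provides push_front and pop_front alongside vector-like indexing. The helper popFront is an illustrative name, not part of the standard library.

```cpp
#include <cassert>
#include <deque>

// Removes and returns the first element. Precondition: !d.empty().
int popFront(std::deque<int>& d) {
    int value = d.front();
    d.pop_front();  // constant time; no elements are shifted
    return value;
}
```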
You can use std::deque instead of std::vector. It's a double-ended-queue with also the vector-like access members. It implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
Another shortcoming of your suggestion is that you'll waste memory, as you can't reuse the slots to the left of the shifted pointer. The more you call pop_front(), the more you'll waste, until the vector is destroyed.

Does std::vector change its address? How to avoid

Since vector elements are stored contiguously, I guess it may not have the same address after some push_back's , because the initial allocated space could not suffice.
I'm working on a code where I need a reference to an element in a vector, like:
int main(){
vector<int> v;
v.push_back(1);
int *ptr = &v[0];
for(int i=2; i<100; i++)
v.push_back(i);
cout << *ptr << endl; //?
return 0;
}
But it's not necessarily true that ptr contains a reference to v[0], right? How would be a good way to guarantee it?
My first idea would be to use a vector of pointers and dynamic allocation. I'm wondering if there's an easier way to do that?
PS.: Actually I'm using a vector of a class instead of int, but I think the issues are the same.
Don't use reserve to postpone this dangling pointer bug - as someone who got this same problem, shrugged, reserved 1000, then a few months later spent ages trying to figure out some weird memory bug (the vector capacity exceeded 1000), I can tell you this is not a robust solution.
You want to avoid taking the address of elements in a vector if at all possible precisely because of the unpredictable nature of reallocations. If you have to, use iterators instead of raw addresses, since checked STL implementations will tell you when they have become invalid, instead of randomly crashing.
The best solution is to change your container:
You could use std::list - it does not invalidate existing iterators when adding elements, and only the iterator to an erased element is invalidated when erasing
If you're using C++0x, std::vector<std::unique_ptr<T>> is an interesting solution
Alternatively, using pointers and new/delete isn't too bad - just don't forget to delete pointers before erasing them. It's not hard to get right this way, but you have to be pretty careful to not cause a memory leak by forgetting a delete. (Mark Ransom also points out: this is not exception safe, the entire vector contents is leaked if an exception causes the vector to be destroyed.)
Note that boost's ptr_vector cannot be used safely with some of the STL algorithms, which may be a problem for you.
You can increase the capacity of the underlying array used by the vector by calling its reserve member function:
v.reserve(100);
So long as you do not put more than 100 elements into the vector, ptr will point to the first element.
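A small sketch of that guarantee, using made-up function names. The second helper relies on the fact that push_back only reallocates when the new size would exceed capacity(), so comparing size() with capacity() tells you in advance whether the next push_back will move the elements.

```cpp
#include <cassert>
#include <vector>

// True if a pointer to v[0] taken after reserve(100) is still valid once the
// vector holds exactly 100 elements.
bool firstElementStaysPut() {
    std::vector<int> v;
    v.reserve(100);
    v.push_back(1);
    const int* ptr = &v[0];
    for (int i = 2; i <= 100; ++i) {
        v.push_back(i);  // size never exceeds the reserved capacity
    }
    return ptr == &v[0] && *ptr == 1;
}

// True if the next push_back on v has to reallocate.
bool nextPushBackReallocates(const std::vector<int>& v) {
    return v.size() == v.capacity();
}
```

The pointer is only safe for as long as the size stays within the reserved capacity; the 101st push_back may invalidate it without any warning.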
How would be a good way to guarantee it?
std::vector<T> is guaranteed to be contiguous, but the implementation is free to reallocate storage on operations that alter the vector's contents (at which point iterators, pointers, and references to elements become invalid).
You can achieve your desired result, however, by calling reserve. IIRC, the standard guarantees that no reallocations are done until the size of the vector is larger than its reserved capacity.
Generally, I'd be careful with it (you can quickly get trapped…). Better don't rely on std::vector<>::reserve and iterator persistence unless you really have to.
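If you don't need your values stored contiguously, you can use std::deque instead of std::vector. It holds elements in several chunks of memory, so inserting at either end never moves existing elements: pointers and references to them stay valid. Iterators are still invalidated by such insertions, and inserting in the middle invalidates pointers and references as well.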
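A quick sketch of that property, with an invented function name: a pointer to an element is taken, the deque is grown at both ends, and the pointer is compared against the element's new position.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// True if a pointer to the original element survives `extra` insertions at
// each end of the deque.
bool dequeElementStaysPut(std::size_t extra) {
    std::deque<int> d;
    d.push_back(1);
    int* ptr = &d[0];
    for (std::size_t i = 0; i < extra; ++i) {
        d.push_back(0);
        d.push_front(0);
    }
    // The original element now sits at index `extra`, but at the same address.
    return ptr == &d[extra] && *ptr == 1;
}
```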
Another possibility would be a purpose-built smart pointer that, instead of storing an element's address, stores the address of the vector itself along with the index of the item you care about. It would then put those together and get the address of the element only when you dereference it, something like this:
template <class T>
class vec_ptr {
std::vector<T> &v;
size_t index;
public:
vec_ptr(std::vector<T> &v, size_t index) : v(v), index(index) {}
T &operator*() { return v[index]; }
};
Then your int *ptr=&v[0]; would be replaced with something like: vec_ptr<int> ptr(v,0);
A couple of points: first of all, if you rearrange the items in your vector between the time you create the "pointer" and the time you dereference it, it will no longer refer to the original element, but to whatever element happens to be at the specified position. Second, this does no range checking, so (for example) attempting to use the 100th item in a vector that only contains 50 items will give undefined behavior.
As James McNellis and Alexander Gessler stated, reserve is a good way of pre-allocating memory. However, for completeness' sake, I'd like to add that for the pointers to remain valid, all insertion/removal operations must occur from the tail of the vector, otherwise item shifting will again invalidate your pointers.
Depending on your requirements and use case, you might want to take a look at Boost's Pointer Container Library.
In your case you could use boost::ptr_vector<yourClass>.
I came across this problem too and spent a whole day before realizing that the vector's address had changed and the saved addresses had become invalid. My solution was to:
save the raw data in the vector and record relative indices
after the vector has stopped growing, convert the indices to pointer addresses
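I found that the following work:
(1) pointers[i]=indices[i]+(size_t)&vector[0];
(2) pointers[i]=&vector[ (size_t)indices[i] ];
However, I haven't figured out how to use vector.front(), and I am not sure whether I should use
pointers[i]=indices[i]*sizeof(vector)+(size_t)&vector[0] instead of (1). Note that (1) is only correct if indices holds byte offsets; for element indices the offset has to be scaled by the element size, sizeof(vector[0]), not by sizeof(vector), which is the size of the vector object itself. I think the reference way (2) should be very safe.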
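The index-first approach could be sketched like this, assuming a vector of int and a made-up helper name toPointers. It sticks to the indexing form, which lets the compiler do the element-size arithmetic; &data.front() is the same address as &data[0], so data.data() + idx or &data.front() + idx would work equally well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts element indices into pointers. Call this only after `data` has
// stopped growing; any later reallocation invalidates the returned pointers.
// Precondition: every index is less than data.size().
std::vector<int*> toPointers(std::vector<int>& data,
                             const std::vector<std::size_t>& indices) {
    std::vector<int*> pointers;
    pointers.reserve(indices.size());
    for (std::size_t idx : indices) {
        pointers.push_back(&data[idx]);  // no manual sizeof arithmetic needed
    }
    return pointers;
}
```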