What's wrong with this method of deletion in a vector? - c++

I have a deletion method that works and is as follows:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
    for(int i = 0; i < userList.size(); i++) {
        if (userList.at(i)->getID() == id) {
            userList.erase(userList.begin() + i);
        }
    }
}
However, I was trying the following before the above and couldn't understand why it wasn't working.
Instead of using userList.erase(userList.begin() + i);, I was using delete userList.at(i)
I'm somewhat new to C++, and have been instructed to delete heap-allocated memory with the "delete" keyword. I felt that should have removed it from the vector, but I was wrong.
Why doesn't the delete userList.at(i) work? I'm curious. Any info would be helpful.

There are two separate concepts at play here. First, there's the maintenance of the std::vector that you're using. The vector's job is to hold a sequence of elements, and in many ways it doesn't really care what those elements actually are. From the vector's perspective, its elements will stick around until something explicitly comes along and says to get rid of them. The call to erase tells the vector "Hey, you know that element you've got at that one position? Please get rid of it." So when you make the call to erase, you're telling the vector to get rid of one of its elements.
Independently, there are the objects that are being stored in the vector. You're storing Person *s, which are pointers to Person objects. Those objects (I'm assuming) were allocated with new, so each Person essentially thinks "I'm going to live forever, or at least until someone comes around and calls delete on me." If you delete one of the Person objects, that object ceases to exist. However, the Person objects have absolutely no idea that there's a vector somewhere with pointers to people.
In order to get everything to work the way you want it to, you actually need to use a combination of both erase and delete (with a caveat that I'll mention later). If you just erase the pointers from the vector, then from the vector's perspective everything is cleaned up (it no longer holds pointers to the Person object in question), but from the Person's perspective the Person object is still very much alive and well because you never said to delete it. If you just delete the pointers, then from the Person's perspective everything is cleaned up (you've told the Person that it's time to go to the giant playground in the sky), but from the vector's perspective nothing was added or removed, so you now have a dangling pointer in your vector. In other words, the first option results in a memory leak (there's a Person object that was never told to clean itself up), and the second option results in a dangling pointer (there's a pointer to what used to be a Person, but which is now a bunch of bits that can be recycled however the program wishes).
Using the setup you have right now, the "best" way to handle this would be to use a combined approach. When you find an item to remove, first delete the pointer, then call erase. That ensures that the Person gets cleaned up and that the vector no longer has a dangling pointer in it.
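A minimal sketch of that combined approach, assuming a hypothetical `Person` class with a `getID()` accessor; the static `alive` counter is an illustration-only addition so the effect of `delete` can be observed. It walks the vector with the iterator returned by `erase`, which also sidesteps the skipped-element problem described in the addendum further down.

```cpp
#include <cassert>
#include <vector>

// Hypothetical Person class, for illustration only.
class Person {
public:
    static int alive;  // number of Person objects currently alive
    explicit Person(int id) : id_(id) { ++alive; }
    ~Person() { --alive; }
    int getID() const { return id_; }
private:
    int id_;
};
int Person::alive = 0;

// Deletes every Person with the given ID and removes its pointer from the vector.
void deleteUserByID(int id, std::vector<Person*>& userList)
{
    for (auto it = userList.begin(); it != userList.end(); ) {
        if ((*it)->getID() == id) {
            delete *it;               // 1. destroy the Person object
            it = userList.erase(it);  // 2. remove the now-dangling pointer;
                                      //    erase returns an iterator to the next element
        } else {
            ++it;                     // only advance when nothing was erased
        }
    }
}
```

The order matters: once the pointer has been erased from the vector, nothing is left to pass to `delete`.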
But as some of the commenters have noted, there's a much better way to do this. Rather than storing Person *s and using raw pointers to reference the Person objects, use the std::shared_ptr type and manage your Person objects through std::shared_ptr<Person>. Unlike a regular pointer, which just says "yeah, there's a thing over there" and won't do any memory management on its own, the std::shared_ptr type actually owns the resource that it points at. If you erase a std::shared_ptr from a vector, the std::shared_ptr then says "okay, I just got kicked out of the vector, and if I'm the last pointer to the Person, I'll go and delete it for you." That means that you don't need to do any of your own memory management to clean things up.
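A sketch of the `std::shared_ptr` version, again assuming a hypothetical `Person` class with an illustration-only `alive` counter. There is no `delete` anywhere: erasing a `shared_ptr` releases its ownership, and the `Person` is destroyed when its last owner goes away.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical Person class, for illustration only.
class Person {
public:
    static int alive;  // number of Person objects currently alive
    explicit Person(int id) : id_(id) { ++alive; }
    ~Person() { --alive; }
    int getID() const { return id_; }
private:
    int id_;
};
int Person::alive = 0;

void deleteUserByID(int id, std::vector<std::shared_ptr<Person>>& userList)
{
    // Erase-remove idiom: remove_if moves the kept pointers to the front,
    // erase destroys the leftover shared_ptrs, and each Person is deleted
    // automatically when its last owning shared_ptr goes away.
    userList.erase(std::remove_if(userList.begin(), userList.end(),
                                  [id](const std::shared_ptr<Person>& p) {
                                      return p->getID() == id;
                                  }),
                   userList.end());
}
```

If another `shared_ptr` to the same `Person` still exists elsewhere, erasing it from the vector only drops the reference count, and the object stays alive until that other owner is gone too.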
In summary:
Just calling erase gets rid of an element from the vector, but leaves a Person adrift in the heap, wondering why no one loves it anymore.
Just calling delete sets the Person object free, but leaves a ghostly pointer to it in the vector that's a major hazard.
Calling both delete and erase in the proper order will solve this problem, but isn't the ideal solution.
Using std::shared_ptr instead of raw pointers is probably the best option, since it ensures that all the right deletes happen automatically.
Hope this helps!
And a quick addendum: are you sure that your code correctly visits all the elements of the vector? For example, if you erase the item at index 0, all the other elements of the vector will shift back one position. But then your implementation increments i to 1, at which point you've skipped over the item that just got shifted back to the first position.
I'll let you think about how to resolve this. Another answer has offered a good suggestion of using remove_if, which is one good solution, though if for your own edification you want to roll your own version, you might want to think over how you'd address the above issue.
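To see the skipping problem concretely, here is the question's index-based loop transcribed to a plain `std::vector<int>` (the ints standing in for the IDs are an assumption made only to keep the sketch short). The loop is deliberately left unfixed: with two adjacent matching elements, the second one survives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The question's loop, with ints standing in for the Person IDs.
// Left unfixed on purpose: it demonstrates the skipped-element bug.
void eraseAllByIndex(int id, std::vector<int>& ids)
{
    for (std::size_t i = 0; i < ids.size(); i++) {
        if (ids.at(i) == id) {
            // After this erase, the next element shifts down into slot i,
            // but the loop's i++ then moves past it without checking it.
            ids.erase(ids.begin() + i);
        }
    }
}
```

If IDs are unique, at most one element ever matches, so the skip is harmless in practice; it only bites when several elements can match.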

This is one of those places a picture is almost certainly worth at least a thousand words. The vector is storing pointers, which point to (presumably) dynamically allocated objects, something like this:
[Figure: a row of green boxes (the vector's elements), each pointing to its own blue box (a data object); the third blue box is drawn apart from the others.]
So, the green boxes represent the elements in the vector itself. The blue boxes represent your data objects. I've separated the third one to signify the fact that it's the one we're going to (eventually) remove.
As it stands right now, your code is deleting some of the green boxes. It leaves the blue box (your data) in memory, but you no longer have a pointer to it:
[Figure: the green box for the third object is gone; the third blue box is still in memory, with nothing pointing to it.]
At this point, you're right that the data no longer appears in the vector, so your routine has "worked" to that extent. The problem is that you no longer have access to that data, so you've leaked its memory.
What's (apparently) being suggested is that when you find the object you want to remove from the list, you should first use delete to destroy the data (the blue box):
[Figure: the third blue box has been deleted; its green box still points to where it used to be.]
...then use erase to remove that element from the vector:
[Figure: the dangling green box has been erased; the remaining green boxes point to the remaining blue boxes.]
Alternatives
I would not use a std::shared_ptr for a case like this. A shared_ptr is intended to manage objects that have shared ownership, and nothing you've said indicates that you're dealing with shared ownership. If you must use dynamically allocated objects, and don't want to manage things manually (which I agree is a good thing to avoid), you might consider using std::unique_ptr, or you might want to consider using a Boost ptr_vector instead.
Alternatively, consider changing it to a std::vector<Person> (i.e., store the objects directly in the vector instead of storing pointers to dynamically allocated objects). At least in my experience, this is really the right answer the vast majority of the time. If you really need to ensure against moving the Person objects around when the vector resizes, consider using a std::deque<Person> instead. A std::deque<Person> is fairly close to what you've created, but instead of allocating each Person individually, it typically puts a number of them into a single block of memory.
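A minimal sketch of the `std::vector<Person>` alternative, assuming a hypothetical value-type `Person` with an `id`, a `name` and a `getID()` accessor. Removal becomes the usual erase-remove idiom, and the vector destroys the erased `Person` objects itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Person stored by value, for illustration only.
struct Person {
    int id;
    std::string name;
    int getID() const { return id; }
};

void deleteUserByID(int id, std::vector<Person>& userList)
{
    // remove_if shifts the kept Persons forward (preserving their order);
    // erase then destroys the leftovers. No new/delete anywhere: the vector
    // owns the Person objects.
    userList.erase(std::remove_if(userList.begin(), userList.end(),
                                  [id](const Person& p) { return p.getID() == id; }),
                   userList.end());
}
```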
Conclusion
Until or unless evidence to the contrary is found, the right answer is most likely std::vector<Person>, with std::deque<Person> in second place. Direct dynamic allocation of the Person objects, with something to automate their deletion, runs a distant third (at best).

The other answers given summarize what you really should do in terms of design, and that is to use smart pointers.
However, if you really did use raw pointers, and allocated those entries with new, the way you can delete and erase without writing any loops is to
Partition the elements to delete
delete the elements
Erase the partitioned elements from the vector using vector<T>::erase.
Here is an example:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
    // partition the about-to-be deleted elements to the right of the partition
    // and all good items to the left of the partition
    auto iter = std::partition(userList.begin(), userList.end(), [&](Person *p)
                               { return p->getID() != id; });
    // issue a delete on those elements on right of partition
    std::for_each(iter, userList.end(), [](Person *p) { delete p; });
    // now erase those elements from the vector.
    userList.erase(iter, userList.end());
}
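The std::partition call simply places all elements you wish to delete on the right of the partition point (which is returned as iter). Then it's just a matter of calling delete on those elements on the right of the partition, and finally erasing those elements.
The reason this 3-step process is used instead of std::remove_if directly is that std::remove_if leaves the elements past the returned iterator in a valid but unspecified state. For a vector of raw pointers, those slots may in practice hold copies of pointers that are still kept at the front of the vector, while the pointers to the removed Person objects may already have been overwritten. Issuing delete on those elements can therefore destroy objects that are still in use (leading to undefined behavior when they are used or deleted again) and leak the ones you meant to remove.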
For example, this code, even though it looks like it would work, actually results in undefined behavior:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
    // looks like it moves items to be removed to the end of the vector (it doesn't)
    auto iter = std::remove_if(userList.begin(), userList.end(), [&](Person *p)
                               { return p->getID() == id; });
    // issue a delete on those elements (this can lead to undefined behavior)
    std::for_each(iter, userList.end(), [](Person *p) { delete p; });
    // now erase those elements from the vector (if your program even gets this far)
    userList.erase(iter, userList.end());
}
Basically, you can't do anything "special" to the items in the removed range (for example, call delete), as their values are unspecified. The only thing you can safely do with them is erase them.
So the trick is to partition the elements (which only reorders them, so every pointer is still in the vector), delete the partitioned elements, and then remove them using erase.
Note that if you want to keep the order of the elements that will not be deleted, use std::stable_partition instead of std::partition.
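Following the note above, a sketch of the same three steps with `std::stable_partition`, assuming a hypothetical `Person` class (its `alive` counter is only there to make the deletions observable). The surviving pointers keep their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical Person class, for illustration only.
class Person {
public:
    static int alive;  // number of Person objects currently alive
    explicit Person(int id) : id_(id) { ++alive; }
    ~Person() { --alive; }
    int getID() const { return id_; }
private:
    int id_;
};
int Person::alive = 0;

void deleteUserByID(int id, std::vector<Person*>& userList)
{
    // Keepers go to the left, in their original relative order.
    auto iter = std::stable_partition(userList.begin(), userList.end(),
                                      [id](Person* p) { return p->getID() != id; });
    // Every element in [iter, end) is still a valid pointer to a matching Person.
    std::for_each(iter, userList.end(), [](Person* p) { delete p; });
    userList.erase(iter, userList.end());
}
```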

The proper way to do it is to use smart pointers and an algorithm from the STL.
void deleteUserByID(int id, std::vector<std::unique_ptr<Person>>& userList)
{
    auto endIt = std::remove_if(userList.begin(), userList.end(),
                                [id](const auto &person) {
                                    return person->getID() == id;
                                });
    userList.erase(endIt, userList.end());
}

These are two different and complementary things. For your vector
userList.erase(userList.begin() + i);
will remove the ith pointer from your vector, but will not affect the pointed at Person object in any way
delete userList.at(i);
will delete (free) the Person object pointed at by the ith pointer in your vector, but will not affect the vector in any way.
Depending on where these Person objects are coming from and what you are trying to do, you might need to do both.

Related

what happens to a linked list if it gets reallocated

I understand that vectors get moved if you push more elements than they have capacity for, but what happens to std::list if one of its elements gets moved for reasons unrelated to the list itself? For instance, to make space for a vector?
Will the list get invalidated because the elements around the moved element no longer point to it? Or is the list prepared for such eventuality?
If it is the latter, what happens to pointers that point to the moved element?
As a concrete application, I want to make a node map, which of course means that every node has to point to other nodes. But I also need to have a list of the nodes so I can search them easily.
So I wanted to have a list where every object in the list has pointers to some other elements of the same list (this is separate from the normal std::list back-and-forth pointers). But I got worried about how std::list would handle one of its elements getting moved, and how I could handle my own pointers in that case.
I discarded vectors already because the documentation states that if a vector's storage gets moved, all pointers and references to its elements are invalidated. If my approach of using std::list cannot work, my second-best option would be to keep the nodes in a vector and make the nodes reference each other by index (which I can do because, once built, the vector won't change its size).
but what happens to std::list if one of its elements gets moved for reasons unrelated to the list itself?
In C++, the runtime environment is not allowed to unilaterally move objects in that way, for exactly the reason you imagine -- there is no reliable or efficient way to find all pointers to the object's old location and modify them to point to a new location, so any attempt to surreptitiously move an object would run the risk of creating dangling-pointers which would lead to undefined behavior. (btw this is one reason why garbage collectors don't work very well in C++)
So having your list-node objects moved behind your back is not something you need to worry about. Since your list container has sole ownership of the nodes, the only way for this to happen would be if the list itself decided to explicitly move one of its nodes for some reason (and there's no general reason why it would need or want to do that). In fact, the standard guarantees that inserting into a std::list never invalidates pointers or references to its elements, and erasing invalidates only those to the erased element.
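A small check of that guarantee, assuming a hypothetical `Node` type that holds raw pointers to other nodes in the same list (as in the node-map idea from the question): the stored pointers remain valid while other elements are inserted and erased.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical graph node; `neighbours` points at other nodes in the same list.
struct Node {
    int id;
    std::vector<Node*> neighbours;
};

// Builds a list, links node 1 to node 2, then inserts and erases other
// elements. Returns true if the stored pointer still refers to node 2.
bool pointersSurviveListChanges()
{
    std::list<Node> nodes;
    nodes.push_back({1, {}});
    nodes.push_back({2, {}});
    Node* first = &nodes.front();
    first->neighbours.push_back(&nodes.back());

    for (int i = 3; i < 1000; ++i)
        nodes.push_back({i, {}});               // grow the list
    nodes.erase(std::next(nodes.begin(), 2));   // erase the node with id 3

    return first == &nodes.front() && first->neighbours[0]->id == 2;
}
```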

Shared pointer with observer pointers

I have a class Foo, which has a vector of some large classes. The idea is that an octal tree (octree) will be built recursively out of the elements of the vector, and each OctreeNode will have a pointer to a few elements of the vector found in Foo. (In the example, just for simplicity, a node will point to only one element of the vector.)
class Foo
{
    std::vector<LargeClass> mLargeClasses;
    void removeItem(const int index); //remove an element from the vector at the index
};
class OctreeNode
{
    LargeClass* mLargeClass;
};
One could say, "why bother keeping the vector after the tree is built? Just store the objects in the tree itself." True, but let's just say I need to keep the vector alongside the built tree as well.
While the above concept works, I have issues when elements get removed from the underlying vector. In that case, some octree nodes end up with dangling pointers.
My solution #1:
If the removeItem function is called, then before it removes the vector element, it first recursively traverses the octree and sets to nullptr every mLargeClass pointer that points to that particular vector element. It's OK to have nullptr in the nodes, as I check against nullptr each time I access them anyway.
My solution #2:
Have the vector store shared_ptrs, and have the OctreeNode store a weak_ptr. I am not a fan of this, as each time I access a weak_ptr in the tree, it has to be converted to a shared_ptr (via lock()), with the atomic reference-count updates that involves. I am no expert on performance testing, but I have a feeling that it is slower than a simple pointer access with an if condition.
Does anybody know any better solutions?
I think the most elegant solution would be:
A smart pointer which behaves like a shared_ptr: it counts how many other pointers refer to it, keeps a record of them, and when it gets destroyed, automatically nulls out all the "observer" pointers that refer to it.
While the field and the purpose are somewhat different, I think I will give the handle system described in this post a try:
Simple, efficient weak pointer that is set to NULL when target memory is deallocated
If I fail, I will revert back to the shared_ptr/weak_ptr duo. Described in the same post.
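A minimal sketch of the `shared_ptr`/`weak_ptr` pairing from solution #2, with hypothetical stand-ins for `LargeClass`, `Foo` and `OctreeNode` (members made public for brevity). `weak_ptr::lock()` returns an empty `shared_ptr` once the observed object has been destroyed, which plays the role of the `nullptr` check from solution #1.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the classes in the question.
struct LargeClass { int payload = 0; };

struct OctreeNode {
    std::weak_ptr<LargeClass> mLargeClass;  // observes, does not own
};

struct Foo {
    std::vector<std::shared_ptr<LargeClass>> mLargeClasses;
    // Erasing drops the last owner, so the LargeClass is destroyed here.
    void removeItem(const int index) { mLargeClasses.erase(mLargeClasses.begin() + index); }
};

// Returns the payload if the observed object still exists, otherwise -1.
int readPayload(const OctreeNode& node)
{
    if (std::shared_ptr<LargeClass> p = node.mLargeClass.lock())
        return p->payload;
    return -1;  // the LargeClass was removed from Foo
}
```

Because the vector now holds pointers, erasing one element shifts only the `shared_ptr`s; the other `LargeClass` objects stay where they are, so the remaining observers stay correct.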

Vectors, when to re-reference elements

So far I have only worked with lists in C++ (queues, stacks, trees etc. in Java). I have done some reading and have endeavoured to learn about vectors, as they are good for traversal compared to lists and don't have the complexity of arrays with regard to housekeeping.
So far I am aware that there can be an issue with pointer validity in the event the vector needs to be reallocated. The pickle being that there is (as far as I know) no real way to determine whether adding an element to the vector will trigger reallocation.
One answer I can think of is to re-assign the pointers to each element every time an element is added or removed.
This seems like a decent amount of overhead on the chance reallocation is done. Is there a better way perhaps?
One way to approach this is just to have a vector of pointers (preferably smart pointers, so you don't need to worry about manual deallocation).
E.g. instead of
std::vector<MyObject> vec;
vec.push_back(MyObject());
MyObject* ptr = &vec[0];
Do something like this:
std::vector<std::unique_ptr<MyObject>> vec;
vec.push_back(std::unique_ptr<MyObject>(new MyObject())); // *
MyObject* ptr = vec[0].get();
(* or use vec.push_back(std::make_unique<MyObject>()) in C++14)
If done in the second way, ptr will always be valid across internal reallocations of the vector, because it's not pointing to memory that is managed by the vector, it is pointing to memory that is managed by the unique_ptr, which will not change until the object is explicitly released or destroyed.
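A quick check of that claim, assuming a hypothetical `MyObject` with a single `int` member (C++14 for `std::make_unique`): the raw pointer obtained from `get()` still refers to the same object after the vector has reallocated many times.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct MyObject { int value = 0; };  // hypothetical payload type

// Takes a raw pointer to the first object, then forces many reallocations.
// Returns true if the raw pointer still refers to the same object.
bool pointerSurvivesReallocation()
{
    std::vector<std::unique_ptr<MyObject>> vec;
    vec.push_back(std::make_unique<MyObject>());
    vec[0]->value = 7;
    MyObject* ptr = vec[0].get();

    for (int i = 0; i < 10000; ++i)
        vec.push_back(std::make_unique<MyObject>());  // the vector's storage moves,
                                                      // the MyObjects do not
    return ptr == vec[0].get() && ptr->value == 7;
}
```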

Using delete [] on an object in an array

How would I use delete on a specific class object in an array to ONLY delete that object and not the whole array?
This object will always be at the end of the array, no matter what, so moving the other objects in the array doesn't matter either.
EDIT:
To make it clearer, I just want to free up the last element of the array for later use.
You should use std::vector instead of a regular array if you need to do this. With std::vector, removing and destroying the final element is done via the pop_back method.
If you absolutely must use a regular array, and must destroy one object inside the array right now, you can explicitly call that object's destructor and then construct a new one in its place with placement new. If you aren't sure what this means or how to do it, you probably should be using std::vector instead.
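For the rare case where a plain array really is required, here is a sketch of destroying and reconstructing one element in place, assuming a hypothetical `Item` type (its `alive` counter exists only for illustration). The slot must hold a live object again before `delete[]` runs, because `delete[]` destroys every element.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical element type with a live-object counter for the demo.
struct Item {
    static int alive;
    std::string name;
    Item() { ++alive; }
    explicit Item(std::string n) : name(std::move(n)) { ++alive; }
    ~Item() { --alive; }
};
int Item::alive = 0;

// Destroys the last element of an array allocated with new[] and
// constructs a fresh one in the same storage.
// Caveat: if the constructor throws, the slot is left destroyed and a later
// delete[] would destroy it twice, so this is only safe when construction
// cannot fail.
void replaceLast(Item* array, std::size_t n, const std::string& newName)
{
    array[n - 1].~Item();               // explicit destructor call
    new (&array[n - 1]) Item(newName);  // placement new into the same slot
}
```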
If you want to copy the last element of the array to somewhere else, simply do that and then ignore the last element. You cannot remove it. Usually you have a counter variable somewhere that holds the amount of elements currently in the array. So when you "remove" the last element, simply decrement that counter by 1.
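FYI, std::vector works much the same way: pop_back destroys the last element and decrements the size, but keeps the storage around for later use. (And you should be using std::vector to begin with anyway.)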
An array allocated with new[] can only be released with delete[]. Any other attempt to release part or all of the array (free, delete, etc.) is undefined behaviour.
If you want to be able to free a specific element of the array, you are better off using an STL container such as std::vector, std::list, std::set, std::map, etc. Each has different properties and is appropriate for a different task, but all support fast deletion of elements (except vector, which only supports "fast" deletion of the last element, but since that is exactly what you want, it's a good option), and in fact they "hide" the allocation of elements from you.
If the array is like
object* array = new object[n];
Then you cannot delete an element of it. You can move elements around (with std::move, or with memcpy()/memmove() for trivially copyable types), but you cannot change the size of the array.
Calling delete on an individual array element is undefined behavior and will corrupt the heap in most cases. As suggested, you should use std::vector.
If the array is an array of pointers to objects, then you can delete a pointed object but you cannot delete the pointer itself (see first part of this answer).
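EDIT: a comment from the author of the question specifies that he wants to remove the last element of a pointer array.
You can delete the object the last pointer points to (if it was allocated with new), then set the pointer to nullptr and check for it later:
delete pointers[size - 1];    // free the pointed-to object first
pointers[size - 1] = nullptr;
// later, when using the array
if (pointers[i]) { /* do something */ }
Using std::vector would be easier. Also, if your deletions in the middle of the sequence are frequent enough, you should consider using std::list.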

Does std::vector change its address? How to avoid

Since vector elements are stored contiguously, I guess it may not have the same address after some push_back's , because the initial allocated space could not suffice.
I'm working on a code where I need a reference to an element in a vector, like:
#include <iostream>
#include <vector>
using namespace std;

int main(){
    vector<int> v;
    v.push_back(1);
    int *ptr = &v[0];
    for(int i=2; i<100; i++)
        v.push_back(i);
    cout << *ptr << endl; //?
    return 0;
}
But it's not necessarily true that ptr still points to v[0], right? What would be a good way to guarantee it?
My first idea would be to use a vector of pointers and dynamic allocation. I'm wondering if there's an easier way to do that?
PS.: Actually I'm using a vector of a class instead of int, but I think the issues are the same.
Don't use reserve to postpone this dangling-pointer bug. As someone who had this same problem, shrugged, reserved 1000, and then a few months later spent ages trying to figure out some weird memory bug (the vector's size had exceeded 1000), I can tell you this is not a robust solution.
You want to avoid taking the address of elements in a vector if at all possible precisely because of the unpredictable nature of reallocations. If you have to, use iterators instead of raw addresses, since checked STL implementations will tell you when they have become invalid, instead of randomly crashing.
The best solution is to change your container:
You could use std::list - it does not invalidate existing iterators when adding elements, and only the iterator to an erased element is invalidated when erasing
If you're using C++0x (now C++11), std::vector<std::unique_ptr<T>> is an interesting solution
Alternatively, using pointers and new/delete isn't too bad; just don't forget to delete pointers before erasing them. It's not hard to get right this way, but you have to be pretty careful not to cause a memory leak by forgetting a delete. (Mark Ransom also points out that this is not exception safe: the entire vector contents are leaked if an exception causes the vector to be destroyed.)
Note that boost's ptr_vector cannot be used safely with some of the STL algorithms, which may be a problem for you.
You can increase the capacity of the underlying array used by the vector by calling its reserve member function:
v.reserve(100);
So long as you do not put more than 100 elements into the vector, ptr will point to the first element.
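A sketch of that guarantee, using the question's `int` example: as long as the number of elements never exceeds the reserved capacity, `&v[0]` does not change. One more `push_back` beyond the capacity may reallocate, which is exactly the trap the previous answer warns about.

```cpp
#include <cassert>
#include <vector>

// Returns true if &v[0] is unchanged after filling the vector up to,
// but not beyond, the capacity requested with reserve().
bool reserveKeepsFirstAddress()
{
    std::vector<int> v;
    v.reserve(100);           // capacity() is now at least 100
    v.push_back(1);
    int* ptr = &v[0];
    for (int i = 2; i <= 100; i++)
        v.push_back(i);       // size() never exceeds 100, so no reallocation
    return ptr == &v[0] && *ptr == 1;
}
```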
What would be a good way to guarantee it?
std::vector<T> is guaranteed to be contiguous, but the implementation is free to reallocate or free storage on operations that alter the vector's contents (in which case iterators, pointers and references to elements become invalid as well).
You can achieve your desired result, however, by calling reserve. The standard guarantees that no reallocation takes place during insertions until the size of the vector would exceed its reserved capacity.
Generally, I'd be careful with it (you can quickly get trapped…). Better not to rely on std::vector<>::reserve and iterator persistence unless you really have to.
If you don't need your values stored contiguously, you can use std::deque instead of std::vector. Adding elements at either end never moves the existing elements (pointers and references to them stay valid), because a deque holds its elements in several chunks of memory.
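A sketch showing that behaviour, using a `std::deque<int>` for brevity: after many `push_back` calls, a pointer to the first element is still valid. (Iterators, unlike pointers and references, are invalidated by these insertions.)

```cpp
#include <cassert>
#include <deque>

// Returns true if a pointer to the first element is still valid (and still
// points at that element) after many push_back calls.
bool dequePointerSurvivesGrowth()
{
    std::deque<int> d;
    d.push_back(1);
    int* ptr = &d[0];
    for (int i = 2; i < 100000; ++i)
        d.push_back(i);   // appending at the end: existing elements stay put
    return ptr == &d[0] && *ptr == 1;
}
```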
Another possibility would be a purpose-built smart pointer that, instead of storing an address, would store the address of the vector itself along with the index of the item you care about. It would then put those together and get the address of the element only when you dereference it, something like this:
#include <cstddef>
#include <vector>

template <class T>
class vec_ptr {
    std::vector<T> &v;
    std::size_t index;
public:
    vec_ptr(std::vector<T> &v, std::size_t index) : v(v), index(index) {}
    T &operator*() { return v[index]; }
};
Then your int *ptr=&v[0]; would be replaced with something like: vec_ptr<int> ptr(v,0);
A couple of points: first of all, if you rearrange the items in your vector between the time you create the "pointer" and the time you dereference it, it will no longer refer to the original element, but to whatever element happens to be at the specified position. Second, this does no range checking, so (for example) attempting to use the 100th item in a vector that only contains 50 items will give undefined behavior.
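A usage sketch mirroring the question's `main()`, with the `vec_ptr` template repeated so the block compiles on its own: dereferencing the "pointer" after the vector has grown still yields the first element, because the address is recomputed from the index on every dereference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The vec_ptr from the answer, repeated so this block is self-contained.
template <class T>
class vec_ptr {
    std::vector<T>& v;
    std::size_t index;
public:
    vec_ptr(std::vector<T>& v, std::size_t index) : v(v), index(index) {}
    T& operator*() { return v[index]; }
};

// Mirrors the question's main(): grows the vector after taking the "pointer".
int valueAfterGrowth()
{
    std::vector<int> v;
    v.push_back(1);
    vec_ptr<int> ptr(v, 0);   // instead of int *ptr = &v[0];
    for (int i = 2; i < 100; i++)
        v.push_back(i);       // may reallocate; ptr recomputes &v[0] on use
    return *ptr;
}
```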
As James McNellis and Alexander Gessler stated, reserve is a good way of pre-allocating memory. However, for completeness' sake, I'd like to add that for the pointers to remain valid, all insertion/removal operations must occur from the tail of the vector, otherwise item shifting will again invalidate your pointers.
Depending on your requirements and use case, you might want to take a look at Boost's Pointer Container Library.
In your case you could use boost::ptr_vector<yourClass>.
I came across this problem too and spent a whole day just to realize that the vector's storage had moved and the saved addresses had become invalid. My solution was to:
save the raw data in the vector and keep relative indices
after the vector stopped growing, convert the indices to pointers
I found the following work:
pointers[i] = &vector[0] + indices[i];            // (1)
pointers[i] = &vector[(size_t)indices[i]];        // (2)
Note that adding an index to the integer address (size_t)&vector[0] is only correct when each element is one byte, and multiplying the index by sizeof(vector) doesn't fix it either, since that is the size of the vector object, not of an element. Pointer arithmetic on &vector[0] (or vector.data(), or &vector.front()) scales by the element size automatically. I think the reference way (2) is the safest.
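A sketch of the convert-after-growth idea, assuming a hypothetical element type `Data`: indices are recorded while the vector grows, and converted to pointers with `&storage[idx]` only once growth has stopped. Any later reallocation would invalidate these pointers again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Data { int value; };  // hypothetical element type

// Converts stored indices into element pointers. Only call this after the
// vector has stopped growing; any later reallocation invalidates the result.
std::vector<Data*> indicesToPointers(std::vector<Data>& storage,
                                     const std::vector<std::size_t>& indices)
{
    std::vector<Data*> pointers;
    pointers.reserve(indices.size());
    for (std::size_t idx : indices)
        pointers.push_back(&storage[idx]);   // same address as storage.data() + idx
    return pointers;
}
```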