C++ - Deleting a vector element that is referenced by a pointer - c++

Well, I don't know if it is possible, but the thing would be:
struct stPiece
{
/* some stuff */
stPiece *mother; // pointer to the piece that created this one
};
vector<stPiece> pieces;
Is it possible to erase the piece referenced by 'mother' from pieces, having just that pointer as a reference? How?
Would it mess with the other references? (i.e. if it is not the last element in the vector, by shifting the next elements to other memory positions, while the other '*mothers' remain constant). Of course, I am assuming that all the child pieces will be deleted (so I won't need to update any pointer that goes to the same mother).
Thanks!

If your mother pointers point directly to elements of the pieces vector you will get in all kinds of trouble.
Deleting an element from pieces will shift all the elements at higher indexes. Even inserting elements can make all the pointers invalid, since the vector might need to reallocate its internal array, which moves all the elements to new positions in memory.
To answer your main question: you can't erase the element directly from the pointer you have. You would first need to search through the vector to find it, or calculate its index in the vector.
Storing the indexes of the elements as mother, instead of pointers into pieces, would make it a bit more robust, so that at least appending new elements could not break the existing mothers. But deleting from pieces would still shift elements to new indexes.
Using a std::list for pieces and storing iterators into that as mother might be a solution. Iterators of std::list are not invalidated if other elements of that list are removed or added. If different elements can have the same mother you still have the problem of finding out when to remove the mother elements; then maybe using boost::shared_ptr would be simpler.
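As a rough sketch of the std::list idea: because a list never relocates its elements, plain pointers to elements stay valid just like iterators do, until the element they refer to is erased. The variant below keeps mother as a raw pointer (rather than a list iterator, which is a simplification of what is suggested above) and looks the element up by address when erasing. Piece, its id field, and erase_piece are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Hypothetical stand-in for stPiece; `id` only makes the demo observable.
struct Piece {
    int id;
    Piece* mother;  // nullptr when the piece has no mother
};

// Erases the element that `target` points to. Returns false if `target`
// does not point at an element of `pieces`. The search is linear.
bool erase_piece(std::list<Piece>& pieces, const Piece* target) {
    auto it = std::find_if(pieces.begin(), pieces.end(),
                           [target](const Piece& p) { return &p == target; });
    if (it == pieces.end()) return false;
    pieces.erase(it);  // all other elements keep their addresses
    return true;
}
```

Any child whose mother was erased still holds a dangling pointer afterwards, so this only helps if the children are removed first or their pointers are reset.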

It is not exactly clear how the entire data structure is organized and what the consequences are going to be, but it is perfectly possible to erase an element from the vector by having a pointer to that element and the vector itself. You just need to convert the pointer to an iterator first. For example, having a vector
vector<stPiece> pieces;
and a pointer into that vector
stPiece *mother;
you can convert the pointer to an index
vector<stPiece>::size_type i = mother - &pieces[0];
assert(i < pieces.size());
then convert the index to an iterator
vector<stPiece>::iterator it = pieces.begin() + i;
then erase the element
pieces.erase(it);
and that's it.
However, it appears that in your data structure you might have multiple long-lived pointers pointing into the same vector. Any attempt to erase elements from such a vector will immediately invalidate every pointer to the erased element or to any element after it. It theoretically is possible to "restore" their validity, if you do everything carefully, but this is going to be a major PITA.
I'm not sure I understand what you mean by "assuming that all the child pieces will be deleted".
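The pointer-to-index-to-iterator steps from this answer can be collected into one small helper. This is only a sketch: the function name erase_by_pointer and the id member are made up for the example, and the precondition (the pointer really points into the vector) is the caller's responsibility.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct stPiece {
    int id;           // hypothetical payload, only for the demo
    stPiece* mother;  // pointer to the piece that created this one
};

// Erases the element `p` points to.
// Precondition: `p` points at an element of `pieces`.
// Pointers and iterators at or after the erased position become invalid.
void erase_by_pointer(std::vector<stPiece>& pieces, const stPiece* p) {
    assert(!pieces.empty());
    const std::ptrdiff_t i = p - pieces.data();  // pointer -> index
    assert(i >= 0 && static_cast<std::size_t>(i) < pieces.size());
    pieces.erase(pieces.begin() + i);            // index -> iterator -> erase
}
```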

Yes, you can erase the piece referenced by mother.
If you delete the piece referenced by 'mother', the mother pointer in all its children will become dangling, you'll have to take care of this.
About the shifting of elements in the vector, you need not do it yourself; it's taken care of by the vector class.

Short answer: no.
The pieces are stored in the vector by value. A vector iterator is therefore a pointer to a piece. This means, then, that a pointer to the mother piece is the same as the vector's iterator at the mother. Vector iterators are invalidated on insertion (all iterators) and erasure (all iterators past the erased iterator), which means memory locations will change and it will be nearly impossible to keep all the pointers updated.
You could store dynamically allocated pieces in the vector, i.e.:
vector<stPiece*> pieces;
The mother pointers won't change as pieces are added/removed to/from the vector. The downsides are:
you now have to manage memory (new/delete each piece)
it uses more memory per piece (the pointers in pieces)
it may be slower because you lose spatial locality (cache efficiency) because it is no longer a contiguous array of stPiece objects
The latter two points may or may not be important in your application.
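A possible variation on this idea, assuming C++14 is available, is to let std::unique_ptr own each piece so that the new/delete bookkeeping from the first downside disappears. That is a swapped-in technique, not what the answer above literally proposes. PieceStore, add_piece, remove_piece, and the id member are hypothetical names for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct stPiece {
    int id;           // hypothetical payload, only for the demo
    stPiece* mother;  // non-owning pointer to the creating piece
};

using PieceStore = std::vector<std::unique_ptr<stPiece>>;

// Allocates a piece on the heap; its address never changes while it lives.
stPiece* add_piece(PieceStore& pieces, int id, stPiece* mother) {
    pieces.push_back(std::make_unique<stPiece>(stPiece{id, mother}));
    return pieces.back().get();
}

// Destroys the piece `target` points to and removes its slot from the vector.
// Returns false if `target` is not owned by `pieces`.
bool remove_piece(PieceStore& pieces, const stPiece* target) {
    auto it = std::find_if(pieces.begin(), pieces.end(),
        [target](const std::unique_ptr<stPiece>& p) { return p.get() == target; });
    if (it == pieces.end()) return false;
    pieces.erase(it);  // the unique_ptr destructor deletes the piece
    return true;
}
```

Growing or shrinking the vector only moves the small owning pointers, so mother pointers to surviving pieces stay valid. Children of a removed mother still need to be removed or updated.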

What you have coded is a singly-linked tree. You probably don't want an object to contain all your stPieces, because that would get in the way of implementing creation and deletion semantics.
I'm guessing that you want to delete mother after all the children are gone.
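struct stPiece; // forward declaration so the set can hold stPiece pointers
set< stPiece * > all_pieces;
struct stPiece {
boost::shared_ptr< stPiece > const mother;
stPiece( boost::shared_ptr< stPiece > const &in_mother ) // const reference also accepts temporaries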
: mother( in_mother ) {
all_pieces.insert( this );
}
~stPiece() {
all_pieces.erase( this );
}
};
The key point is that there's a difference between containing some objects and merely being able to iterate over them. If using the most obvious way to create and delete the objects isn't using the container, they probably shouldn't be in it.

Related

What's wrong with this method of deletion in a vector?

I have a deletion method that works and is as follows:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
for(int i = 0; i < userList.size(); i++) {
if (userList.at(i)->getID() == id) {
userList.erase(userList.begin() + i);
}
}
}
However, I was trying the following before the above and couldn't understand why it wasn't working.
Instead of using userList.erase(userList.begin() + i);, I was using delete userList.at(i)
I'm somewhat new to C++, and have been instructed to delete heap allocated memory with the "delete" keyword. I felt that should have removed it from the Vector, but was wrong.
Why doesn't the delete userList.at(i) work? I'm curious. Any info would be helpful.
There are two separate concepts at play here. First, there's the maintenance of the std::vector that you're using. The vector's job is to hold a sequence of elements, and in many ways it doesn't really care what those elements actually are. From the vector's perspective, its elements will stick around until something explicitly comes along and says to get rid of them. The call to erase tells the vector "Hey, you know that element you've got at that one position? Please get rid of it." So when you make the call to erase, you're telling the vector to get rid of one of its elements.
Independently, there's the objects that are being stored in the vector. You're storing Person *s, which are pointers to Person objects. Those objects (I'm assuming) were allocated with new, so each Person essentially thinks "I'm going to live forever, or at least until someone comes around and calls delete on me." If you delete one of the Person objects, that object ceases to exist. However, the Person objects have absolutely no idea that there's a vector somewhere with pointers to people.
In order to get everything to work the way you want it to, you actually need to use a combination of both erase and delete (with a caveat that I'll mention later). If you just erase the pointers from the vector, then from the vector's perspective everything is cleaned up (it no longer holds pointers to the Person object in question), but from the Person's perspective the Person object is still very much alive and well because you never said to delete it. If you just delete the pointers, then from the Person's perspective everything is cleaned up (you've told the Person that it's time to go to the giant playground in the sky), but from the vector's perspective nothing was added or removed, so you now have a dangling pointer in your vector. In other words, the first option results in a memory leak - there's a Person object that was never told to clean itself up - and the second option results in a dangling pointer - there's a pointer to what used to be a person, but which is now a bunch of bits that can be recycled however the program wishes.
Using the setup you have right now, the "best" way to handle this would be to use a combined approach. When you find an item to remove, first delete the pointer, then call erase. That ensures that the Person gets cleaned up and that the vector no longer has a dangling pointer in it.
But as some of the commenters have noted, there's a much better way to do this. Rather than storing Person *s and using raw pointers to reference the Person objects, use the std::shared_ptr type and manage your Person objects through std::shared_ptr<Person>. Unlike a regular pointer, which just says "yeah, there's a thing over there" and won't do any memory management on its own, the std::shared_ptr type actually owns the resource that it points at. If you erase a std::shared_ptr from a vector, the std::shared_ptr then says "okay, I just got kicked out of the vector, and if I'm the last pointer to the Person, I'll go and delete it for you." That means that you don't need to do any of your own memory management to clean things up.
In summary:
Just calling erase gets rid of an element from the vector, but leaves a Person adrift in the heap, wondering why no one loves it anymore.
Just calling delete sets the Person object free, but leaves a ghostly pointer to it in the vector that's a major hazard.
Calling both delete and erase in the proper order will solve this problem, but isn't the ideal solution.
Using std::shared_ptr instead of raw pointers is probably the best option, since it ensures that all the right deletes happen automatically.
Hope this helps!
And a quick addendum - are you sure that your code correctly visits all the elements of the vector? For example, if you erase the item at index 0, all the other elements of the vector will shift back one position. But then your implementation increments i to 1, at which point you've skipped over the item that just got shifted back to the first position.
I'll let you think about how to resolve this. Another answer has offered a good suggestion of using remove_if, which is one good solution, though if for your own edification you want to roll your own version, you might want to think over how you'd address the above issue.
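For readers who want to compare against their own attempt, here is one possible hand-rolled version that both deletes and erases, and that does not skip the element following an erased one. It is a sketch under assumptions: the Person class below is a minimal hypothetical stand-in with only getID(), and every pointer in the vector is assumed to come from new.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal stand-in for the asker's Person class.
class Person {
public:
    explicit Person(int id) : id_(id) {}
    int getID() const { return id_; }
private:
    int id_;
};

void deleteUserByID(int id, std::vector<Person*>& userList) {
    for (auto it = userList.begin(); it != userList.end(); ) {
        if ((*it)->getID() == id) {
            delete *it;               // free the Person first
            it = userList.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;                     // advance only when nothing was erased
        }
    }
}
```

The key detail is that the iterator is advanced only when no element was erased; after an erase, the returned iterator already refers to the element that shifted into the vacated position.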
This is one of those places a picture is almost certainly worth at least a thousand words. The vector is storing pointers, which point to (presumably) dynamically allocated objects, something like this:
So, the green boxes represent the elements in the vector itself. The blue boxes represent your data objects. I've separated the third one to signify the fact that it's the one we're going to (eventually) remove.
As it stands right now, your code is deleting some of the green boxes. It leaves the blue box (your data) in memory, but you no longer have a pointer to it:
At this point, you're right that the data no longer appears in the vector, so your routine has "worked" to that extent. The problem is that you no longer have access to that data, so you've leaked its memory.
What's (apparently) being suggested is that when you find the object you want to remove from the list, you should first use delete to destroy the data (the blue box):
...then use erase to remove that element from the vector:
Alternatives
I would not use a std::shared_ptr for a case like this. A shared_ptr is intended to manage objects that have shared ownership, and nothing you've said indicates that you're dealing with shared ownership. If you must use dynamically allocated objects, and don't want to manage things manually (which I agree is a good thing to avoid), you might consider using std::unique_ptr, or you might want to consider using a Boost ptr_vector instead.
Alternatively, consider changing it to a std::vector<Person> (i.e., store the objects directly in the vector instead of storing pointers to dynamically allocated objects). At least in my experience, this is really the right answer the vast majority of the time. If you really need to ensure against moving the Person objects around when the vector resizes, consider using an std::deque<Person> instead. A std::deque<Person> is fairly close to what you've created, but with at least some potential for the compiler to optimize allocation by putting a number of data objects (Persons, in your case) into a single block of memory, instead of allocating each one individually.
Conclusion
Until or unless evidence to the contrary is found, the right answer is most likely std::vector<Person> with std::deque<Person> in second place. Direct dynamic allocation of the Person objects, with something to automate their deletion runs a distant third place (at best).
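To illustrate the std::vector<Person> option: when the objects are stored by value there is nothing to delete, and removal reduces to the usual erase/remove_if combination. The Person class here is a hypothetical minimal stand-in with only getID().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal stand-in for the asker's Person class.
class Person {
public:
    explicit Person(int id) : id_(id) {}
    int getID() const { return id_; }
private:
    int id_;
};

void deleteUserByID(int id, std::vector<Person>& userList) {
    // remove_if moves the kept elements to the front and returns the new
    // logical end; erase then destroys the leftover tail.
    userList.erase(
        std::remove_if(userList.begin(), userList.end(),
                       [id](const Person& p) { return p.getID() == id; }),
        userList.end());
}
```

The relative order of the remaining elements is preserved, and the vector's own destructor cleans up whatever is left.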
The other answers given summarize what you really should do in terms of design, and that is to use smart pointers.
However, if you really did use raw pointers, and allocated those entries with new, the way you can delete and erase without writing any loops is to
Partition the elements to delete
delete the elements
Erase the partitioned elements from the vector using vector<T>::erase.
Here is an example:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
// partition the about-to-be deleted elements to the right of the partition
// and all good items to the left of the partition
auto iter = std::partition(userList.begin(), userList.end(), [&](Person *p)
{ return p->getID() != id; });
// issue a delete on those elements on right of partition
std::for_each(iter, userList.end(), [](Person *p) { delete p; });
// now erase those elements from the vector.
userList.erase(iter, userList.end());
}
The std::partition simply places all elements you wish to delete on the right of the partition (which is returned by iter). Then it's just a matter of calling delete on those elements on the right of the partition, and finally erase those elements.
The reason why this 3-step process was done instead of directly using the std::remove_if is that std::remove_if gives you undetermined elements in the range denoting the items that were "removed", thus issuing subsequent delete calls on those elements would have resulted in undefined behavior.
For example, this code, even though it looks like it would work, actually results in undefined behavior:
void deleteUserByID(int id, std::vector<Person*>& userList)
{
// move items to be removed to the end of the vector
auto iter = std::remove_if(userList.begin(), userList.end(), [&](Person *p)
{ return p->getID() == id; });
// issue a delete on those elements (this actually invokes undefined behavior)
std::for_each(iter, userList.end(), [](Person *p) { delete p; });
// now erase those elements from the vector (if your program even gets this far)
userList.erase(iter, userList.end());
}
Basically, you can't do anything "special" to the items in the removed range (for example, call delete), as those items are indeterminate garbage. The only thing you can safely do is to erase them.
So the trick is to partition the elements (which doesn't invalidate those items), delete the partitioned elements, and then remove them using erase.
*Note that if you want to keep the order of the elements that will not be deleted, then use std::stable_partition instead of std::partition.
The proper way to do it is to use smart pointers and an algorithm from the standard library.
void deleteUserByID(int id, std::vector<std::unique_ptr<Person>>& userList)
{
auto endIt = std::remove_if(userList.begin(), userList.end(),
[id](const auto &person) {
return person->getID() == id;
});
userList.erase(endIt, userList.end());
}
These are two different and complementary things. For your vector
userList.erase(userList.begin() + i);
will remove the ith pointer from your vector, but will not affect the pointed at Person object in any way
delete userList.at(i);
will delete (free) the Person object pointed at by the ith pointer in your vector, but will not affect the vector in any way.
Depending on where these Person objects are coming from and what you are trying to do, you might need to do both.

Shared pointer with observer pointers

I have a class Foo, which has a vector of some large classes. The idea is, that an octal tree will be built recursively out of the elements of the vector, and each OctreeNode will have a pointer to few elements of the vector found in Foo. (In the example, just for simplicity, a node will point to only one element of the vector)
class Foo
{
vector<LargeClass> mLargeClasses;
void removeItem(const int index); //remove an element from the vector at the index
};
class OctreeNode
{
LargeClass* mLargeClass;
};
One can say, "why bother keeping the vector after the tree is built, and store the objects in the tree itself". True, let's just say, I need to keep vector parallel to the built tree as well.
While the above concept works, I have issues when elements got removed from the underlying vector. In such case, some Octree nodes end up with dangling pointers.
My solution #1:
If the removeItem function is called, then before it removes the vector element it first recursively traverses the octal tree and sets to nullptr every mLargeClass pointer that happens to point to that particular vector element. It's OK to have nullptr in the nodes, as I check against nullptr each time I access them anyway.
My solution #2:
Have the vector store shared_ptrs, and have the OctreeNode store a weak_ptr. I am not a fan of this, as each time I access a weak_ptr in the tree, it gets converted to a shared_ptr in the background with all the atomic counter increments. I am no expert on performance testing, but I have a feeling that it is slower than a simple pointer access with an if condition.
Does anybody know any better solutions?
I think the most elegant would be:
To have a smart pointer that behaves like a shared_ptr, counts how many other pointers refer to it, keeps a record of them, and, when it gets destroyed, automatically nulls out all the other "observer" pointers that refer to it?
While the field and the purpose are somewhat different, I think I will give the handle system described in this post a try:
Simple, efficient weak pointer that is set to NULL when target memory is deallocated
If I fail, I will revert to the shared_ptr/weak_ptr duo described in the same post.
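For reference, the shared_ptr/weak_ptr fallback (solution #2) might look roughly like the sketch below. The value member, the add function, and valueOr are hypothetical additions for the demo, and removeItem takes a std::size_t here instead of the const int from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct LargeClass { int value; };  // hypothetical payload

class Foo {
public:
    // Stores a new element and hands back a non-owning observer for it.
    std::weak_ptr<LargeClass> add(int value) {
        mLargeClasses.push_back(std::make_shared<LargeClass>(LargeClass{value}));
        return mLargeClasses.back();
    }
    // Removes the element at `index`; its observers expire automatically.
    void removeItem(std::size_t index) {
        mLargeClasses.erase(mLargeClasses.begin() + static_cast<std::ptrdiff_t>(index));
    }
private:
    std::vector<std::shared_ptr<LargeClass>> mLargeClasses;
};

struct OctreeNode {
    std::weak_ptr<LargeClass> mLargeClass;
    // Returns the element's value if it is still alive, otherwise `fallback`.
    int valueOr(int fallback) const {
        if (auto sp = mLargeClass.lock()) return sp->value;
        return fallback;
    }
};
```

Because the vector only holds the owning pointers, removing one element does not move the other LargeClass objects, so the remaining observers stay usable. The cost is the lock() call on every access, which is the overhead the question is worried about.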

How can references be valid while iterators become invalidated in a deque

I am having some difficulty grasping this concept. From this thread here it states
A deque requires that any insertion to the front or back shall keep
any reference to a member element valid. It's OK for iterators to be
invalidated, but the members themselves must stay in the same place in
memory.
I was under the impression from this thread which states
A pointer is actually a type of iterator. In fact, for some container types, the corresponding iterator can be
implemented simply as a pointer.
If we have a pointer and an iterator that each reference the same
element of a container, then any operation that invalidates one will
invalidate the other.
so if an iterator becomes invalidated then references also become invalidated.
My question is how is that possible. If the iterator which points to a certain memory address becomes invalidated how can a reference to that address be valid ?
Update:
I understand that a deque is implemented by random chunks of memory and that these chunks of memory are tracked by an independent data structure such as a dynamic array. However, I am having difficulty understanding how an iterator could be invalid but a reference could be valid, since essentially an iterator is a generalized pointer for the contents of the data structure. This makes me think that an iterator might be pointing to something else while a pointer points to the actual item? Consider the following diagram of a vector.
From what I understand from the diagram above, for a vector, if the content of a pointer changes the iterator also changes. How is that different for a deque?
Think of a deque in terms of the following:
template<typename T>
struct deque_stub {
using Page = std::array<T, 32>; // Note: Not really, rather uninitialised memory of some size;
std::vector<std::unique_ptr<Page>> pointers_to_pages;
std::size_t end_insert{32};
std::size_t start_elem{0};
// read further
};
A deque is basically some container, storing pointers to pages which contain some elements. (The start_elem and end_insert members are to keep track of where, in terms of offset into a page, the valid range of elements starts and ends.)
Insertion eventually changes this container, when a new page is needed:
template<typename X>
void push_back(X&& element) {
if (end_insert == 32) {
// get a new page at the end
pointers_to_pages.push_back(std::make_unique<Page>());
end_insert = 0;
}
(*(pointers_to_pages.back()))[end_insert] = std::forward<X>(element);
++end_insert;
}
template<typename X>
void push_front(X&& element) {
if (start_elem == 0) {
pointers_to_pages.insert(
pointers_to_pages.begin(), std::make_unique<Page>());
start_elem = 32;
}
--start_elem;
(*(pointers_to_pages.front()))[start_elem] = std::forward<X>(element);
}
An iterator into that deque needs to be able to "jump" across pages. The easiest way to achieve this is by having it keep an iterator to the current page it is in from the container pointers_to_pages:
struct iterator {
std::size_t pos;
std::vector<std::unique_ptr<Page>>::iterator page;
// other members to detect page boundaries etc.
};
But since that page iterator, the iterator into the vector, may get invalidated when the vector gets changed (which happens when a new page is needed), the whole iterator into the deque might get invalidated upon insertion of elements. (This could be "fixed" by not using a vector as container for the pointers, though this would probably have other negative side effects.)
As an example, consider a deque with a single, but full page. The vector holding the pointers to pages thus holds only a single element, let's say at address 0x10, and let's further assume that its current capacity is also only 1 element. The page itself is stored at some address, let's say 0x100.
Thus the first element of the deque is actually stored at 0x100, but using the iterator into the deque means first looking at 0x10 for the address of the page.
Now if we add another element at the end, we need a new page to store that. So we allocate one, and store the pointer to that new page into the vector. Since its capacity is less than the new size (1 < 2), it needs to allocate a new larger area of memory and move its current contents there. Let's say, that new area is at 0x20. The memory where the pointers have been stored previously (0x10) is freed.
Now the very same element from above before the insertion is still at the same address (0x100), but an iterator to it would go via 0x20. The iterator from above, accessing 0x10, is thus invalid.
Since the element is at the same address, pointers and references to it remain valid, though.
Because the answer you cite is wrong, and because iterators are a lot more than just pointers. For a start, a linked list iterator needs a pointer to the element but also "next" and "previous" pointers. Right there, with that simple example, your notion that "an iterator is a generalized pointer for the contents of the data structure" is completely blown out of the water.
A deque is more complicated than a totally contiguous structure (e.g. vector) and more complicated than a totally non-contiguous structure (i.e. list). When a deque grows, its overall structure moulds to fit, with a minimum of reallocations of the actual elements (often, none).
The result is that even when certain elements don't move, the "control pieces" that allow access to them may need to be updated with fresh metadata about, for example, where neighbouring elements (which maybe did move) now are.
Now, a deque cannot magically update iterators that have already been instantiated somewhere: all it can do is document that your old iterators are invalid and that you shall obtain new ones in the usual way.
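The guarantee can be observed with a real std::deque. The following is a small sketch (the function name is made up for the example) that records the address of one element, grows the deque at both ends, and then checks that the same element is still at that address. Any iterator obtained before the insertions would have to be discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Returns true if the element that was in the deque before the insertions
// is still found at the same address afterwards.
bool front_address_survives_growth(int extra) {
    std::deque<int> d{42};
    const int* p = &d.front();  // pointer to the only element
    for (int i = 0; i < extra; ++i) {
        d.push_back(i);    // may invalidate iterators, never references
        d.push_front(-i);  // same guarantee at the front
    }
    // After `extra` push_front calls the original element sits at index `extra`.
    return p == &d[static_cast<std::size_t>(extra)] && *p == 42;
}
```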

Is there a better way to remove the raw pointer reference to elements stored in vector

class LargeClass
{};
void FunctionA(const LargeClass&) {}
std::vector<LargeClass> vecLargeClass; // populate vecLargeClass
const LargeClass* prev = &vecLargeClass[0]; // take the element's address; requires a non-empty vector
for( ... )
{
...
if(...)
prev = &vecLargeClass[i];
}
I need to keep a reference to an element stored inside a vector.
In order to avoid copy, I currently use a raw pointer. Or I can store an index pointing to the element.
Is there a better solution for this?
Yes, you can keep a "reference" to an element in a vector so long as that vector's iterators aren't invalidated. That is a big caveat.
A vector's iterators become invalidated when the vector is reallocated, which can happen any time you add elements to the vector. Additionally when you erase an item from a vector, all the iterators at and beyond the point of removal are invalidated.
This is all very complicated, and better not worried about. If you need iterators to never become invalidated (so long as you don't remove that item itself), a vector might not be the best collection for your use. Instead, you might consider a list, a map, or other collections. Note that each has its own set of tradeoffs.
You might not need to care about the iterators at all, however. If your vector stored not items themselves, but pointers to the items, then even if the vector is reallocated the things the pointers point to will not move. Going this route, of course you should use a smart pointer if possible. On the face of it, the best one would appear to be shared_ptr. So your declaration becomes:
std::vector<std::shared_ptr<LargeClass>>
Finally, if you really need to use a vector and don't want to mess with smart pointers, you might do well to not keep track of "references" to the items in the vector, but their index positions. Suppose you want to keep track of the item at vecLargeClass[3]. Even if the vector reallocates and invalidates iterators, the item in question will still be at index 3, as long as nothing is inserted or erased before it. Instead of keeping track of iterators or pointers to things, keep track of where they are in the vector.
Be careful when storing a pointer or a reference to a vector element. There are certain operations that can invalidate those references, such as push_back, resize, etc. If the index is what you're sure will not change, then it would be the safest. Smart pointers, as marcin_j mentioned in the comment, will not help with the invalidation in case of push_back, resize, etc.
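A short sketch of the index approach, with a hypothetical value member and helper name: the index is resolved against the vector's current buffer on every access, so growth that reallocates the storage does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LargeClass { int value; };  // hypothetical payload

// Appends `extra` elements (which will usually force reallocation) and then
// reads the tracked element through its index.
int read_after_growth(std::vector<LargeClass>& v, std::size_t tracked, int extra) {
    for (int i = 0; i < extra; ++i) v.push_back(LargeClass{i});
    return v[tracked].value;  // the index is looked up in the current buffer
}
```

The index is not immune to everything: erasing or inserting an element before the tracked position shifts it by one, so the stored index has to be adjusted in that case.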

Problems with C++ containers

I have a std::list in a C++ program, which contains objects of a Class A.
Let's say I have 10 objects in it. I have a reference to the 6th object stored in another data structure, say ref_6. Let's say I need to remove the 8th element from my list. To do this, I would use pop_front 8 times, store the 8 objects in a vector, and use push_front 7 times to insert the first 7 elements back into the list, so my resulting list would have 9 elements. Now when I try to access the object stored in ref_6, which was the 6th element, I can't do it. There is some garbage value in this reference.
I am assuming that when I do a pop and a push, the memory location of the same object changes. How do I deal with this?
Why would you erase things in such a manner? D: It's not a stack. The entire point (and only point*) of a list is that you can remove any element in constant time. (Though finding it is linear.)
Just do this:
typedef std::list<T> list_type;
list_type mylist; // populate it
list_type::iterator iter = mylist.begin();
std::advance(iter, 7); // move from the 1st to the 8th item
mylist.erase(iter); // erase it
And no other iterators are invalidated. (Of course, erasing an element does invalidate any references to that element itself.)
*You probably shouldn't even be using a list. Lists are nice when it comes to learning data structures, but they're pretty awful.
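To connect this back to the question's scenario, here is a sketch of a small helper (erase_nth is a made-up name) that erases the n-th element directly, so that a reference such as ref_6 to another element remains valid.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Erases the n-th element (1-based) of `l`.
// Precondition: 1 <= n <= l.size().
template <typename T>
void erase_nth(std::list<T>& l, std::size_t n) {
    assert(n >= 1 && n <= l.size());
    auto it = l.begin();
    std::advance(it, static_cast<std::ptrdiff_t>(n - 1));  // linear walk
    l.erase(it);  // only references to the erased element become invalid
}
```

For a list of ten elements, calling erase_nth(l, 8) leaves nine elements, and a reference or iterator taken earlier to the 6th element still refers to the same object.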
The list stores its elements in non-contiguous chunks of memory that get freed when the element is removed from the list. So the reference (which is implemented simply as a pointer) points to an element whose memory has already been freed.
An easier way to remove a given element from the list is to get an iterator pointing to it and use the erase method:
std::list<A>::iterator eighth_element_iterator = /* somehow get the iterator to the 8th element */;
yourList.erase(eighth_element_iterator);
The first step (getting the iterator to the 8th element) can be done, for example, by getting the iterator to the beginning of the list and advancing it 7 positions forward:
std::list<A>::iterator eighth_iter = yourList.begin();
std::advance(eighth_iter, 7); // std::advance modifies its argument in place and returns void
Something smells fishy here... You are storing objects of type T in a std::list<T> by value. You keep references to those objects in other places. Is that correct? If yes, I see several problems... Many manipulations of lists might invalidate stored references, since std::list<T> only guarantees a sequential order of values of elements of type T. If you want to store references to those elements in several places use std::tr1::shared_ptr<T> and std::list<std::tr1::shared_ptr<T> >. Then, you can safely remove or add (even reposition) elements in your list and the references kept in other places remain valid. Beware of storing std::list<T> iterators, the problem would be the same.
I am referring to your response. Sorry, I didn't get the account thing right...
Please consider the following:
std::list<A> tList;
A tA;
tList.push_back(tA);
assert(&tA == &tList.back()); // boom!
A *tAPtr = &tList.front();
tList.erase(tList.begin()); // erase takes an iterator, not a reference to the element
// try to access tAPtr:
tAPtr->Foo(); // boom! (probably)
The point is that instances of A are stored by value (= copied), so what you are doing is inherently unsafe. Use std::list<std::tr1::shared_ptr<A> > instead!