I can't seem to find much information about whether iterators keep hold of the underlying object they are iterating over.
If I create an iterator and then the object that supplied it goes out of scope, does the presence of the iterator prevent it from being destroyed?
Here is a very simple example just to illustrate the scenario:
#include <vector>
// This class takes a copy of iterators to use them later
class Data {
public:
Data(std::vector<int>::iterator start, std::vector<int>::iterator end)
: start(start),
end(end)
{}
void show() {
// Use this->start and this->end for some purpose
}
private:
std::vector<int>::iterator start;
std::vector<int>::iterator end;
};
Data test() {
std::vector<int> v{1, 2, 3};
Data d(v.begin(), v.end());
d.show(); // this would be ok
return d;
}
int main(void) {
Data d = test();
d.show(); // What happens here?
}
In this example, the Data object is storing a copy of the iterators, which is fine for the first show() call. However, by the time of the second show() call, the original object that supplied the iterators no longer exists.
Do the iterators keep the object around until such time as they are all themselves destroyed, or are the iterators invalidated as soon as the original object goes out of scope?
Here is one reference of many that doesn't say what happens one way or the other (or even whether the result is undefined).
Iterators typically don't own the data over which they iterate, no. In fact, they're rarely (if ever) even aware of the object that owns the data; vector iterators, for example, are often just pointers, which have no knowledge of any vector or of its lifetime. Even those iterators that are not implemented as pointers (which is most of them) may be considered a kind of "pointer", and treated as such: they can quite easily become dangling.
Your example has undefined behavior (UB) because show() will dereference invalid iterators the second time it is called.
If your container goes out of scope then all your iterators become invalidated. In fact, there are all manner of reasons why an iterator may become invalidated, such as adding to a vector when that operation results in a capacity expansion.
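One way to avoid the dangling iterators is to let Data own the elements and create iterators only when it needs them. This is a minimal sketch, assuming show() only needs to read the values; having show() sum them is a hypothetical stand-in for "some purpose".

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Data owns its elements instead of holding iterators into a vector
// that belongs to someone else.
class Data {
public:
    explicit Data(std::vector<int> v) : values(std::move(v)) {}

    // Iterators are created on demand from the vector this object owns,
    // so they can never outlive it.
    int show() const {
        return std::accumulate(values.begin(), values.end(), 0);
    }

private:
    std::vector<int> values;
};

Data test() {
    std::vector<int> v{1, 2, 3};
    Data d(std::move(v)); // the elements now live inside d
    d.show();             // ok
    return d;             // d carries its elements with it
}
```

With this design, `Data d = test(); d.show();` in main is well defined, at the cost of copying or moving the elements into Data.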
It's possible to find iterators that do, in a sense, "own" their data instead of iterating over a collection stored elsewhere (Boost's counting iterators, for example, simply hold the current count), but that is a special property of those particular iterator types, not an inherent property of iterators as defined by C++.
An iterator is generally only valid as long as its originating container or "sequence" has not been changed, because a change might cause memory reallocation and memory moves. Since the iterator usually references memory in the originating container, a change to that container might invalidate the iterator.
Now, a container that goes out of scope gets its destructor executed. That will obviously change the container and hence any iterator to it will be invalidated in the process.
First, an iterator has no interface for referring to the object it iterates over. It only implements pointer semantics, so you may think of it as an abstract pointer. Of course, its internal implementation may hold a pointer to that object (some debug-mode iterators do), but even then that pointer does not keep the container alive.
Second, when your container is destroyed (and it is when it goes out of scope), all objects in the container are destroyed too. Thus, the iterator becomes invalid once your container has been destroyed. After that, incrementing, decrementing or dereferencing the iterator causes undefined behavior.
Related
I know that push_back() on an std::vector can cause reallocation and therefore invalidate iterators into the vector. Is there a way of installing a hook on reallocations (which presumably happen very seldom) so that I can adjust iterators appropriately?
Ideally something like this:
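template <class T> struct hook; // forward declaration
std::vectorwithhook<T, hook<T>> v; // wished-for container, not part of the standard library
auto pointer = v.end();
template <class T> struct hook {
void operator()(T *oldData, T *newData) { pointer += newData - oldData; }
};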
and then I can push_back() on v and play with pointer with no fear.
IMHO the easiest way to do this would be to have your vectorwithhook::push_back return the new end() and use it like:
pointer = v.push_back(new_item);
NOTE: you would have to do this for all members that change the contents of the vector (e.g. emplace_back, pop_back, insert, etc.).
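The idea of having mutating members hand back a fresh end() can be sketched as a thin wrapper. The class name vector_with_end is hypothetical, and only push_back and pop_back are shown; insert, erase, emplace_back and the rest would need the same treatment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (not a standard container): every member that can
// change the contents returns a fresh end() iterator, so callers can
// replace a stale one immediately.
template <class T>
class vector_with_end {
public:
    using iterator = typename std::vector<T>::iterator;

    iterator push_back(const T& value) {
        data_.push_back(value);  // may reallocate
        return data_.end();
    }

    iterator pop_back() {
        data_.pop_back();
        return data_.end();
    }

    iterator end() { return data_.end(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};
```

Usage follows the answer's pattern: `pointer = v.push_back(item);` after every call that may reallocate.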
Alternatively, it should also be possible to create your own allocator type that takes a reference to the iterator and the container in its constructor and updates the iterator every time allocator::allocate(...) or allocator::deallocate(...) is called. Note that this goes against the principles of the STL, which was designed to keep iterators, containers and allocators separate from one another...
P.S. To be honest, none of this sounds like a good idea. I would rather rework the code to avoid keeping the end() iterator than do any of this.
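One common rework, sketched here as an assumption rather than as the answerer's own code, is to remember positions as indices instead of iterators. An index is still meaningful after push_back() reallocates, and the current end is just v.end() computed when needed. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Remember a position as an index instead of an iterator. The index is
// still meaningful after push_back() reallocates; an iterator may not be.
int lastValueAfterGrowth()
{
    std::vector<int> v{10, 20, 30};
    std::size_t last = v.size() - 1;  // position of 30
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);               // may reallocate several times
    return v[last];                   // access is recomputed from the index
}
```

This only works while elements are appended; an insert or erase before the stored position shifts what the index refers to.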
I will ask the question first and the motivation next, and finally an illustrative code sample which compiles and executes as expected.
Question
If I can assure myself that an iterator will not get invalidated for as long as I need to use it, is it safe to hold a pointer to an iterator (e.g. a pointer to a list<int>::iterator)?
Motivation
I have multiple containers and I need direct cross references from items held in one container to the corresponding items held in another container and so on. An item in one container might not always have a corresponding item in another container.
My idea thus is to store a pointer to an iterator to an element in container #2 in the element stored in container #1 and so forth. Why? Because once I have an iterator, I can not only access the element in container #2, but if needed, I can also erase the element in container #2 etc.
If there is a corresponding element in container #2, I will store a pointer to the iterator in the element in container #1. Else, this pointer will be set to NULL. Now I can quickly check that if the pointer to the iterator is NULL, there is no corresponding element in container #2, if non-NULL, I can go ahead and access it.
So, is it safe to store pointers to iterators in this fashion?
Code sample
#include <iostream>
#include <list>
using namespace std;
typedef list<int> MyContainer;
typedef MyContainer::iterator MyIterator;
typedef MyIterator * PMyIterator;
void useIter(PMyIterator pIter)
{
if (pIter == NULL)
{
cout << "NULL" << endl;
}
else
{
cout << "Value: " << *(*pIter) << endl;
}
}
int main()
{
MyContainer myList;
myList.push_back(1);
myList.push_back(2);
PMyIterator pIter = NULL;
// Verify for NULL
useIter(pIter);
// Get an iterator
MyIterator it = myList.begin();
// Get a pointer to the iterator
pIter = & it;
// Use the pointer
useIter (pIter);
}
Iterators are generally handled by value. For instance, begin() and end() will return an instance of type iterator (for the given iterator type), not iterator& so they return copies of a value every time.
You can of course take an address to this copy but you cannot expect that a new call to begin() or end() will return an object with the same address, and the address is only valid as long as you hold on to the iterator object yourself.
std::vector<int> x { 1, 2, 3 };
// This is fine:
auto it = x.begin();
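auto* pi = &it;
// This is not (it would point at a temporary that dies at the end of the
// statement; compilers reject taking the address of a temporary anyway):
// auto* pi2 = &x.begin();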
It rarely makes sense to maintain pointers to iterators: iterators are already lightweight handles to data. A further indirection is usually a sign of poor design. In your example in particular the pointers make no sense. Just pass a normal iterator.
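Passing the iterator itself could look like the sketch below. It swaps in std::optional (C++17) to play the role the NULL pointer-to-iterator played in the question; that choice is an assumption, and passing container.end() as a "no element" marker would work just as well. The helper describe() is hypothetical and exists only so the output can be checked.

```cpp
#include <cassert>
#include <iostream>
#include <list>
#include <optional>
#include <string>

typedef std::list<int> MyContainer;
typedef MyContainer::iterator MyIterator;

// Build the message useIter() prints; std::nullopt replaces the NULL pointer.
std::string describe(std::optional<MyIterator> it)
{
    if (!it)
        return "NULL";
    return "Value: " + std::to_string(**it);  // *it is the iterator, **it the int
}

void useIter(std::optional<MyIterator> it)
{
    std::cout << describe(it) << std::endl;
}
```

Calls become `useIter(std::nullopt);` and `useIter(myList.begin());`, with no pointer to a local iterator involved.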
The problem with iterators is that there are a lot of operations on containers that invalidate them (which ones depends on the container in question). When you hold an iterator into a container that belongs to another class, you never know when such an operation occurs, and there is no easy way to find out that the iterator is now invalid.
Also, directly deleting elements of a container that belongs to another class violates the principle of encapsulation. When you want to delete data of another class, it is better to call a public method of that class, which then deletes the data.
Yes, it is safe, as long as you can ensure the iterators don't get invalidated and don't go out of scope.
Sounds scary. The iterator is an object; if it goes out of scope, your pointer is invalid. If you erase an object in container #2, iterators may become invalid (which ones depends on the container), and then your pointers become useless.
Why don't you store the iterator itself? For the elements in container #1 that don't refer to anything, store container2.end().
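Storing the iterator itself, with end() as the "no partner" value, might look like this. Item1, Item2 and erasePartner are hypothetical names for the question's two element types, and the sketch relies on std::list, where erase invalidates only iterators to the erased element; with std::vector it would not be safe.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical element types for container #1 and container #2.
struct Item2 {
    std::string name;
};
typedef std::list<Item2> Container2;

struct Item1 {
    int id;
    Container2::iterator partner;  // container2.end() means "no partner"
};

// Erase the partner of `item` from container #2, if it has one.
// std::list::erase invalidates only iterators to the erased element, so
// other stored iterators, and end() itself, stay valid.
void erasePartner(Item1& item, Container2& container2)
{
    if (item.partner != container2.end()) {
        container2.erase(item.partner);
        item.partner = container2.end();
    }
}
```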
This is fine as long as iterators are not invalidated. If they are, you need to re-generate the mapping.
Yes, it is possible to work with pointers to iterators just as with pointers to other types, but in your example it is not necessary, since you can simply pass the original iterator by reference.
In general, it is not a good idea to store iterators, since an iterator may become invalid as you modify the container. It is better to store the containers and create iterators as you need them.
I'm confused about how C++ manages objects in vector. Say I do the following:
vector<MyClass> myVector;
myVector.push_back(a);
myVector.push_back(b);
MyClass & c = myVector[1];
myVector.erase(myVector.begin());
Is the reference c still valid (or better yet, is it guaranteed to be valid)? If not, do I have to always make copy from the reference to ensure safety?
Unlike Java or C# references (which are more like C++ pointers than C++ references), references in C++ are as "dumb" as pointers: if you take a reference to an object and that object is then moved in memory, your reference is no longer valid.
Is the reference c still valid (or better yet, is it guaranteed to be valid)?
In the case you're describing, the standard vector is not guaranteed to keep the objects it contains at the same place in memory when the vector contents changes (removal of an item, resizing of the vector, etc.).
This will invalidate both iterators and pointers/references to the contained objects.
If not, do I have to always make copy from the reference to ensure safety?
There are multiple ways to keep "pointing" to the right objects, all of them implying a level of indirection.
Full/value copy
The simplest is making a full copy of MyClass:
vector<MyClass> x ;
x.push_back(a) ;
x.push_back(b) ;
MyClass c = x[1] ; // c is a full copy of b, not a reference to b
x.erase(x.begin()) ;
Using the right container
The second simplest is to use a std::list which is specifically designed for element insertion and removal, and will not change the contained objects, nor invalidate pointers, references or iterators to them:
list<MyClass> x ;
x.push_back(a) ;
x.push_back(b) ;
list<MyClass>::iterator it = x.begin() ;
++it ;
MyClass & c = *it ;
x.erase(x.begin()) ;
Using pointers (unsafe)
Another would be to make a std::vector<MyClass *>, which would contain pointers to MyClass instead of MyClass objects. You will then be able to keep a pointer or a reference to the pointed object, with a slightly different notation (because of the extra indirection):
vector<MyClass *> x;
x.push_back(a); // a being a MyClass *
x.push_back(b); // b being a MyClass *
MyClass * c = x[1]; // c points to the same object as b
x.erase(x.begin()); // note that a will still need separate deallocation
This is unsafe because there is no clear (as far as the compiler is concerned) owner of the objects a and b, meaning there is no clear piece of code responsible for deallocating them when they are no longer needed (this is how memory leaks happen in C and C++).
So if you use this method, make sure the code is well encapsulated and as small as possible to avoid maintenance surprises.
Using smart pointers (safer)
Something better would be using smart pointers. For example, using C++11's (or boost's) shared_ptr:
vector< shared_ptr<MyClass> > x;
x.push_back(a); // a being a shared_ptr<MyClass>
x.push_back(b); // b being a shared_ptr<MyClass>
shared_ptr<MyClass> c = x[1]; // c points to the same object as b
x.erase(x.begin()); // No deallocation problem
Now, if you use shared_ptr, and know nothing about weak_ptr, you have a problem, so you should close that gap.
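A small sketch of closing that gap: a std::weak_ptr observes an element without owning it, and lock() tells you whether the object still exists after the vector was modified. MyClass here is a hypothetical stand-in holding a single int, and the function name is made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the MyClass of the question.
struct MyClass {
    int value;
};

// Observe x[1] through a std::weak_ptr, erase one element, and report
// whether the observed object is still alive afterwards.
bool observedStillAlive(bool eraseObserved)
{
    std::vector<std::shared_ptr<MyClass>> x;
    x.push_back(std::make_shared<MyClass>(MyClass{1}));
    x.push_back(std::make_shared<MyClass>(MyClass{2}));

    std::weak_ptr<MyClass> c = x[1];  // observes without owning
    x.erase(eraseObserved ? x.begin() + 1 : x.begin());

    // lock() returns an empty shared_ptr once the last owner is gone.
    std::shared_ptr<MyClass> p = c.lock();
    return p != nullptr && p->value == 2;
}
```

Erasing x[0] leaves the observed object alive (it merely moves to index 0 inside the vector), while erasing x[1] destroys it and lock() returns an empty pointer.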
Using smart pointers 2 (safer)
Another solution would be to use C++11's unique_ptr, which is the exclusive owner of the pointed-to object. So if you want to have a pointer or a reference to that object, you will have to use raw pointers:
vector< unique_ptr<MyClass> > x;
x.push_back(a); // a being a unique_ptr<MyClass>
x.push_back(b); // b being a unique_ptr<MyClass>
MyClass * c = x[1].get(); // c points to the same object as b
x.erase(x.begin()); // No deallocation problem
Note here that the vector is the unique owner of the objects, unlike the case above with shared_ptr.
Conclusion
You are coding in C++, meaning you have to choose the right method for your problem.
But first, you want to be sure to understand the level of indirection added by pointers, what pointers do, and what C++ references do (and why they aren't C#/Java references).
From a reference for vector:
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
Therefore, after the erase, iterators (and references) to the erased element and to the elements following it are invalidated and must not be used.
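The reserve() rule in the quote can be checked with a short sketch, assuming every insertion happens at the end and the size never exceeds the reserved capacity. It does not rescue the question's code, which erases from the front. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reserve capacity for n elements up front, then only append: no
// reallocation takes place, so an earlier reference and data() stay valid.
bool firstElementStable(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);                   // preallocate everything we will use
    v.push_back(42);
    const int& first = v[0];
    const int* before = v.data();
    for (std::size_t i = 1; i < n; ++i)
        v.push_back(static_cast<int>(i));  // size never exceeds n
    return v.data() == before && first == 42;
}
```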
Is the reference c still valid (or better yet, is it guaranteed to be valid)?
No. It is invalidated by the erase, and using it is undefined behavior.
If not, do I have to always make copy from the reference to ensure safety?
If there is a possibility of it being deleted or moved (etc) between your retrieving it and your using it, then yes you DO need to make a copy.
In general, references into the contents of a vector are invalidated when you change the positions of its elements or its size, although there are a few exceptions. You only have to make a copy if you intend to change the positions of the elements, or the number of elements in the vector, before you use the data.
A blog I came across at http://developer-resource.blogspot.com.au/2009/01/pros-and-cons-of-returing-references.html writes:
After working in this code base for a while now I believe that returning references are evil and should be treated just like returning a pointer, which is avoid it.
For example the problem that arose that took a week to debug was the following:
class Foo {
std::vector< Bar > m_vec;
public:
void insert(Bar& b) { m_vec.push_back(b); }
Bar const& getById(int id) { return m_vec[id]; }
};
The problem in this example is clients are calling and getting references that are stored in the vector. Now what happens after clients insert a bunch of new elements? The vector needs to resize internally and guess what happens to all those references? That's right, they're invalid. This caused a very hard to find bug that was simply fixed by removing the &.
I can't see anything wrong with the code. Am I misunderstanding return by reference & STL containers, or is the post incorrect?
Say, for example, you have 2 elements in the vector, a and b, and you return references to them, r1 and r2.
Now another client inserts into the vector. Suppose the vector only has storage for two elements: it reallocates the storage, copies a and b, and inserts c after them. This changes the locations of a and b, so references r1 and r2 are now invalid and point to junk locations.
If the getById method did not return by reference, a copy would have been made and everything would have worked fine.
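The return-by-value fix can be sketched as follows. Bar is hypothetical (the post never shows it), the id type is changed to std::size_t, and at() adds bounds checking that the original did not have.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Bar; the original post never shows its definition.
struct Bar {
    std::string name;
};

class Foo {
public:
    void insert(const Bar& b) { m_vec.push_back(b); }

    // Returns a copy: it stays valid however much m_vec grows later.
    Bar getById(std::size_t id) const { return m_vec.at(id); }

private:
    std::vector<Bar> m_vec;
};
```

The trade-off is one copy per lookup, which the next answer argues is often overkill.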
The issue is more easily displayed as:
std::vector<int> vec;
vec.push_back(1);
const int& ref = vec[0];
vec.push_back(ref);
The contents of vec[1] are undefined. In the second push_back, ref is a reference to wherever in memory vec[0] was when ref was initialized. Inside push_back, the vector might have to reallocate, thereby invalidating what ref refers to.
This is a major inconvenience, but luckily it isn't a problem that happens all that often. Is Foo a container into which people insert the same Bar they just looked up by ID? That seems odd to me. Incurring a copy on every access seems like overkill to solve the problem. If you think it is bad enough, you can guard against it in insert:
void insert(const Bar& b)
{
if ((m_vec.data() <= &b) && (&b < m_vec.data() + m_vec.size()))
{
Bar copy(b);
return insert(copy);
}
else
m_vec.push_back(b);
}
In C++11, it would be much better to write Foo::insert like so (assuming Bar has a decent move constructor):
void insert(Bar b)
{
m_vec.emplace_back(std::move(b));
}
In addition to the other answers, it's worth pointing out that this effect depends on the container type. For example, for vectors we have:
Vector reallocation occurs when a member function must increase the sequence contained in the vector object beyond its current storage capacity. Other insertions and erasures may alter various storage addresses within the sequence. In all such cases, iterators or references that point at altered portions of the sequence become invalid. If no reallocation happens, only iterators and references before the insertion/deletion point remain valid.
while lists are more relaxed in this regard due to their manner of storing data:
List reallocation occurs when a member function must insert or erase elements of the list. In all such cases, only iterators or references that point at erased portions of the controlled sequence become invalid.
And so on for other container types.
From a comment on the same article:
The issue here is not that returning by reference is evil, just that what you are returning a reference to can become invalid.
Page 153, section 6.2 of "C++ Standard Library: A Tutorial and Reference" - Josuttis, reads:
"Inserting or removing elements invalidates references, pointers, and iterators that refer to the following elements. If an insertion causes reallocation, it invalidates all references, iterators, and pointers"
Your code sample is as evil as holding a reference to the first element of the vector, inserting 1000 elements into the vector, and then trying to use the existing reference.
This is a pretty straightforward architectural question, however it's been niggling at me for ages.
The whole point of using a list, for me anyway, is that it's O(1) insert/remove.
The only way to have an O(1) removal is to have an iterator for erase().
The only way to get an iterator is to keep hold of it from the initial insert() or to find it by iteration.
So, what to pass around; an Iterator or a pointer?
It would seem that if it's important to have fast removal, such as some sort of large list which is changing very frequently, you should pass around an iterator, and if you're not worried about the time to find the item in the list, then pass around the pointer.
Here is a typical cut-down example:
In this example we have some type called Foo. Foo is likely to be a base class pointer, but it's not here for simplicity.
Then we have FooManager, which holds a list of shared_ptr<Foo> (FooPtr). The manager is responsible for the lifetime of the object once it's been passed to it.
Now, what to return from addFoo()?
If I return a FooPtr then I can never remove it from the list in O(1), because I will have to find it in the list.
If I return a std::list::iterator, FooPtrListIterator, then anywhere I need to remove the FooPtr, I can do so just by passing the iterator to erase().
In this example I have a contrived example of a Foo which can kill itself under some circumstance, Foo::killWhenConditionMet().
Imagine some Foo that has a timer which is ticking down to 0, at which point it needs to ask the manager to delete itself. The trouble is that 'this' is a naked Foo*, so the only way to delete itself, is to call FooManager::eraseFoo() with a raw pointer. Now the manager has to search for the object pointer to get an iterator so it can be erased from the list, and destroyed.
The only way around that is to store the iterator in the object, i.e., Foo has a FooPtrListIterator as a member variable.
#include <cassert>
#include <cstdio>
#include <list>
#include <boost/shared_ptr.hpp>
struct Foo;
typedef boost::shared_ptr<Foo> FooPtr;
typedef std::list<FooPtr> FooPtrList;
typedef FooPtrList::iterator FooPtrListIterator;
struct FooManager
{
FooPtrList l;
FooPtrListIterator addFoo(Foo *foo) {
return l.insert(l.begin(), FooPtr(foo));
}
void eraseFoo(FooPtrListIterator foo) {
l.erase(foo);
}
void eraseFoo(Foo *foo) {
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
if ((*it).get()==foo){
eraseFoo(it);
return;
}
}
assert(!"foo not found!"); // a bare string literal is always true, so negate it
}
};
FooManager g_fm;
struct Foo
{
int _v;
Foo(int v):_v(v) {
}
~Foo() {
printf("~Foo %d\n", _v);
}
void print() {
printf("%d\n", _v);
}
void killWhenConditionMet() {
// Do something that will eventually kill this object, like a timer
g_fm.eraseFoo(this);
}
};
void printList(FooPtrList &l)
{
printf("-\n");
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
(*it)->print();
}
}
void test2()
{
FooPtrListIterator it1=g_fm.addFoo(new Foo(1));
printList(g_fm.l);
FooPtrListIterator it2=g_fm.addFoo(new Foo(2));
printList(g_fm.l);
FooPtrListIterator it3=g_fm.addFoo(new Foo(3));
printList(g_fm.l);
(*it2)->killWhenConditionMet();
printList(g_fm.l);
}
int main()
{
test2();
}
So, the questions I have are:
1. If an object needs to delete itself, or have some other system delete it, in O(1), do I have to store an iterator to the object inside the object? If so, are there any gotchas to do with iterators becoming invalid due to other container operations?
2. Is there simply another way to do this?
As a side question, does anyone know why none of the 'push*' STL container operations return the resulting iterator, meaning one has to resort to 'insert*'?
Please, no answers that say "don't pre-optimise", it drives me nuts. ;) This is an architectural question.
The C++ standard, in its [list.modifiers] section, says that any list insertion operation "does not affect the validity of iterators and references", and any removal operation "invalidates only the iterators and references to the erased elements". So keeping iterators around is safe.
Keeping iterators inside the objects also seems sane. Especially if you don't call them iterators, but rather name like FooManagerHandlers, which are processed by removal function in an opaque way. Indeed, you do not store "iterators", you store "representatives" of objects in an organized structure. These representatives are used to define a position of an object inside that structure. This is a separate, quite a high-level concept, and there's nothing illogical in implementing it.
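A handle-based version of the question's design could look like the sketch below. It swaps boost::shared_ptr for std::unique_ptr (so the list is the sole owner), and the names Handle, add and remove are hypothetical. Each Foo stores the list iterator to its own node, so removal needs no search.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

struct Foo;

// Owns the Foos in a std::list and gives each Foo an opaque handle
// (a list iterator) to its own node, so removal is O(1).
class FooManager {
public:
    using Handle = std::list<std::unique_ptr<Foo>>::iterator;

    Foo& add(int value);
    void remove(Handle h);  // O(1): no search through the list
    std::size_t size() const { return items_.size(); }

private:
    std::list<std::unique_ptr<Foo>> items_;
};

struct Foo {
    int value;
    FooManager::Handle self;  // valid until this Foo's node is erased

    explicit Foo(int v) : value(v) {}

    void killWhenConditionMet(FooManager& m) {
        m.remove(self);  // destroys *this: do not touch members afterwards
    }
};

Foo& FooManager::add(int value) {
    items_.push_front(std::make_unique<Foo>(value));
    Handle h = items_.begin();
    (*h)->self = h;  // list insertions never invalidate existing handles
    return **h;
}

void FooManager::remove(Handle h) {
    items_.erase(h);  // invalidates only h and references to that Foo
}
```

The gotcha raised in the next answer still applies: once a Foo has been erased by any path, its handle (and any reference to the Foo) must not be used again.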
However, the point of using lists is not just O(1) insert/remove, but also keeping elements in an order. If you don't need any order, then you would probably find hash tables more useful.
The one problem I see with storing the iterator in the object is that you must be careful when the object is deleted through some other iterator: the object's destructor does not know where it was destroyed from, so you can end up with an invalid iterator in the destructor.
The reason that push* does not return an iterator is that it is the inverse of pop*, allowing you to treat your container as a stack, queue, or deque.
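When you do want an iterator back, insert at end() behaves like push_back and returns an iterator to the new element. The helper name push_back_and_get is hypothetical; it works for any container with insert(position, value), such as std::list or std::vector.

```cpp
#include <cassert>
#include <list>

// Append like push_back, but return an iterator to the new element.
template <class Container>
typename Container::iterator push_back_and_get(
    Container& c, const typename Container::value_type& value)
{
    return c.insert(c.end(), value);
}
```

For std::list the returned iterator stays valid until that element is erased, so it can be stored and later passed to erase() for O(1) removal.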