Is it safe to hold pointers to iterators in C++?

I will ask the question first and the motivation next, and finally an illustrative code sample which compiles and executes as expected.
Question
If I can assure myself that an iterator will not be invalidated for as long as I need to use it, is it safe to hold a pointer to an iterator (e.g. a pointer to a list<int>::iterator)?
Motivation
I have multiple containers and I need direct cross references from items held in one container to the corresponding items held in another container and so on. An item in one container might not always have a corresponding item in another container.
My idea thus is to store a pointer to an iterator to an element in container #2 in the element stored in container #1 and so forth. Why? Because once I have an iterator, I can not only access the element in container #2, but if needed, I can also erase the element in container #2 etc.
If there is a corresponding element in container #2, I will store a pointer to the iterator in the element in container #1. Else, this pointer will be set to NULL. Now I can quickly check that if the pointer to the iterator is NULL, there is no corresponding element in container #2, if non-NULL, I can go ahead and access it.
So, is it safe to store pointers to iterators in this fashion?
Code sample
#include <iostream>
#include <list>
using namespace std;
typedef list<int> MyContainer;
typedef MyContainer::iterator MyIterator;
typedef MyIterator * PMyIterator;
void useIter(PMyIterator pIter)
{
if (pIter == NULL)
{
cout << "NULL" << endl;
}
else
{
cout << "Value: " << *(*pIter) << endl;
}
}
int main()
{
MyContainer myList;
myList.push_back(1);
myList.push_back(2);
PMyIterator pIter = NULL;
// Verify for NULL
useIter(pIter);
// Get an iterator
MyIterator it = myList.begin();
// Get a pointer to the iterator
pIter = & it;
// Use the pointer
useIter (pIter);
}

Iterators are generally handled by value. For instance, begin() and end() will return an instance of type iterator (for the given iterator type), not iterator& so they return copies of a value every time.
You can of course take the address of this copy, but you cannot expect a new call to begin() or end() to return an object with the same address, and the address is only valid as long as you hold on to the iterator object yourself.
std::vector<int> x { 1, 2, 3 };
// This is fine:
auto it = x.begin();
auto* pi = &it;
// This is not (address of a temporary; standard C++ rejects it,
// and where an extension accepts it the pointer dangles at once):
auto* pi2 = &x.begin();
It rarely makes sense to maintain pointers to iterators: iterators are already lightweight handles to data. A further indirection is usually a sign of poor design. In your example in particular the pointers make no sense. Just pass a normal iterator.

The problem with iterators is that there are a lot of operations on containers which invalidate them (which ones depends on the container in question). When you hold an iterator to a container which belongs to another class, you never know when such an operation occurs and there is no easy way to find out that the iterator is now invalid.
Also, directly deleting elements from a container which belongs to another class is a violation of the encapsulation principle. When you want to delete data of another class, you should call a public method of that class which then deletes the data.

Yes, it is safe, as long as you can ensure the iterators don't get invalidated and don't go out of scope.

Sounds scary. The iterator is an object; if it leaves scope, your pointer is invalid. If you erase an object in container #2, all iterators may become invalid (depending on the container) and thus your pointers become useless.
Why don't you store the iterator itself? For the elements in container #1 that don't refer to anything, store container2.end().
This is fine as long as iterators are not invalidated. If they are, you need to re-generate the mapping.
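A minimal sketch of that suggestion, assuming container #2 is a std::list (whose end() iterator stays valid across insertions and erasures). The names Names, Item, hasPartner, partnerValue and dropPartner are made up for illustration and do not come from the question.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical element type for container #2.
typedef std::list<std::string> Names;

// Hypothetical element type for container #1: it stores the iterator
// itself (by value); names.end() means "no corresponding element".
struct Item {
    int id;
    Names::iterator partner;
};

// True if the item refers to a real element of `names`.
bool hasPartner(const Item& item, Names& names) {
    return item.partner != names.end();
}

// Reads the partner; the caller must have checked hasPartner() first.
const std::string& partnerValue(const Item& item, Names& names) {
    assert(hasPartner(item, names));
    return *item.partner;
}

// Erases the partner (if any) and resets the link to the sentinel.
void dropPartner(Item& item, Names& names) {
    if (hasPartner(item, names)) {
        names.erase(item.partner);
        item.partner = names.end();
    }
}
```

With a std::vector as container #2 this would not hold up, because insertions can invalidate both the stored iterators and the end() sentinel.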

Yes, it is possible to work with pointers to iterators just as with pointers to other types, but in your example it is not necessary since you can simply pass the original iterator by reference.
In general it is not a good idea to store iterators since the iterator may become invalid as you modify the container. Better store the containers and create iterators as you need them.

Related

Do C++ iterators hold a reference to the underlying object?

I can't seem to find much information about whether iterators keep hold of the underlying object they are iterating over.
If I create an iterator, then the object that supplied it goes out of scope, does the presence of the iterator prevent it from being destroyed?
Here is a very simple example just to illustrate the scenario:
// This class takes a copy of iterators to use them later
class Data {
public:
Data(std::vector<int>::iterator start, std::vector<int>::iterator end)
: start(start),
end(end)
{}
void show() {
// Use this->start and this->end for some purpose
}
private:
std::vector<int>::iterator start;
std::vector<int>::iterator end;
};
Data test() {
std::vector<int> v{1, 2, 3};
Data d(v.begin(), v.end());
d.show(); // this would be ok
return d;
}
int main(void) {
Data d = test();
d.show(); // What happens here?
}
In this example, the Data object is storing a copy of the iterators, which is fine for the first show() call. However by the time of the second show() call, the original object that supplied the iterators no longer exists.
Do the iterators keep the object around until such time as they are all themselves destroyed, or are the iterators invalidated as soon as the original object goes out of scope?
Here is one reference of many which doesn't say what happens one way or the other (or even whether the result of this is 'undefined'.)
Iterators typically don't own the data over which they iterate, no. In fact, they're rarely (if ever) even aware of the object that owns the data; vector iterators, for example, are often just pointers, which have no knowledge of any vector or of its lifetime. Even those iterators that are not implemented as pointers (which is most of them) may be considered a kind of "pointer", and treated as such: they can quite easily become dangling.
Your example has UB because you'll dereference invalid iterators inside show() the second time.
If your container goes out of scope then all your iterators become invalidated. In fact, there are all manner of reasons why an iterator may become invalidated, such as adding to a vector when that operation results in a capacity expansion.
It's possible to find iterators that do kind of "own" data, instead of iterating over some collection found elsewhere (such as Boost's counting iterators), but these are magical properties that take advantage of C++ to provide a magical function, not an inherent property of iterators as defined by C++.
An iterator is generally only valid as long as its originating container or "sequence" has not been changed, because a change might cause memory reallocation and memory moves. Since the iterator usually references memory in the originating container, a change in said container might invalidate the iterator.
Now, a container that goes out of scope gets its destructor executed. That will obviously change the container and hence any iterator to it will be invalidated in the process.
First, an iterator does not have an interface to reference the object it iterates over. It only implements pointer semantics, so you may think of it as an abstract pointer. Of course, its internal implementation may hold a pointer to that object, but that is very unlikely in real-world implementations.
Second, when your container is destroyed (and it is when it goes out of scope), all objects in the container are destroyed too. Thus, the iterator becomes invalid once your container has been destroyed. After that, incrementing, decrementing or dereferencing the iterator causes undefined behavior.
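One way to sidestep the dangling iterators in the question's example is to let Data own its elements. The following is only a sketch under that assumption; the members values_, sum() and front() and the function makeData() are invented here and are not part of the original code.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Variant of the question's Data class that copies the range it is given,
// so nothing it holds can outlive the storage it refers to.
class Data {
public:
    Data(std::vector<int>::const_iterator first,
         std::vector<int>::const_iterator last)
        : values_(first, last) {}

    // Sums the owned elements; safe after the source vector is gone.
    int sum() const {
        return std::accumulate(values_.begin(), values_.end(), 0);
    }

    // First owned element; the range must not have been empty.
    int front() const {
        assert(!values_.empty());
        return values_.front();
    }

private:
    std::vector<int> values_;
};

// Mirrors test() from the question: v is destroyed on return,
// but the returned Data carries its own copy of the elements.
Data makeData() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    return Data(v.begin(), v.end());
}
```

If copying is too expensive, sharing ownership of the vector (for example through a std::shared_ptr) is another option, at the cost of a more involved design.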

Understanding const_iterator with pointers?

I'm trying to understand what const_iterator means. I have the following example code:
void CustomerService::RefreshCustomers()
{
for(std::vector<Customer*>::const_iterator it = customers_.begin();
it != customers_.end() ; it ++)
{
(*it)->Refresh();
}
}
Refresh() is a method in the Customer class and it is not defined as const. At first I thought const_iterator was supposed to disallow modification of the elements of the container. However, this code compiles without complaint. Is this because there's an extra level of indirection going on? What exactly does const_iterator do/mean?
UPDATE
And in a situation like this, is it best practice to use const_iterator?
A const_iterator over a vector<Customer*> will give you a Customer * const not a Customer const*. So you cannot actually change the value being iterated (a pointer), but you sure can change the object pointed to by it. Basically all it says in your case is that you can't do this:
*it = ..something..;
You're not modifying the contents of the container. The contents of the container are just pointers. However, you can freely modify whatever the pointers point to.
If you didn't want to be able to modify what the pointers point to, you'd need a vector<const Customer*>.
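The distinction can be checked at compile time. This is a small sketch in which Customer is a hypothetical stand-in with a counter, and refreshAll is an invented free function playing the role of RefreshCustomers().

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Customer class.
struct Customer {
    int refreshCount = 0;
    void Refresh() { ++refreshCount; }   // deliberately non-const
};

typedef std::vector<Customer*> Customers;

// Dereferencing a const_iterator yields a const *pointer*,
// not a pointer to a const Customer.
static_assert(
    std::is_same<decltype(*std::declval<Customers::const_iterator>()),
                 Customer* const&>::value,
    "the pointer is const, the Customer is not");

// Compiles although only const_iterator is used: the pointers stored in
// the vector stay unchanged, only the objects they point to are modified.
void refreshAll(const Customers& customers) {
    for (Customers::const_iterator it = customers.begin();
         it != customers.end(); ++it) {
        assert(*it != nullptr);
        (*it)->Refresh();
        // *it = nullptr;   // would not compile: *it is Customer* const&
    }
}
```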
const_iterator is not about whether you can modify the container or not, but about whether you can modify the objects in the container or not. In your case the container contains pointers, and you cannot modify the pointers themselves (any more than you could modify integers). You can still call the non-const Refresh() through a pointer from the collection, because that call does not change the pointer itself.
The difference between const_iterator and iterator matters only when your container holds the objects themselves (e.g. class instances) rather than pointers to them, for example in a container
list < pair < int , int > >
If 'it' is a const_iterator into this list, you can't do
it->first = 5
but if it is iterator (not const_iterator), that works.

Vector iterators

I have the following code.
vector<IRD>* irds = myotherobj->getIRDs();//gets a pointer to the vector<IRD>
for(vector<IRD>::iterator it = irds->begin(); it < irds->end(); it++)
{
IRD* ird = dynamic_cast<IRD*>(it);
ird->doSomething();
//this works (*it).doSomething();
}
This seems to fail...I just want to get the pointer to each element in the vector without using (*it). all over.
How do I get the pointer to the object?
When I iterate over the vector pointer irds, what exactly am I iterating over? Is it a copy of each element, or am I working with the actual object in the vector when I say (*it).doSomething(),
Why do you want to get a pointer?
Use a reference:
for(vector<IRD>::iterator it = irds->begin(); it != irds->end(); ++it)
{
IRD & ird = *it;
ird.doSomething();
}
Alternatively:
for(vector<IRD>::iterator it = irds->begin(); it != irds->end(); ++it)
{
it->doSomething();
}
Also, as everyone said, use != when comparing iterators, not <. While it'll work in this case, it'll stop working if you use a different container (and that's what iterators are for: abstracting the underlying container).
You need to use != with iterators to test for the end, not < like you would with pointers. operator< happens to work with vector iterators, but if you switch containers (to one like list) your code will no longer compile, so it's generally good to use !=.
Also, an iterator is not the type that it points to, so don't try to cast it. You can use the overloaded operator-> on iterators.
vector<IRD>* irds = myotherobj->getIRDs();//gets a pointer to the vector<IRD>
for(vector<IRD>::iterator it = irds->begin(); it != irds->end(); ++it)
{
it->doSomething();
}
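To illustrate why != is the portable comparison, here is a sketch of a loop that works unchanged for std::vector and std::list. The function name sumAll is made up for this example.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Sums any container of int using only operations that every iterator
// category supports: !=, ++ and *. Replacing != with < would still
// compile for std::vector but not for std::list.
template <typename Container>
int sumAll(const Container& c) {
    int total = 0;
    for (typename Container::const_iterator it = c.begin();
         it != c.end(); ++it) {
        total += *it;
    }
    return total;
}
```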
Dereference the iterator to get a reference to the underlying object.
vector<IRD>* irds = myotherobj->getIRDs();
for(vector<IRD>::iterator it = irds->begin(); it != irds->end(); ++it)
{
IRD& ird = *it;
ird.doSomething();
// alternatively, it->doSomething();
}
First consider whether you actually need a pointer to the element or if you're just trying to kind of use iterators but kind of avoid them. It looks like you're trying to code C in C++, rather than coding C++. In the example you gave, it seems like rather than converting to a pointer and then working with that pointer, why not just use the iterator directly? it->doSomething() instead of ird->doSomething().
If you're thinking that you need to save that pointer for later to use on the vector after doing some work, that's potentially dangerous. Vector iterators and pointers to elements in a vector can both be invalidated, meaning they no longer point to the vector, so you are basically attempting to use memory after you've freed it, a dangerous thing to do. A common example of things that can invalidate an iterator is adding a new element. I got into the mess of trying to store an iterator and I did a lot of work to try to make it work, including writing a "re_validate_iterator()" function. Ultimately, my solution proved to be very confusing and didn't even work in all cases, in addition to not being scalable.
The solution to trying to store the position of the vector is to store it as an offset. Some integer indicating the position within the vector that your element is at. You can then access it with either myvector.begin() + index if you need to work with iterators, or myvector.at (index) if you want a reference to the element itself with bounds checking, or just myvector [index] if you don't need bounds checking.
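A minimal sketch of the offset approach, assuming a std::vector<int>. Bookmark, resolve and toIterator are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Remembers a position as an offset rather than as an iterator, so the
// position stays meaningful when push_back reallocates the vector.
struct Bookmark {
    std::size_t index;
};

// Returns the bookmarked element with bounds checking
// (std::vector::at throws std::out_of_range on a bad index).
int& resolve(std::vector<int>& v, const Bookmark& mark) {
    return v.at(mark.index);
}

// Rebuilds an iterator on demand instead of storing one.
std::vector<int>::iterator toIterator(std::vector<int>& v,
                                      const Bookmark& mark) {
    assert(mark.index <= v.size());
    return v.begin() +
           static_cast<std::vector<int>::difference_type>(mark.index);
}
```

An offset survives reallocation, but it still goes stale if elements before it are inserted or erased, so it suits vectors that mostly grow at the end.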
You can get a pointer from an iterator by doing &*it. You get a pointer to the actual IRD object stored inside the vector. You can modify the object through the pointer and the modification will "stick": it will persist inside the vector.
However, since your vector contains the actual objects (not pointers to objects) I don't see any point in dynamic_cast. The type of the pointer is IRD * and it points to IRD object.
The only case when the dereferenced iterator might refer to a copy (or, more precisely, to a proxy object) is vector<bool>, which might be implemented as a bit-vector.
When I iterate over the vector pointer irds, what exactly am I iterating over? Is it a copy of each element, or am I working with the actual object in the vector when I say (*it).doSomething(),
When you iterate over a vector you work with the object itself, not a copy of it.
The usual idiom is &*it to get a pointer. Dynamic casts have nothing to do with it.
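A short sketch of the &*it idiom, with IRD replaced by a hypothetical stand-in that counts calls, and touchAll as an invented function name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's IRD class.
struct IRD {
    int calls;
    IRD() : calls(0) {}
    void doSomething() { ++calls; }
};

// Calls doSomething() on every element through a raw pointer obtained
// with &*it. The pointer refers to the element stored in the vector.
void touchAll(std::vector<IRD>& irds) {
    for (std::vector<IRD>::iterator it = irds.begin();
         it != irds.end(); ++it) {
        IRD* ird = &*it;   // address of the element itself, not of a copy
        assert(ird == &irds[static_cast<std::size_t>(it - irds.begin())]);
        ird->doSomething();
    }
}
```

The pointer is subject to the same invalidation rules as the iterator it came from, so it should not be kept across operations that may reallocate the vector.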

Does ptr_vector iterator not require increments?

#include <boost/ptr_container/ptr_vector.hpp>
#include <iostream>
using namespace std;
class Derived
{
public:
int i;
Derived() {cout<<"Constructed Derived"<<endl;}
Derived(int ii):i(ii) {cout<<"Constructed Derived"<<i<<endl;}
~Derived() {cout<<"* Destructed Derived"<<i<<endl;}
};
int main()
{
boost::ptr_vector<Derived> pv;
for(int i=0;i<10;++i) pv.push_back(new Derived(i));
boost::ptr_vector<Derived>::iterator it;
for (it=pv.begin(); it<pv.end();/*no iterator increment*/ )
pv.erase(it);
cout<<"Done erasing..."<<endl;
}
Notice that the second for loop does not increment the iterator, yet it iterates and erases all elements. My questions are:
Is my technique of iteration and using the iterator correct?
If iterator increment is not required in the for loop, then where does the increment happen?
Is it better to use an iterator or will an ordinary integer suffice (ie: is there any value-add with using iterators)? (coz I can also erase the 5th element like pv.erase(pv.begin()+5);)
Is there any way to assign a new object to a specific position (let's say the 5th position) of ptr_vector, directly? I'm looking for something like pv[5]=new Derived(5);. Any way of doing that?
A ptr_vector::iterator increments just like a normal random access iterator. In your example, you are able to erase every element without actually incrementing because after you erase an element, every element after it is moved over in the array. So when you erase the 0th element, your iterator now points to the element which used to be the 1st element, but is now the 0th element, and so on. In other words, the iterator is staying in place while the whole vector is shifting over to the left.
This has nothing specifically to do with ptr_vector. Note the same behavior would occur with a plain std::vector.
Also note that using an iterator after you erase the element it points to is dangerous. In your case it works, but it's better to take the return value of ptr_vector::erase so you get a new iterator which is guaranteed to be valid.
for (it = pv.begin(); it != pv.end(); )
it = pv.erase(it);
As for your other questions:
If you only want to erase a specific element, then of course you should erase it directly using pv.erase(pv.begin() + N). To assign a new value to a specific element in the pointer vector, simply say pv[N] = Derived(whatever). You don't need to use new when reassigning a value. The pointer vector will invoke the assignment operator of the object at the index you assign the new value to.
Is my technique of iteration and using the iterator correct?
No, erasing from a container generally invalidates the iterator to the erased item. If it works, this is just a side-effect of the implementation details.
The correct way would be to use the return value of the erase method:
it = pv.erase(it);
However, for emptying the container, you can use the clear member function.
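The same idiom also covers erasing only some of the elements. The sketch below uses a plain std::vector<int> instead of boost::ptr_vector (an assumption made to keep it free of Boost), and eraseEven is an invented name.

```cpp
#include <cassert>
#include <vector>

// Removes every even value, continuing from the iterator that erase()
// returns instead of reusing the one that was just invalidated.
void eraseEven(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it);   // refers to the element after the erased one
        } else {
            ++it;               // advance only when nothing was erased
        }
    }
}
```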
If iterator increment is not required in the for loop, then where
does the increment happen?
It doesn't happen, because you'll always be erasing the first item in the container (by chance, might not work out with other containers).
Is it better to use an iterator or will an ordinary integer suffice (ie:
is there any value-add with using
iterators)? (coz I can also erase the
5th element like
pv.erase(pv.begin()+5);)
In a random-access container you can do that, otherwise not (such as a list).
Is there any way to assign a new object to a specific position (let's
say the 5th position) of ptr_vector,
directly? I'm looking for something
like pv[5]=new Derived(5);. Any way
of doing that?
According to the boost reference:
pv.replace(5, new Derived(5));
This returns the existing pointer in a smart pointer, so it will be automatically deallocated.
(It's curious that this takes an index, not an iterator...).
Or:
pv[5] = Derived(5);
but this will just modify the stored object, not change the pointer.

Architectural C++/STL question about iterator usage for O(1) list removal by external systems

This is a pretty straightforward architectural question, however it's been niggling at me for ages.
The whole point of using a list, for me anyway, is that it's O(1) insert/remove.
The only way to have an O(1) removal is to have an iterator for erase().
The only way to get an iterator is to keep hold of it from the initial insert() or to find it by iteration.
So, what to pass around; an Iterator or a pointer?
It would seem that if it's important to have fast removal, such as some sort of large list which is changing very frequently, you should pass around an iterator, and if you're not worried about the time to find the item in the list, then pass around the pointer.
Here is a typical cut-down example:
In this example we have some type called Foo. Foo is likely to be a base class pointer, but it's not here for simplicity.
Then we have FooManager, which holds a list of shared_ptr, FooPtr. The manager is responsible for the lifetime of the object once it's been passed to it.
Now, what to return from addFoo()?
If I return a FooPtr then I can never remove it from the list in O(1), because I will have to find it in the list.
If I return a std::list::iterator, FooPtrListIterator, then anywhere I need to remove the FooPtr I can, just by dereferencing the iterator.
In this example I have a contrived example of a Foo which can kill itself under some circumstance, Foo::killWhenConditionMet().
Imagine some Foo that has a timer which is ticking down to 0, at which point it needs to ask the manager to delete itself. The trouble is that 'this' is a naked Foo*, so the only way to delete itself, is to call FooManager::eraseFoo() with a raw pointer. Now the manager has to search for the object pointer to get an iterator so it can be erased from the list, and destroyed.
The only way around that is to store the iterator in the object. i.e Foo has a FooPtrListIterator as a member variable.
#include <cassert>
#include <cstdio>
#include <list>
#include <boost/shared_ptr.hpp>
struct Foo;
typedef boost::shared_ptr<Foo> FooPtr;
typedef std::list<FooPtr> FooPtrList;
typedef FooPtrList::iterator FooPtrListIterator;
struct FooManager
{
FooPtrList l;
FooPtrListIterator addFoo(Foo *foo) {
return l.insert(l.begin(), FooPtr(foo));
}
void eraseFoo(FooPtrListIterator foo) {
l.erase(foo);
}
void eraseFoo(Foo *foo) {
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
if ((*it).get()==foo){
eraseFoo(it);
return;
}
}
assert(!"foo not found!"); // a bare string literal is always true, so negate it
}
};
FooManager g_fm;
struct Foo
{
int _v;
Foo(int v):_v(v) {
}
~Foo() {
printf("~Foo %d\n", _v);
}
void print() {
printf("%d\n", _v);
}
void killWhenConditionMet() {
// Do something that will eventually kill this object, like a timer
g_fm.eraseFoo(this);
}
};
void printList(FooPtrList &l)
{
printf("-\n");
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
(*it)->print();
}
}
void test2()
{
FooPtrListIterator it1=g_fm.addFoo(new Foo(1));
printList(g_fm.l);
FooPtrListIterator it2=g_fm.addFoo(new Foo(2));
printList(g_fm.l);
FooPtrListIterator it3=g_fm.addFoo(new Foo(3));
printList(g_fm.l);
(*it2)->killWhenConditionMet();
printList(g_fm.l);
}
So, the questions I have are:
1. If an object needs to delete itself, or have some other system delete it, in O(1), do I have to store an iterator to the object inside the object? If so, are there any gotchas to do with iterators becoming invalid due to other container iterations?
2. Is there simply another way to do this?
3. As a side question, does anyone know why any of the 'push*' STL container operations don't return the resultant iterator, meaning one has to resort to 'insert*'?
Please, no answers that say "don't pre-optimise", it drives me nuts. ;) This is an architectural question.
C++ standard in its [list.modifiers] section says that any list insertion operation "does not affect the validity of iterators and references", and any removal operation "invalidates only the iterators and references to the erased elements". So keeping iterators around would be safe.
Keeping iterators inside the objects also seems sane, especially if you don't call them iterators but give them a name like FooManagerHandlers and have the removal function process them opaquely. Indeed, you do not store "iterators", you store "representatives" of objects in an organized structure. These representatives define the position of an object inside that structure. This is a separate, fairly high-level concept, and there's nothing illogical in implementing it.
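A sketch of that handle idea, using std::shared_ptr in place of boost::shared_ptr and entirely hypothetical names (Widget, WidgetHandle, WidgetManager); it is one possible shape, not the asker's code.

```cpp
#include <cassert>
#include <list>
#include <memory>

struct Widget;                                    // hypothetical managed type
typedef std::list<std::shared_ptr<Widget> > WidgetList;
typedef WidgetList::iterator WidgetHandle;        // opaque "position" token

struct Widget {
    int value;
    WidgetHandle self;                            // set by the manager on insertion
    explicit Widget(int v) : value(v) {}
};

struct WidgetManager {
    WidgetList items;

    // Inserts at the back and records the element's own position in it.
    WidgetHandle add(int value) {
        WidgetHandle h =
            items.insert(items.end(), std::make_shared<Widget>(value));
        (*h)->self = h;
        return h;
    }

    // O(1) removal: no search, only the stored handle is needed.
    void erase(Widget& w) {
        assert(w.self != items.end());
        items.erase(w.self);   // may destroy w; do not touch w afterwards
    }
};
```

Because std::list insertions and erasures leave the iterators of all other elements valid, the remaining handles can still be used after one element has been erased.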
However, the point of using lists is not just O(1) insert/remove, but also keeping elements in an order. If you don't need any order, then you would probably find hash tables more useful.
The one problem I see with storing the iterator in the object is that you must be careful when the object is erased through some other iterator: your object's destructor does not know where it was destroyed from, so you can end up with an invalid iterator in the destructor.
The reason that push* does not return an iterator is that it is the inverse of pop*, allowing you to treat your container as a stack, queue, or deque.