Should I return an iterator or a pointer to an element in a STL container? - c++

I am developing an engine for porting existing code to a different platform. The existing code has been developed using a third party API, and my engine will redefine those third party API functions in terms of my new platform.
The following definitions come from the API:
typedef unsigned long shape_handle;
shape_handle make_new_shape( int type );
I need to redefine make_new_shape and I have the option to redefine shape_handle.
I have defined this structure ( simplified ):
struct Shape
{
int type;
};
The caller of make_new_shape doesn't care about the underlying structure of Shape; it just needs a "handle" to it so that it can call functions like:
set_shape_color( myshape, RED );
where myshape is the handle to the shape.
My engine will manage the memory for the Shape objects and other requirements dictate that the engine should be storing Shape objects in a list or other iterable container.
My question is, what is the safest way to represent this handle - if the Shape itself is going to be stored in a std::list - an iterator, a pointer, an index?

Both iterators and pointers cause undefined behavior if you try to access them after the object has been deleted, so neither is intrinsically safer. The advantage of an iterator is that it can be used to access other members of your collection.
So, if you just want to access your Shape then a pointer will be simplest. If you want to iterate through your list then use an iterator.
An index is useless in a list since std::list does not overload the [] operator.

The answer depends on your representation:
for std::list, use an iterator (not a pointer), because an iterator allows you to remove the element without walking the whole list.
for std::map or boost::unordered_map, use the Key (of course)
Your design would be much stronger if you used an associative container, because associative containers give you the ability to query for the presence of the object, rather than invoking Undefined Behavior.
Try benchmarking both map and unordered_map to see which one is faster in your case :)
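A minimal sketch of the key-as-handle idea, assuming the engine keeps shapes in a std::unordered_map. The ShapeEngine class, the color field, delete_shape and the choice to start handles at 1 are all made up for illustration; only shape_handle, make_new_shape and set_shape_color come from the question.

```cpp
#include <cassert>
#include <unordered_map>

typedef unsigned long shape_handle;  // unchanged from the third-party API

struct Shape {
    int type;
    int color;  // hypothetical field, added for illustration
};

// Hypothetical engine: owns the shapes and hands out integer keys.
class ShapeEngine {
public:
    shape_handle make_new_shape(int type) {
        shape_handle h = next_handle_++;
        shapes_[h] = Shape{type, 0};
        return h;
    }

    // A stale or unknown handle is reported instead of being dereferenced.
    bool set_shape_color(shape_handle h, int color) {
        auto it = shapes_.find(h);
        if (it == shapes_.end()) return false;
        it->second.color = color;
        return true;
    }

    bool delete_shape(shape_handle h) { return shapes_.erase(h) == 1; }

private:
    std::unordered_map<shape_handle, Shape> shapes_;
    shape_handle next_handle_ = 1;  // assumption: 0 is kept free as "no shape"
};
```

With this layout a call on a deleted handle simply returns false, which is the "query for presence" property mentioned above.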

If the internal representation will be a list of Shapes, then pointers and iterators are safe: once an element is allocated, it is never relocated. I wouldn't recommend an index, for obvious performance reasons: access by position is O(n) in a list.
If you were using a vector, then don't use iterators or pointers, because elements can be relocated when you exceed the vector's capacity, and your pointers/iterators would become invalid.
If you want a representation that is safe regardless of the internal container, then create a container (list/vector) of pointers to your shapes, and return the shape pointer to your client. Even if the container is moved around in memory, the Shape objects will stay in the same location.
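Roughly what the container-of-pointers approach could look like. ShapeStore is a hypothetical name, and std::unique_ptr is my choice for ownership (the answer only says "pointers"), so treat this as a sketch rather than the one way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Shape {
    int type;
};

// Hypothetical store: the vector may reallocate its array of pointers, but
// each Shape lives in its own heap allocation, so the address handed to the
// client does not change.
class ShapeStore {
public:
    Shape* make_new_shape(int type) {
        shapes_.push_back(std::unique_ptr<Shape>(new Shape{type}));
        return shapes_.back().get();
    }

    std::size_t size() const { return shapes_.size(); }

private:
    std::vector<std::unique_ptr<Shape>> shapes_;
};
```

The returned pointer stays usable across later insertions; it still dangles once the engine destroys that Shape.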

Iterators aren't safer than pointers, but they have much better diagnostics than raw pointers if you're using a checked STL implementation!
For example, in a debug build, if you return a pointer to a list element, then erase that list element, you have a dangling pointer. If you access it you get a crash and all you can see is junk data. That can make it difficult to work out what went wrong.
If you use an iterator and you have a checked STL implementation, as soon as you access the iterator to an erased element, you get a message something like "iterator was invalidated". That's because you erased the element it points to. Boom, you just saved yourself potentially a whole lot of debugging effort.
So, not indices for O(n) performance. Between pointers and iterators - always iterators!
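A small sketch of the iterator-as-handle option, assuming the shapes live in a std::list and that shape_handle is redefined as a list iterator. The free functions taking the list as a parameter are hypothetical; a real engine would presumably hide the list.

```cpp
#include <cassert>
#include <list>

struct Shape {
    int type;
};

typedef std::list<Shape>::iterator shape_handle;  // assumed redefinition

// Appends a shape and returns an iterator to it as the handle.
inline shape_handle make_new_shape(std::list<Shape>& shapes, int type) {
    return shapes.insert(shapes.end(), Shape{type});
}

// Erasing through the handle is constant time: no search of the list.
// Only handles to the erased shape become invalid.
inline void delete_shape(std::list<Shape>& shapes, shape_handle h) {
    shapes.erase(h);
}
```

With a checked implementation (for example libstdc++ compiled with _GLIBCXX_DEBUG), dereferencing a handle after delete_shape should trigger a diagnostic rather than silently reading junk.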

Related

How valid positions in vector::insert()? [duplicate]

This question relates to Item 16 of the book Effective STL, which states that when passing a vector (let's assume vector<int> vec) to legacy code that expects an array, we must use &vec[0] instead of vec.begin():
void doSomething(const int* pInts, size_t numInts);
doSomething(&vec[0], vec.size()); // correct!!
doSomething(vec.begin(), vec.size()); // wrong!! why???
The book states that vec.begin() is not the same as &vec[0]. Why? What is the difference between the two?
A std::vector is a sequence container that encapsulates dynamically sized arrays. This lets you conveniently store a bunch of elements without needing to be as concerned with managing the underlying array that stores them. A large part of the convenience of these classes comes from the fact that they give you a bunch of methods that let you deal with the sequence without needing to deal with raw pointers; an iterator is an example of this.
&vec[0] is a pointer to the first element of the underlying storage that the vector is using. vec.begin() is an iterator that starts at the beginning of the vector. While both of these give you a way to access the elements in the sequence, they are two distinct concepts. Look up iterators to get a better idea of how this works.
If your code supports iterators, it's often easiest to use them to iterate over the data. Part of the reason is that iterators are not pointers: they let you iterate over the elements of a data structure without needing to know as much about its implementation details.
However, sometimes you need the raw array of items; for example, some legacy APIs or calls to C code require a pointer to the array. In this case you have no choice but to extract the raw array from the vector, which you can do with &vec[0]. Note that if you have C++11 support there's an explicit way to do this, std::vector::data, which gives you access to the underlying storage array. The C++11 way has the additional benefit of stating your intent more clearly to the people reading your code.
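For illustration, here is how the call might look with data(). sumInts is a stand-in for the legacy function (its body is made up), and sumVector is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a legacy C-style function taking a pointer and a length.
int sumInts(const int* pInts, std::size_t numInts) {
    int total = 0;
    for (std::size_t i = 0; i < numInts; ++i) total += pInts[i];
    return total;
}

// data() may be called on an empty vector, whereas &vec[0] may not.
// With size() == 0 the callee never dereferences the pointer.
int sumVector(const std::vector<int>& vec) {
    return sumInts(vec.data(), vec.size());
}
```

For a non-empty vector, vec.data() and &vec[0] give the same address, so switching between them does not change what the legacy function sees.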
Formally, one produces an iterator and the other a pointer, but I think the major difference is that &vec[0] has undefined behavior if the vector is empty, while vec.begin() does not.
vec.begin() has type std::vector<int>::iterator. &vec[0] has type pointer to std::vector<int>::value_type. These are not necessarily the same type.
It is possible that a given implementation uses pointers as the iterator implementation for a vector, but this is not guaranteed, and thus you should not rely on that assumption. In fact most implementations do provide a wrapping iterator type.
Regarding your question about pointers being iterators, this is partly true. Pointers do meet the criteria of a random access iterator, but not all iterators are pointers. And there are iterators that do not support the random access behavior of pointers.

std::map without parent pointers?

libstdc++, as an example, implements std::map using a red-black binary tree with parent pointers in the nodes. This means that iterators can just be pointers to a node.
Is it possible for a standard library to implement std::map without storing parent pointers in the nodes? I think this would mean that iterators would need to contain a stack of parent pointers, and as such would need to dynamically allocate a logarithmic amount of memory. Would this violate standard performance constraints on iterators? Would not having parent pointers violate any other performance constraints on the rest of the interface?
What about the new node stuff/interface in C++17?
They may not do so. std::map guarantees that removing a key-value pair from it won't invalidate any iterators other than to the pair being removed.
If iterators stored a stack of parents and one of those parents were removed, those iterators would be invalidated as well, and the guarantee would no longer hold.
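The guarantee in question can be exercised with a few lines. eraseAround is a hypothetical helper written only to demonstrate it: it erases every element except one and then reads through an iterator obtained before the erasures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Erases every key except `keep`, then reads through an iterator that was
// obtained before any erasure took place.
std::string eraseAround(std::map<int, std::string>& m, int keep) {
    std::map<int, std::string>::iterator kept = m.find(keep);
    assert(kept != m.end());
    for (auto it = m.begin(); it != m.end();) {
        if (it == kept) {
            ++it;
        } else {
            it = m.erase(it);  // invalidates only the erased element's iterator
        }
    }
    return kept->second;  // still valid: its element was never erased
}
```

Whatever nodes were ancestors of the kept element in the tree, removing them must leave `kept` usable, which is what a stored stack of parents would break.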
Is it possible? Possibly :-) Is it a good idea? Almost certainly not. Most things are possible, if you throw more storage or speed at them :-)
In terms of just getting rid of the parent pointers, you could, for example, maintain within the map a monotonic value that is incremented each time the map structure is changed. In essence, it's a version identifier of the map structure. So, adding or deleting elements in the map increments this value, while merely changing the data within the map does not.
The iterator would then contain:
a pointer to the map itself (to get the current version);
the stack of pointers; and
the version matching the last time the stack above was created.
The idea would basically be to, before doing anything with the iterator, detect when the map version is different to the iterator one and, if it is, rebuild the stack and update the iterator version before carrying on with whatever operation you're trying to perform.
Now, while that makes it possible to iterate without parent pointers, it unfortunately violates some other requirements of iterators, such as being able to action them in constant time. Anything that has to rebuild a data structure, based on the data within the map, will violate that restriction.
In any case, there's no way anyone in their right mind would implement such a horrid scheme when it's far simpler to have parent pointers, but the intent here is simply to show that it's possible.
Hence my advice would be to just stick with the parent pointers. The use of such parent pointers makes the process of finding the next/previous element a rather simple one, based only on the current item in the iterator.
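To show how simple that step is, here is a sketch of in-order "next" on a plain binary search tree with parent pointers. Node and successor are hypothetical names; real library implementations differ in detail (for instance by using a sentinel header node), so this only illustrates the principle.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
};

// In-order successor computed from the current node alone.
// Returns nullptr when n is the last node.
inline Node* successor(Node* n) {
    if (n->right) {
        // Leftmost node of the right subtree.
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    // Climb until we come up from a left child.
    Node* p = n->parent;
    while (p && n == p->right) {
        n = p;
        p = p->parent;
    }
    return p;
}
```

No stack and no allocation are needed: the iterator can be a single Node pointer.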

When the data structure is a template parameter, how can I tell if an operation will invalidate an iterator?

Specifically, I have a class which currently uses a vector and push_back. There is an element inside the vector I want to keep track of. Pushing back on the vector may invalidate the iterator, so I keep its index around. It's cheap to find the iterator again using the index. I can't reserve the vector as I don't know how many items will be inserted.
I've considered making the data structure a template parameter, and perhaps list may be used instead. In that case, finding an iterator from the index is not a trivial operation. Since pushing back on a list doesn't invalidate iterators to existing elements, I could just store this iterator instead.
But how can I code a generic class which handles both cases easily?
If I can find out whether push_back will invalidate the iterator, I could store the iterator and update it after each push_back by storing the distance from the beginning before the operation.
You should probably try to avoid this flexibility. Quote from Item 2 "Beware the illusion of container-independent code" from Effective STL by Scott Meyers:
Face the truth: it's not worth it. The different containers are
different, and they have strengths and weaknesses that vary in significant ways. They're not designed to be interchangeable, and
there's little you can do to paper that over. If you try, you're
merely tempting fate, and fate doesn't like to be tempted.
If you really, positively, definitely have to maintain valid iterators, use std::list. If you also need to have random access, try Boost.MultiIndex (although you'll lose contiguous memory access).
If you look at the standard container adaptors (std::stack, std::queue) you see that they support the intersection of the adaptable containers' interfaces, not their union.
I'd create a second class, whose responsibility would be to return the iterator you are interested in.
It should also be parametrized with the same template parameter, and then you can specialize it for any type you want (vector/list etc). So inside your specializations you can use any method you want.
So it's some traits-based solution.
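One way such a traits-style helper might look. ElementTracker is a hypothetical name, and only the vector and list cases from the question are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper that remembers one element of a container across
// push_back calls. The primary template is left undefined on purpose, so an
// unsupported container fails to compile.
template <typename Container>
struct ElementTracker;

// vector: push_back may reallocate, so remember the index.
template <typename T>
struct ElementTracker<std::vector<T>> {
    std::size_t index;

    ElementTracker(std::vector<T>& c, typename std::vector<T>::iterator it)
        : index(static_cast<std::size_t>(it - c.begin())) {}

    typename std::vector<T>::iterator get(std::vector<T>& c) const {
        return c.begin() + static_cast<std::ptrdiff_t>(index);
    }
};

// list: push_back never invalidates iterators, so keep the iterator itself.
template <typename T>
struct ElementTracker<std::list<T>> {
    typename std::list<T>::iterator it;

    ElementTracker(std::list<T>&, typename std::list<T>::iterator i) : it(i) {}

    typename std::list<T>::iterator get(std::list<T>&) const { return it; }
};
```

The generic class would then hold an ElementTracker<Container> and call get() whenever it needs the iterator, without knowing which strategy is in use.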
If you really want to stick with the vector and have that functionality, maybe take a look at the capacity function:
http://en.cppreference.com/w/cpp/container/vector/capacity
Wrap your push_back calls in a function of your own, or even better, wrap the whole std::vector in your own class, and before each push_back compare capacity() against size() to check whether a reallocation will happen.
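A rough sketch of that check. Both function names are hypothetical; the second one re-derives the tracked iterator from its offset after every push_back, which is cheap for a vector and works whether or not the storage moved.

```cpp
#include <cassert>
#include <vector>

// True if the next push_back must reallocate (no spare capacity left).
template <typename T>
bool pushBackWillReallocate(const std::vector<T>& v) {
    return v.size() == v.capacity();
}

// push_back that keeps `tracked` pointing at the same element.
template <typename T>
void pushBackKeeping(std::vector<T>& v, const T& value,
                     typename std::vector<T>::iterator& tracked) {
    const auto offset = tracked - v.begin();  // taken before the vector changes
    v.push_back(value);
    tracked = v.begin() + offset;             // rebuilt from the new storage
}
```

pushBackWillReallocate is only needed if you want to skip the fix-up when nothing moved; always recomputing, as above, is simpler and costs one addition.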

Why is vector::iterator invalidated upon reallocation?

I don't understand why a vector's iterator should be invalidated when a reallocation happens.
Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?
Why was vector not designed this way?
Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:
One reason for the emphasis on run-time efficiency...was that I wanted
templates to be efficient enough in time and space to be used for
low-level types such as arrays and lists.
...
Higher-level alternatives -- say, a range-checked array with a size()
operation, a multidimensional array, a vector type with proper numeric
vector operations and copy semantics, etc. -- would be accepted by
users only if their run-time, space, and notational convenience
approached those of built-in arrays.
In other words, the language mechanism supplying parameterized types
should be such that a concerned user should be able to afford to
eliminate the use of arrays in favor of a standard library class.
Bjarne Stroustrup, Design and Evolution of C++, p.342.
Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.
Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().
You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping an extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.
There are many reasons for these decisions. As others pointed out, the most basic implementation of an iterator for a vector is a plain pointer to the element. To survive push_back, iterators would have to be modified to hold a pointer to the vector and a position. On every access through the iterator, the vector pointer would have to be dereferenced, the pointer to the data obtained, and the position added, costing an extra dereference.
While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in VS/Dinkumware libraries (even in release) are checked iterators, that manage an equivalent amount of information.
The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).
You would need to store both the offset and a pointer to the vector object itself.
As specified, the iterator can just be a pointer, which takes less space.
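To make the size and indirection cost concrete, here is a sketch of what such an offset-based iterator could look like. OffsetIterator is hypothetical and deliberately incomplete (no comparison, no decrement).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "offset" iterator: it survives reallocation, but it carries
// two words instead of one and goes through the vector on every access.
template <typename T>
struct OffsetIterator {
    std::vector<T>* owner;  // pointer to the vector object itself
    std::size_t offset;     // position within the vector

    T& operator*() const { return (*owner)[offset]; }

    OffsetIterator& operator++() {
        ++offset;
        return *this;
    }
};
```

It keeps working after push_back reallocates, but it breaks as soon as the vector object itself is moved or destroyed, and an insertion before `offset` silently makes it refer to a different element.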
TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.
Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity; if a pointer (or reference) to the vector were stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors anyone?) would invalidate iterators.
Your trade is "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".
The performance arguments don't much matter, because the proposed alternate implementation is not even correct.
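The swap case can be checked directly. iteratorFollowsStorage is a hypothetical helper, and it assumes `a` is non-empty so that begin() refers to a real element rather than end().

```cpp
#include <cassert>
#include <vector>

// Takes an iterator to the first element of `a`, swaps the two vectors, and
// reports whether that iterator now refers to the first element of `b`.
// Precondition (assumed): `a` is not empty.
inline bool iteratorFollowsStorage(std::vector<int>& a, std::vector<int>& b) {
    std::vector<int>::iterator it = a.begin();
    a.swap(b);
    return it == b.begin();
}
```

The iterator stays valid and follows its element into the other vector, which an iterator holding a pointer back to `a` could not do.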
If an iterator weren't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.

Storing iterators inside containers

I am building a DLL that another application would use. I want to store the current state of some data globally in the DLL's memory before returning from the function call so that I could reuse state on the next call to the function.
For doing this, I'm having to save some iterators. I'm using a std::stack to store all other data, but I wasn't sure if I could do that with the iterators also.
Is it safe to put list iterators inside container classes? If not, could you suggest a way to store a pointer to an element in a list so that I can use it later?
I know using a vector to store my data instead of a list would have allowed me to store the subscript and reuse it very easily, but unfortunately I'm having to use only an std::list.
Iterators to list are invalidated only if the list is destroyed or the "pointed" element is removed from the list.
Yes, it'll work fine.
Since so many other answers go on about this being a special quality of list iterators, I have to point out that it'd work with any iterators, including vector ones. The fact that vector iterators get invalidated if the vector is modified is hardly relevant to a question of whether it is legal to store iterators in another container -- it is. Of course the iterator can get invalidated if you do anything that invalidates it, but that has nothing to do with whether or not the iterator is stored in a stack (or any other data structure).
It should be no problem to store the iterators, just make sure you don't use them on a copy of the list -- an iterator is bound to one instance of the list, and cannot be used on a copy.
That is, if you do:
std::list<int>::iterator it = myList.begin ();
std::list<int> c = myList;
c.insert (it, ...); // Error
As noted by others: Of course, you should also not invalidate the iterator by removing the pointed-to element.
This might be offtopic, but just a hint...
Be aware that your function(s)/data structure would probably be thread-unsafe for read operations. There is a kind of basic thread safety where read operations do not require synchronization. If you store state recording how much the caller has read from your structure, the whole concept becomes thread-unsafe and a bit unnatural to use, because nobody expects a read to be a stateful operation.
If two threads are going to call it, they will either need to synchronize the calls or your data structure might end up in a race condition. The problem with such a design is that both threads must have access to a common synchronization variable.
I would suggest making two overloaded functions. Both are stateless, but one of them accepts a hint iterator indicating where to start the next read/search/retrieval etc. This is, for example, how the Allocator interface in the STL works: you can pass the allocator a hint pointer (default 0) so that it finds a new memory chunk more quickly.
Regards,
Ovanes
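A sketch of the two-overload idea from the answer above, assuming the data is a std::list<int> and the operation is a search. Pos and findValue are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

typedef std::list<int>::const_iterator Pos;

// Stateless search from the beginning of the list.
inline Pos findValue(const std::list<int>& data, int value) {
    return std::find(data.begin(), data.end(), value);
}

// Stateless search that starts at a caller-supplied hint. The caller, not
// the library, remembers how far it has read.
inline Pos findValue(const std::list<int>& data, int value, Pos hint) {
    return std::find(hint, data.end(), value);
}
```

Each caller (or thread) keeps its own Pos, so no state is stored in the DLL. Concurrent modification of the list would of course still need synchronization.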
Storing the iterator for the list should be fine. It will not get invalidated unless you remove from the list the very element the stored iterator refers to. The following quote is from the SGI site:
Lists have the important property that
insertion and splicing do not
invalidate iterators to list elements,
and that even removal invalidates only
the iterators that point to the
elements that are removed
However, note that the previous and next element of the stored iterator may change. But the iterator itself will remain valid.
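A small sketch of keeping list iterators on a std::stack, as the question describes. SavedState and savePosition are hypothetical names.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <stack>

typedef std::list<int>::iterator ListPos;

// Hypothetical saved state: the list plus a stack of positions into it.
// Copying this struct would leave the stacked iterators referring to the
// original list, so it should not be copied.
struct SavedState {
    std::list<int> data;
    std::stack<ListPos> positions;
};

// Remembers a position for a later call.
inline void savePosition(SavedState& s, ListPos pos) {
    s.positions.push(pos);
}
```

Insertions and erasures of other elements leave the stacked iterator valid, though its neighbours may change: after inserting before the saved element, std::prev of the saved position is the new element.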
The same rule applies to an iterator stored in a local variable as in a longer lived data structure: it will stay valid as long as the container allows.
For a list, this means: as long as the node it points to is not deleted, the iterator stays valid. Obviously the node gets deleted when the list is destructed...