In the STL, almost all containers have an erase function. My question is this: in a vector, erase returns an iterator pointing to the next element. The map container does not do this; its erase returns void. Does anyone know why there is this inconsistency?
See http://www.sgi.com/tech/stl/Map.html
Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements. Erasing an element from a map also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased.
The reason for returning an iterator on erase is so that you can iterate over the list erasing elements as you go. If erasing an item doesn't invalidate existing iterators there is no need to do this.
erase returns an iterator in C++11. This is due to defect report 130:
Table 67 (23.1.1) says that container::erase(iterator) returns an iterator. Table 69 (23.1.2) says that in addition to this requirement, associative containers also say that container::erase(iterator) returns void. That's not an addition; it's a change to the requirements, which has the effect of making associative containers fail to meet the requirements for containers.
The standards committee accepted this:
the LWG agrees the return type should be iterator, not void. (Alex Stepanov agrees too.)
(LWG = Library Working Group).
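The two erase-while-iterating idioms discussed above can be sketched as follows. This is only an illustration: the function names and the "remove negative values" criterion are made up for the example. The first version relies on the C++11 return value of map::erase; the second is the usual pre-C++11 workaround, which is safe because erasing from a map invalidates only the iterator to the erased element.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: removes every entry whose value is negative and
// returns how many entries were removed.
// C++11 and later: map::erase(iterator) returns the iterator that follows
// the erased element, so the loop can continue from it.
inline std::size_t erase_negative_cxx11(std::map<std::string, int>& m) {
    std::size_t removed = 0;
    for (std::map<std::string, int>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second < 0) {
            it = m.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// Pre-C++11 style: erase() returns void, so advance with post-increment.
// erase() receives a copy of the old position while 'it' has already moved
// on to the next element, which stays valid.
inline std::size_t erase_negative_cxx03(std::map<std::string, int>& m) {
    std::size_t removed = 0;
    for (std::map<std::string, int>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second < 0) {
            m.erase(it++);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Both functions should leave the map in the same state; the second form also compiles against libraries where erase still returns void.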
The inconsistency is due to use. vector is a sequence having an ordering over the elements. While it's true that the elements in a map are also ordered according to some comparison criterion, this ordering is non-evident from the structure. There is no efficient way to get from one element to the next (efficient = constant time). In fact, to iterate over the map is quite expensive; either the creation of the iterator or the iterator itself involves a walk over the complete tree. This cannot be done in O(n), unless a stack is used, in which case the space required is no longer constant.
All in all, there simply is no cheap way of returning the “next” element after erasing. For sequences, there is a way.
Additionally, Rob is right. There's no need for the Map to return an iterator.
Just as an aside, the STL shipped with MS Visual Studio C++ (Dinkumware IIRC) provides a map implementation with an erase function returning an iterator to the next element.
They do note it's not standards conforming.
I have no idea if this is the answer, but one reason might be with the cost of locating the next element. Iterating through a map is inherently "slow".
I am not entirely clear on whether unordered_map is allowed to perform a rehash during an erase().
It is pretty clear that rehashing can happen during insert() thus invalidating all iterators and references:
http://en.cppreference.com/w/cpp/container/unordered_map/insert
But erase() seems to preserve all iterators and references except for the ones erased:
http://en.cppreference.com/w/cpp/container/unordered_map/erase
However, that last page and the Standard indicate that the worst-case execution time of erase() is O(size). What operation can take linear time to complete without modifying the container in a way that invalidates iterators?
This post suggests that iterators are invalidated during erasure:
http://kera.name/articles/2011/06/iterator-invalidation-rules-c0x/
I also read somewhere that a future proposal will allow rehashing on erase(). Is that true?
If rehashing does indeed occur, the old iterate-and-erase algorithm is wrong, right?
If you have a very bad hash, or pathological data, all of the elements can end up in one bucket, making it an O(n) traversal to locate/remove the element.
It is perfectly legal to implement a std::unordered_map using a (singly-)linked-list for the elements: we can see that its iterator type is only required to fulfil the ForwardIterator concept. This makes a linear traversal necessary to remove an element within a bucket, even when passing the iterator.
This is the operation that may take O(n) time, not rehashing.
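A minimal sketch of the pathological case described above, assuming a deliberately bad hash. ConstantHash, BadMap and the two helper functions are hypothetical names invented for this illustration. Because keys with the same hash code must land in the same bucket, every element ends up in one bucket, and erasing one element still leaves pointers to the others usable.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Deliberately bad hash (for illustration only): every key hashes to 0.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

typedef std::unordered_map<int, int, ConstantHash> BadMap;

// Inserts keys 0..n-1 and returns the size of the bucket that holds key 0.
// With ConstantHash, all keys share that bucket.
inline std::size_t crowded_bucket_size(BadMap& m, int n) {
    for (int i = 0; i < n; ++i) m[i] = i * i;
    return m.bucket_size(m.bucket(0));
}

// Erases 'victim' and reports whether a pointer to a different element,
// taken before the erase, still refers to the same live element.
// Assumption: victim != survivor and both keys are present.
inline bool survivor_intact_after_erase(BadMap& m, int victim, int survivor) {
    assert(victim != survivor);
    const int* p = &m.at(survivor);
    const int before = *p;
    m.erase(victim);
    return *p == before && &m.at(survivor) == p;
}
```

Locating an element inside such a bucket means walking the bucket, which is where the linear worst case comes from; no rehash is involved.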
I am currently looking for a container that provides some inserting (insert or push_back) and some removing (erase; pop_back is not sufficient) methods, and that invalidates neither iterators nor pointers when these methods are called.
More clearly, I want a set of elements where I can add an element (I do not care where), and where I can remove any element (so I do care where). In addition, I would have external pointers to specific elements, and I want them to remain valid if I add or remove an element from the set.
There are, to my knowledge, two standard containers that meet my needs: set and list. However, generally speaking, I do not like to use such containers for such simple needs. Since a list uses pointers internally and does not provide random access to its elements, I think it is not a good choice. A set has random access to its elements, but it also uses pointers, and the random access itself is not done in constant time. I think that a set would be a better solution than a list, but I have thought about something else.
What about a simple vector that does not try to keep the elements contiguous when an element is removed? When removing an element in the middle of this container, its position would be left empty, and nothing else would happen. This way, no iterator or pointer would be invalidated. Also, when adding an element, the container would search for an empty position, and use a simple push_back if there is no such hole.
Obviously, since push_back can invalidate iterators with a vector, I would use a deque as the basis of implementation. I would also use some sort of stack to keep track of the holes from removing elements. This way, adding, removing, and accessing an element would be done in constant time, in addition to satisfying my no-invalidation needs.
However, there is still one problem: when iterating over this container, or simply accessing an element by index, we would need to take the holes into account. And that is where the problems start to outweigh the advantages.
Hence my question: what do you think about my idea for that container?
More importantly, what would you use for my original problem: a set, a list, or something else?
Also, if you have a nice and clean solution to the last problem (iterating over my container), feel free to show me.
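For concreteness, here is a rough sketch of the container described in the question: a deque of slots plus a stack of free slot indices. SlotDeque and all of its member names are hypothetical, and several simplifications are assumed: T must be copyable, a removed value is not destroyed until its slot is reused or the container dies, and iteration is a linear scan that skips holes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical "vector with holes": removing an element only marks its slot
// as dead, so no other element ever moves.
template <typename T>
class SlotDeque {
    struct Slot {
        T value;
        bool alive;
    };
    std::deque<Slot> slots_;          // push_back keeps references to elements valid
    std::vector<std::size_t> holes_;  // stack of indices of dead slots

public:
    // Reuses a hole if one exists, otherwise appends. Returns the slot index.
    std::size_t add(const T& v) {
        if (!holes_.empty()) {
            const std::size_t i = holes_.back();
            holes_.pop_back();
            slots_[i].value = v;
            slots_[i].alive = true;
            return i;
        }
        Slot s = { v, true };
        slots_.push_back(s);
        return slots_.size() - 1;
    }

    // Turns slot i into a hole; constant time, nothing else is touched.
    void remove(std::size_t i) {
        assert(i < slots_.size() && slots_[i].alive);
        slots_[i].alive = false;
        holes_.push_back(i);
    }

    // Returns a pointer to the element in slot i, or nullptr for a hole.
    T* get(std::size_t i) {
        return (i < slots_.size() && slots_[i].alive) ? &slots_[i].value : nullptr;
    }

    // Number of live elements.
    std::size_t size() const { return slots_.size() - holes_.size(); }

    // Visits live elements in slot order, skipping holes.
    template <typename F>
    void for_each(F f) {
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (slots_[i].alive) f(slots_[i].value);
    }
};
```

The cost of the holes shows up in for_each: iteration is proportional to the number of slots ever allocated, not to the number of live elements, which is the trade-off the question worries about.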
Requirement 1: inserting (insert or push back) and some removing (erase, and not only the last element)
Compliant candidates: deque, forward_list, list, map, multimap, set, multiset, unordered_map, unordered_multimap, unordered_set, unordered_multiset, and vector
Eliminated candidates: array (no insert or push_back), queue, priority_queue and stack (no erase, only pop)
Requirement 2: does not invalidate iterators nor pointers when calling these two methods
Compliant candidates that also satisfy requirement 1: forward_list, list, map, multimap, set, multiset
Eliminated on erase: deque (iterators to the other elements remain valid only when erasing the first or last element), and vector (iterators to all elements at or after the start of the erasure are invalidated)
Eliminated on insert: deque (in most cases), unordered_map, unordered_multimap, unordered_set and unordered_multiset (if the growth requires a rehash), and vector (if the growth requires reallocation)
Requirement 3: External pointers shall remain valid
Approximately the same result as for requirement 2. However, unordered_map, unordered_multimap, unordered_set and unordered_multiset would satisfy this requirement, because references to elements remain valid even when a rehash invalidates iterators.
Requirement 4: Random access to elements
Complying candidates satisfying requirement 1 and 2: map and multimap
Almost complying candidates: set and multiset do not have random access, but can approximate it using the member find() (O(log n))
Non compliant: list (although algorithm find() could provide a workaround in O(n)).
Answer to question 1: Which is better, a set or a list?
It depends on your other requirements. In a set, each value can only be stored once. If your elements are all unique, choose the set. If not, you'd better opt for the list or the multiset.
I do not know what elements you're storing, but is the whole element the search argument? Or is there a key? In the latter case, you should really go for a map.
Answer to question 2: What about the alternative container?
I wouldn't opt for deque for your alternative.
If you can foresee the maximum number of elements, you could simply reserve enough capacity to avoid reallocation. (satisfying requirement 1, 3 and 4).
If you have an "empty element" to represent holes, you could then also satisfy requirement 2. In the worst case, you could opt for a vector storing each element together with an indicator of whether it is valid.
If such a maximum is not determinable, I'd rather go for a map which proves to be flexible and satisfy all your requirements.
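The reserve-based suggestion above can be checked with a small sketch. The helper name is hypothetical; the guarantee it leans on is that, after reserve(n), insertions at the end do not reallocate until the size would exceed the reserved capacity, so addresses of existing elements stay the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: appends 'count' values and reports whether the first
// element is still at the same address afterwards.
// Precondition (assumption of this sketch): v is non-empty.
inline bool first_element_stays_put(std::vector<int>& v, std::size_t count) {
    assert(!v.empty());
    const int* first = &v[0];
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return first == &v[0];
}
```

Typical use would be to call v.reserve(max_elements) once up front and then append at most that many elements. Beyond the reserved capacity the vector may reallocate, and every stored pointer then has to be treated as invalid.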
Here is the problem I would like to solve: in C++, iterators for map, multimap, etc are missing two desirable features: (1) they can't be checked at run-time for validity, and (2) there is no operator< defined on them, which means that they can't be used as keys in another associative container. (I don't care whether the operator< has any relationship to key ordering; I just want there to be some < available at least for iterators to the same map.)
Here is a possible solution to this problem: convince map, multimap, etc. to store their key/data pairs in a vector, and then have the iterators be a small struct that contains a pointer to the vector itself and a subscript index. Then two iterators, at least for the same container, could be compared (by comparing their subscript indices), and it would be possible to test at run time whether an iterator is valid.
Is this solution achievable in standard C++? In particular, could I define the 'Allocator' for the map class to actually put the items in a vector, and then define the Allocator::pointer type to be the small struct described in the last paragraph? How is the iterator for a map related to the Allocator::pointer type? Does the Allocator::pointer have to be an actual pointer, or can it be anything that supports a dereference operation?
UPDATE 2013-06-11: I don't understand the responses. If the (key,data) pairs are stored in a vector, then it is O(1) to obtain the items given the subscript, only slightly worse than if you had a direct pointer, so there is no change in the asymptotics. Why does a responder say map iterators are "not kept around"? The standard says that iterators remain valid as long as the item to which they refer is not deleted. As for the 'real problem': say I use a multimap for a symbol table (variable name->storage location; it is a multimap rather than a map because variable names in an inner scope may shadow variables with the same name), and say I now need a second data structure keyed by variables. The apparently easiest solution is to use as the key for the second map an iterator to the specific instance of the variable's name in the first map, which would work if only iterators had an operator<.
I think not.
If you were somehow able to "convince" map to store its pairs in a vector, you would fundamentally change certain (at least two) guarantees on the map:
insert, erase and find would no longer be logarithmic in complexity.
insert would no longer be able to guarantee the validity of unaffected iterators, as the underlying vector would sometimes need to be reallocated.
Taking a step back though, two things suggest to me that you are trying to "solve" the wrong problem.
First, it is unusual to need to have a vector of iterators.
Second, it is unusual to need to check an iterator for validity, as iterators are not generally kept around.
I wonder what the real problem is that you are trying to solve?
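For the symbol-table case in the update, one possible workaround that stays within standard C++ is to give the second map a custom comparator that orders iterators by the address of the element they refer to. This is only a sketch: SymbolTable, SymbolIt, ByElementAddress and PerSymbolInfo are hypothetical names. It assumes every iterator used as a key is dereferenceable (never end(), never erased), and the resulting order has no relation to key order, which the question says is acceptable.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, int> SymbolTable;  // name -> storage location
typedef SymbolTable::iterator SymbolIt;

// Orders iterators by the address of the element they point to.
// std::less on pointers gives a total order even for unrelated objects.
// Assumption: both iterators are dereferenceable.
struct ByElementAddress {
    bool operator()(SymbolIt a, SymbolIt b) const {
        return std::less<const SymbolTable::value_type*>()(&*a, &*b);
    }
};

// Second data structure keyed by one specific declaration of a variable.
typedef std::map<SymbolIt, std::string, ByElementAddress> PerSymbolInfo;
```

Because multimap elements never move while they exist, the addresses (and so the ordering) stay stable across later insertions. An entry has to be removed from the second map before the corresponding symbol is erased from the first.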
I would like to track a pointer to a list element for the next read access. This pointer would be advanced every time the list is read. Would it be bad practice to cache an iterator to the list as a member variable, assuming I take proper precautions when deleting list elements?
An iterator to a list element remains valid until the element it refers to is removed. This is guaranteed by the standard.
There is no problem with caching an iterator as long as you make sure you don't remove elements from the list or refresh the cached iterator when you do.
Iterators are meant to be used. They aren't just for looping over each element in a for-loop. All you have to take into account is the rules for iterator invalidation.
std::vector iterators can be invalidated by any operation that inserts elements into the vector. The iterators for all elements at or beyond the point of insertion are invalidated, and all iterators are invalidated if the insertion operation causes an increase in capacity. An operation that removes an element from the vector will invalidate any iterator at or after the point of removal.
std::deque iterators are invalidated by any operation that inserts elements, and by any erasure other than at the two ends (erasing at an end invalidates only iterators to the erased elements and, when the last element is erased, the past-the-end iterator). So it's probably not a good idea to keep these around very long.
std::list, std::set, and std::map iterators are only invalidated by the specific removal of the particular element that the iterator refers to. These are the longest-lived of the iterator types.
As long as you keep these rules in mind, feel free to store these iterators all you want. It certainly isn't bad form to store std::list iterators, as long as you can be sure that that particular element isn't going anywhere.
The only way you're going to be able to properly advance through an STL std::list<T> in a platform- and compiler-independent way is through the use of a std::list<T>::iterator or std::list<T>::const_iterator, so that's really your only option unless you're planning on implementing your own linked list. Per the standard, as others here have posted, an iterator to a std::list element will remain valid until that element is removed from the list.
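A minimal sketch of the cached-iterator approach from the question, assuming the list holds ints and that reads wrap around to the beginning. RoundRobin and its members are hypothetical names. The "proper precaution" is in remove(): when the element being erased is the one the cached iterator refers to, the cache is refreshed from the return value of erase().

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Hypothetical reader that caches the position of the next read.
// Invariant: next_ == items_.end() only while the list is empty.
class RoundRobin {
    std::list<int> items_;
    std::list<int>::iterator next_;

public:
    RoundRobin() : next_(items_.end()) {}

    void add(int v) {
        items_.push_back(v);  // never invalidates list iterators
        if (next_ == items_.end()) next_ = items_.begin();
    }

    // Removes the first element equal to v; returns false if none exists.
    bool remove(int v) {
        std::list<int>::iterator it = std::find(items_.begin(), items_.end(), v);
        if (it == items_.end()) return false;
        if (it == next_) {
            next_ = items_.erase(it);  // refresh the cache: element after the erased one
        } else {
            items_.erase(it);          // the cached iterator is unaffected
        }
        if (next_ == items_.end()) next_ = items_.begin();  // wrap around
        return true;
    }

    // Returns the current element and advances the cached position.
    int read() {
        assert(!items_.empty());
        const int v = *next_;
        ++next_;
        if (next_ == items_.end()) next_ = items_.begin();
        return v;
    }
};
```

Keeping the list private means every erase goes through remove(), so the cached iterator cannot be left dangling by code elsewhere.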
The 3-argument form of list::splice() moves a single element from one list to the other. SGI's documentation explicitly states that all iterators, including the one pointing to the element being moved, remain valid. Rogue Wave's documentation does not say anything about the iterator invalidation properties of the splice() methods, whereas the C++ standard explicitly states that it invalidates all iterators and references to the element being spliced.
In practice, splice() works as defined by SGI, but I get an assertion failure (dereferencing an invalid iterator) in the debug / secure SCL versions of Microsoft's STL implementation (which strictly follows the letter of the standard).
Now, I'm using list exactly because I want to move an element between lists, while preserving the validity of the iterator pointing to it. The standard has made an extremely unhelpful change to the original SGI's specification.
How can I work around this problem? Or should I just be pragmatic and stick my head in the sand (because splicing does not invalidate iterators in practice, not even in Microsoft's implementation once iterator debugging is turned off)?
Ok, this seems to be a defect in the standard, according to this and this link. It seems that "sticking the head in the sand" is a good strategy, since it will be fixed in new library versions.
The problem is that if the iterator still points to the element that was moved, then the "end" iterator previously associated with the "moved" iterator has changed. Unless you write some complex loop, this is actually a bad thing to do -- especially since it will be more difficult for other developers to understand.
A better way in my opinion is to use the iterators pointing to the elements prior and after the moved iterator.
I have an array of lists (equivalence classes of elements), and I'm using splice to move elements between the lists. I have an additional array of iterators which gives me direct access to any element in any of the lists and to move it to another list. None of the lists is searched and modified at the same time. I could reinitialize the element iterator after splice, but it's kinda ugly.. I guess I'll do that for the time being.
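The "reinitialize the iterator after splice" workaround can be wrapped in a small helper so the ugliness lives in one place. This is a sketch under assumptions: EquivClass, ElemIt and move_to_back are hypothetical names, elements are ints, and the moved element always goes to the back of the destination list. The returned iterator is derived from the destination list rather than reused from before the splice, so it does not depend on whether the library treats the old iterator as still valid.

```cpp
#include <cassert>
#include <list>

typedef std::list<int> EquivClass;
typedef EquivClass::iterator ElemIt;

// Moves *elem from 'from' to the back of 'to' and returns a fresh iterator
// to the element at its new position.
// Assumption: elem is a valid, dereferenceable iterator into 'from'.
inline ElemIt move_to_back(EquivClass& from, EquivClass& to, ElemIt elem) {
    to.splice(to.end(), from, elem);  // relinks the node; the element is not copied
    ElemIt fresh = to.end();
    --fresh;                          // the spliced element is now the last one
    return fresh;
}
```

With an index of iterators, the call site becomes index[i] = move_to_back(lists[a], lists[b], index[i]);, which should behave the same under checked and unchecked iterator modes.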