Erasing values from vector/map - c++

Is this a safe way of deleting all entries from my vector or map
Vector
my_vector.erase(my_vector.begin(),my_vector.end());
For example for a map
my_map.erase(my_map.begin(),my_map.end());
The map or vector structure contains elements whose de-allocation is taken care in the destructor of those elements
Does the iterator value returned by end() become invalid as it starts to erase elements?

Both erase() methods are designed to work for iterator ranges excluding the second one.
// erase elements in range [a,b)
Iterator erase(Iterator a, Iterator b);
So it is safe to call erase() as you do, but you might as well call clear() in both cases.

It's safe to call erase(begin, end) for std::vector/std::map, it also valid to other STL containers(list, set, deque etc) which provide erase member function and iterator to iterate through elements.
As long as you pass in valid range(beg,end), below two ranges are also valid, erase takes no efforts:
c.erase(c.begin(), c.begin());
c.erase(c.end(), clend());
std::vector::erase(beg,end) Removes all elements of the range [beg,end) and returns the
position of the next element.
std::map::erase(beg,end) Removes all elements of the range [beg,end) and returns
the following position (returned nothing before C++11).
In STL internal implementation, it calls erase(begin,end) in a few functions like:
void clear() noexcept;
Effects: Behaves as if the function calls:
erase(begin(), end());
void assign(size_type n, const T& t);
Effects:
erase(begin(), end());
insert(begin(), first, last);
As you can see, erase(begin(),end()); is the same as clear().
Alternatively, you could call swap to clear a STL container which is suggested in More Effective STL:
vector<Contestant> v;
vector<Contestant>().swap(v); //clear v and minimize its capacity

Related

Control flow with iterators

Say I have something like this:
void myFunk(std::vector<T>& v, std::vector<T>::iterator first, std::vector<T>::iterator last) {
while (first != last) {
if ((*first) > (*last)) {
T someT;
v.push_back(someT);
}
first++;
}
}
int main(){
std::vector<T> foo = {some, T, values};
myFunky(foo, foo.begin(), foo.end())
return 0;
}
Would this lead to an infinite loop, or would it end after foo.size() iterations? In other words, would the last iterator be updated as foo grew, or would it retain the value given in the function call?
I'm assuming last would change, since it's a pointer to a position, but would like some confirmation.
Would this lead to an infinite loop, or would it end after foo.size() iterations?
Neither. What you are doing is undefined behavior, for a couple of reasons:
You are modifying the vector while iterating through it.
If the vector reallocates its internal storage when pushing a new item, all existing iterators into the vector are invalidated, including both iterators you are using to loop with. But even just pushing a new item always invalidates the end() iterator, at least.
See Iterator invalidation rules for C++ containers
You are dereferencing the end() iterator, which never refers to a valid element.
I'm assuming last would change, since it's a pointer to a position
It can't change, since you passed it into the myFunc function by value, so it is a copy of the original end() iterator. If end() changes value, last will not change value, since it is a copy.
In any case, iterators are not necessarily implemented as pointers, but pointers are valid iterators. But it doesn't matter in this case. Even if vector::iterator were just a simple pointer, last would still get invalidated upon every push/reallocation.

What does"one-past-the-last-element" mean in vectors?

I'm learning vectors and am confused on how the array is copying to thevector here
double p[] = {1, 2, 3, 4, 5};
std::vector<double> a(p, p+5);
I also know std::vector<double> a(3,5); means `make room for 3 and initialize them with 5. How does the above code work?
The second point is I read the paragraph from where I copied the above code.
Understanding the second point is crucial when working with vectors or
any other standard containers. The controlled sequence is always
expressed in terms of [first, one-past-last)—not only for ctors, but
also for every function that operates on a range of elements.
I don't know what is the meant by [first, one-past-last) ?
I know mathematically but don't know why/how vector copy the array in this way?
Edited
another related question
The member function end() returns an iterator that "points" to
one-past-the-last-element in the sequence. Note that dereferencing the
iterator returned by end() is illegal and has undefined results.
Can you explain this one-past-the-last-element what is it? and why?
Never dereference end() or rend() from STL containers, as they do not point to valid elements.
This picture below can help you visualize this.
The advantage of an half open range is:
1. Handling of empty ranges occur when begin() == end()
2. Iterating over the elements can be intuitively done by checking until the iterator equals end().
Strongly coupled with containers (e.g. vector, list, map) is the concept of iterators. An iterator is a C++ abstraction of a pointer. I.e. an iterator points to an object inside the container (or to one past the last element), and dereferencing the iterator means accessing that element.
Lets take for instance a vector of 4 elements:
| 0 | 1 | 2 | 3 | |
^ ^ ^
| | |
| | one past the end (outside of the container elements)
| last element
first element
The (algorithms in the) standard template library operate on ranges, rather than on containers. This way you can apply operations, not only to the entire container, but also to ranges (consecutive elements of the container).
A range is specified by [first, last) (inclusive first, exclusive last). That's why you need an iterator to one past the end: to specify a range equal to the entire contents of the container. But as that iterator points outside of it, it is illegal to dereference it.
The constructor of std::vector has several overloads.
For std::vector<double> a(3,5); the fill constructor is used :
explicit vector (size_type n);
vector (size_type n, const value_type& val,
const allocator_type& alloc = allocator_type());
This takes a size parameter as it's first parameter and an optional and third parameter, the second parameter specifies the value you want to give the newly created objects.
double p[] = {1, 2, 3, 4, 5};
std::vector<double> a(p, p+5);
Uses another overload of the constructor, namely the range constructor:
template <class InputIterator>
vector (InputIterator first, InputIterator last,
const allocator_type& alloc = allocator_type());
This takes an iterator to the start of a collection and the end() iterator and traverses and adds to the vector until first == last.
The reason why end() is implemented as one-past-the-last-element is because this allows for implementations to check for equality like:
while(first != last)
{
//savely add value of first to vector
++first;
}
Iterators are an abstraction of pointers.
A half-open interval [a,b) is defined as all elements x>=a and x<b. The advantage of it is that [a,a) is well defined and empty for any a.
Anything that can be incremented and compared equal can define a half open interval. So [ptr1,ptr2) is the element ptr1 then ptr1+1 then ptr1+2 until you reach ptr2, but not including ptr2.
For iterators, it is similar -- except we do not always have random access. So we talk about next instead of +1.
Pointers still count as a kind-of iterator.
A range of iterators or pointers "talks about" the elements pointed to. So when a vector takes a pair of iterators (first, and one-past-the-end), this defines a half-open interval of iterators, which also defines a collection of values they point to.
You can construct the vector from such a half-open range. It copies the elements poimtrd to into the vector.

vector::erase and reverse_iterator

I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).
Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors some way similar to this pseudo-code:
grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).
std::map<C, std::pair<T::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
iterate(vect_rit = it->rbegin() -> it->rend())
{
// ...
what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
if (what_to_delete.size() > threshold)
what_to_delete.erase(what_to_delete.begin());
// ...
}
}
Now, after running this code, in what_to_delete I have a collection of iterators pointing to the original vectors that I want to remove from these vectors (overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any what_to_delete[0 - n] there is no way that an iterator on position n - m would point to an element further from the beginning of the same vector than n, where m > 0.
When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:
The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i- 1)
Which means that to delete a vect_rit, I use:
vector.erase(--vect_rit.base());
Now, according to C++11 standard §23.3.6.5/3:
iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.
How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?
Edit:
Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-deferenceable iterator after the first element was deleted. Whereas when storing classic iterators (converting to const_iterator) while inserting to what_to_delete worked correctly.
Now, for future reference, does The Standard specify what should be treated as a reference point in case of a random-access reverse_iterator? Or this is an implementation detail?
Thanks!
In the question you have already quoted exactly what the standard says a reverse_iterator is:
The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)
Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.
If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.
As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).
You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1
Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.
However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:
vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));
From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:
namespace std {
template <class Iterator>
class reverse_iterator ...
{
...
Iterator base() const; // explicit
...
protected:
Iterator current;
...
};
}
And from §24.5.1.3.3:
Iterator base() const; // explicit
Returns: current.
Thus it seems to me that so long as you don't erase anything in the vector before what one of your reverse_iterators points to, said reverse_iterator should remain valid.
Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vector_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceeding element, and so your next vector.erase(...) is undefined behavior.
Just in case that's clear as mud, let me say that differently:
std::vector<T> v=...;
...
// it_1 and it_2 are contiguous
std::vector<T>::reverse_iterator it_1=v.rend();
std::vector<T>::reverse_iterator it_2=it_1;
--it_2;
// Erase everything after it_1's pointee:
// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();
// but that points one too far in, so decrement;
--tmp_it;
// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());
// perform erasure
v.erase(tmp_it); // invalidates all iterators pointing at or past *tmp_it
// (like, say it_2.base()...)
// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!
// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);
In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.
The reverse_iterator, just like the normal iterator, points to a certain position in the vector. Implementation details are irrelevant, but if you must know, they both are (in a typical implementation) just plain old pointers inside. The difference is the direction. The reverse iterator has its + and - reversed w.r.t. the regular iterator (and also ++ and --, > and < etc).
This is interesting to know, but doesn't really imply an answer to the main question.
If you read the language carefully, it says:
Invalidates iterators and references at or after the point of the erase.
References do not have a built-in sense of direction. Hence, the language clearly refers to the container's own sense of direction. Positions after the point of the erase are those with higher indices. Hence, the iterator's direction is irrelevant here.

C++ STL container set&multiset:the insert operation with different return types

About the STL container set and multiset,the return types of the insert functions are not all the same.
set provides the following interface:
pair<iterator,bool> insert(const value_type& elem);
iterator insert(iterator pos_hint, const value_type& elem);
multiset provides the following interface:
iterator insert(const value_type& elem);
iterator insert(iterator pos_hint, const value_type& elem);
In the first function of set, the member second of the pair structure returns whether the insertion is successful.The insertion of an element might fail for a set if it already contains an element with the same value.But in the second function of set,the insert function just returns an iterator.What will happen if the insertion fails?Could someone tell me that?
Thanks a lot.
But in the second function of set,the insert function just returns an iterator.What will happen if the insertion fails?
In the set::insert version that just returns a plain iterator (and not a pair<iterator,bool>), when an existing element is found the set is left unchanged and the insert will return an iterator to the existing element (that prevented the insertion).
In multiset::insert, the function is always successful.
23.2.4 Associative Container Requirements
iterator a.insert(p, t)
Effects: Inserts t if and only if there is no element with key
equivalent to the key of t in containers with unique keys; always
inserts t in containers with equivalent keys. always returns the
iterator pointing to the element with key equivalent to the key of t.
t is inserted as close as possible to the position just prior to p.

Does pop_back() really invalidate *all* iterators on an std::vector?

std::vector<int> ints;
// ... fill ints with random values
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
{
*it = ints.back();
ints.pop_back();
continue;
}
it++;
}
This code is not working because when pop_back() is called, it is invalidated. But I don't find any doc talking about invalidation of iterators in std::vector::pop_back().
Do you have some links about that?
The call to pop_back() removes the last element in the vector and so the iterator to that element is invalidated. The pop_back() call does not invalidate iterators to items before the last element, only reallocation will do that. From Josuttis' "C++ Standard Library Reference":
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
Here is your answer, directly from The Holy Standard:
23.2.4.2 A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.1) and of a sequence, including most of the optional sequence requirements (23.1.1).
23.1.1.12 Table 68
expressiona.pop_back()
return typevoid
operational semanticsa.erase(--a.end())
containervector, list, deque
Notice that a.pop_back is equivalent to a.erase(--a.end()). Looking at vector's specifics on erase:
23.2.4.3.3 - iterator erase(iterator position) - effects - Invalidates all the iterators and references after the point of the erase
Therefore, once you call pop_back, any iterators to the previously final element (which now no longer exists) are invalidated.
Looking at your code, the problem is that when you remove the final element and the list becomes empty, you still increment it and walk off the end of the list.
(I use the numbering scheme as used in the C++0x working draft, obtainable here
Table 94 at page 732 says that pop_back (if it exists in a sequence container) has the following effect:
{ iterator tmp = a.end();
--tmp;
a.erase(tmp); }
23.1.1, point 12 states that:
Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container
member function or passing a container as an argument to a library function shall not invalidate iterators to, or change
the values of, objects within that container.
Both accessing end() as applying prefix-- have no such effect, erase() however:
23.2.6.4 (concerning vector.erase() point 4):
Effects: Invalidates iterators and references at or after the point of the erase.
So in conclusion: pop_back() will only invalidate an iterator to the last element, per the standard.
Here is a quote from SGI's STL documentation (http://www.sgi.com/tech/stl/Vector.html):
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
I think it follows that pop_back only invalidates the iterator pointing at the last element and the end() iterator. We really need to see the data for which the code fails, as well as the manner in which it fails to decide what's going on. As far as I can tell, the code should work - the usual problem in such code is that removal of element and ++ on iterator happen in the same iteration, the way #mikhaild points out. However, in this code it's not the case: it++ does not happen when pop_back is called.
Something bad may still happen when it is pointing to the last element, and the last element is less than 10. We're now comparing an invalidated it and end(). It may still work, but no guarantees can be made.
Iterators are only invalidated on reallocation of storage. Google is your friend: see footnote 5.
Your code is not working for other reasons.
pop_back() invalidates only iterators that point to the last element. From C++ Standard Library Reference:
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
So to answer your question, no it does not invalidate all iterators.
However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.
If iterators are implemented as pure pointers (as they would in probably all non-debug STL vector cases), your code should just work. If iterators are more than pointers, then your code does not handle this case of removing the last element correctly.
Error is that when "it" points to the last element of vector and if this element is less than 10, this last element is removed. And now "it" points to ints.end(), next "it++" moves pointer to ints.end()+1, so now "it" running away from ints.end(), and you got infinite loop scanning all your memory :).
The "official specification" is the C++ Standard. If you don't have access to a copy of C++03, you can get the latest draft of C++0x from the Committee's website: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2723.pdf
The "Operational Semantics" section of container requirements specifies that pop_back() is equivalent to { iterator i = end(); --i; erase(i); }. the [vector.modifiers] section for erase says "Effects: Invalidates iterators and references at or after the point of the erase."
If you want the intuition argument, pop_back is no-fail (since destruction of value_types in standard containers are not allowed to throw exceptions), so it cannot do any copy or allocation (since they can throw), which means that you can guess that the iterator to the erased element and the end iterator are invalidated, but the remainder are not.
pop_back() will only invalidate it if it was pointing to the last item in the vector. Your code will therefore fail whenever the last int in the vector is less than 10, as follows:
*it = ints.back(); // Set *it to the value it already has
ints.pop_back(); // Invalidate the iterator
continue; // Loop round and access the invalid iterator
You might want to consider using the return value of erase instead of swapping the back element to the deleted position an popping back. For sequences erase returns an iterator pointing the the element one beyond the element being deleted. Note that this method may cause more copying than your original algorithm.
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
it = ints.erase( it );
else
++it;
}
std::remove_if could also be an alternative solution.
struct LessThanTen { bool operator()( int n ) { return n < 10; } };
ints.erase( std::remove_if( ints.begin(), ints.end(), LessThanTen() ), ints.end() );
std::remove_if is (like my first algorithm) stable, so it may not be the most efficient way of doing this, but it is succinct.
Check out the information here (cplusplus.com):
Delete last element
Removes the last element in the vector, effectively reducing the vector size by one and invalidating all iterators and references to it.