Don't understand results of std::remove in C++ STL

I was reading Josuttis' "The C++ Standard Library, 2nd ed.". In section 6.7.1 the author explains that the code given below will give unexpected results. I still don't understand how std::remove() works, and why I am getting this strange result. (Though I understood that you need to call the container's erase() in order to actually remove elements, and that for a list it is actually better to use list::remove() rather than the combination of std::remove() and erase().)
list<int> coll;
// insert elements from 6 to 1 and 1 to 6
for (int i=1; i<=6; ++i) {
    coll.push_front(i);
    coll.push_back(i);
}
// print
copy (coll.cbegin(), coll.cend(),         // source
      ostream_iterator<int>(cout," "));   // destination
cout << endl;
// remove all elements with value 3
remove (coll.begin(), coll.end(),         // range
        3);                               // value
// print (same as above)
and the results are
pre: 6 5 4 3 2 1 1 2 3 4 5 6
post: 6 5 4 2 1 1 2 4 5 6 5 6 (???)

This explanation should help:
Removing is done by shifting the elements in the range in such a way
that elements to be erased are overwritten. Relative order of the
elements that remain is preserved and the physical size of the
container is unchanged. Iterators pointing to an element between the
new logical end and the physical end of the range are still
dereferenceable, but the elements themselves have unspecified values.
A call to remove is typically followed by a call to a container's
erase method, which erases the unspecified values and reduces the
physical size of the container to match its new logical size.
Note that the return value from std::remove() is the iterator that represents the new end. Therefore, calling the container's erase() member with this new end and the old end gets rid of the leftover elements.

std::remove doesn't actually shorten the list. It can't - as it only gets iterators and not the container itself.
What it does is copy the remaining values toward the beginning of the container. But the final elements of the container (in your case the last two: '5' and '6') are actually still there.
After using std::remove you have to shorten the container yourself to remove the remaining "junk" copies.

You asked the algorithm to remove the "3" elements. While traversing the container, the algorithm shifts the following content toward the front whenever something is removed from the middle. That shift happens twice in your case, so the logical end moves two positions toward the front, which is why you still see the old "5 6" at the end. Calling the container's erase() on that tail then gets rid of the zombies.

To quote from everyone's favorite c++ website:
The function cannot alter the properties of the object containing the
range of elements (i.e., it cannot alter the size of an array or a
container): The removal is done by replacing the elements that compare
equal to val by the next element that does not, and signaling the new
size of the shortened range by returning an iterator to the element
that should be considered its new past-the-end element.
So std::remove doesn't change the size of the list. It removes the matching elements and returns you an iterator that represents the new end of the list. To actually erase the extraneous elements, you then need to do:
auto it = remove(coll.begin(), coll.end(), 3);
coll.erase(it, coll.end());
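As a minimal sketch of the three variants discussed in this thread, applied to the list from the question: the function names below (make_coll, remove_only, remove_then_erase, member_remove) are made up for illustration and are not from the book.

```cpp
#include <algorithm>
#include <list>

// Builds the list from the question: 6 5 4 3 2 1 1 2 3 4 5 6
std::list<int> make_coll() {
    std::list<int> coll;
    for (int i = 1; i <= 6; ++i) {
        coll.push_front(i);
        coll.push_back(i);
    }
    return coll;
}

// std::remove alone: the kept values are compacted to the front, but the
// list still has 12 elements. The last two hold leftover values.
std::list<int> remove_only() {
    std::list<int> coll = make_coll();
    std::list<int>::iterator new_end = std::remove(coll.begin(), coll.end(), 3);
    (void)new_end;  // deliberately ignored here, which is the mistake in the question
    return coll;
}

// Remove-erase: std::remove compacts, list::erase drops the leftover tail.
std::list<int> remove_then_erase() {
    std::list<int> coll = make_coll();
    std::list<int>::iterator new_end = std::remove(coll.begin(), coll.end(), 3);
    coll.erase(new_end, coll.end());
    return coll;
}

// For std::list, the member function removes the matching nodes directly,
// without assigning element values around.
std::list<int> member_remove() {
    std::list<int> coll = make_coll();
    coll.remove(3);
    return coll;
}
```

Both remove_then_erase() and member_remove() should leave 6 5 4 2 1 1 2 4 5 6, while remove_only() keeps all 12 elements.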

Related

Using remove and then erase to delete elements of container

I am writing this as I am very confused. Am I dreaming or have I always been wrong about what remove does?
Suppose we have a vector of integers:
std::vector<int> v {1, 2, 3, 1, 4, 1};
We want to remove all the 1s from the vector so:
auto it {std::remove(v.begin(), v.end(), 1)};
Now our container will look like this:
2 3 4 1 1 1 with the iterator it pointing at index 3
After that we use this:
v.erase(it, v.end()); to remove all the elements from the iterator to the end of the vector, corresponding to all the 1s. Isn't this how you remove the elements, or am I delusional?
What I was surprised about is that we don't need to add erase. Remove will remove all elements. My professor told me to use erase from the iterator to the end of the vector to delete all the elements after performing remove.
Isn't this how you remove the elements
Yes, remove and erase are used as you described in paragraphs preceding the last one.
Now our container will look like this: 2 3 4 1 1 1
Not quite. The container will look like this: 2 3 4 X X X. Where values X are unspecified. The algorithm doesn't need to bother overwriting the elements in the "removed" partition.
What I was surprised about is that we don't need to add erase.
You don't need to erase if it's fine for the residual elements to remain in the vector. If you do need them gone, call v.erase(it, v.end()).
Remove will remove all elements.
As per your earlier description, remove will only move the non-removed elements to the left partition. It will not erase any elements from the container. The size of the container won't change.
P.S. Instead of the remove-erase idiom, you can use std::erase (available since C++20):
std::erase(v, 1);
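To make the two steps concrete, here is a small sketch that works before C++20 as well. The helper names kept_count and erase_value are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Runs std::remove only and reports how many elements were kept, i.e. the
// distance from begin() to the returned iterator. The vector's size is
// not changed by this call.
std::size_t kept_count(std::vector<int>& v, int value) {
    std::vector<int>::iterator it = std::remove(v.begin(), v.end(), value);
    return static_cast<std::size_t>(it - v.begin());
}

// The remove-erase idiom in one statement: compact, then cut off the tail.
void erase_value(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}
```

With v = {1, 2, 3, 1, 4, 1}, kept_count(v, 1) should return 3 and leave v.size() at 6 with 2 3 4 in front; erase_value on the same input should leave exactly {2, 3, 4}.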

Understanding vector::assign example on cplusplus

I am confused about the following code and what it does:
first.assign (7,100); // 7 ints with a value of 100
std::vector<int>::iterator it;
it=first.begin()+1;
second.assign (it,first.end()-1); // the 5 central values of first
I don't understand the second.assign statement. I would assume it assigns 100 elements in second with a value of 100. Why is the size of second 5?
In the example code,
it = first.begin()+1 points at the 2nd element,
and in
second.assign (it,first.end()-1);
the second argument, first.end()-1, points at the last element. The range is half-open, so that element itself is not copied.
The range therefore skips the first and last elements, and hence you have 7-2=5 elements in the last assignment.
There are 2 overloads of assign (3 in C++11).
With the first overload, the new contents are n elements, each initialized to a copy of val.
With the second overload, the new contents are elements constructed from each of the elements in the range between first and last, in the same order.
Therefore the second call copies the elements of first from the 2nd element to the penultimate element.
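The two overloads can be checked with a short sketch that mirrors the snippet from the question. The wrapper function middle_of_sevens is a made-up name used only so the result can be inspected.

```cpp
#include <vector>

// Reproduces the example: fill `first` with seven 100s, then copy the
// half-open range [first.begin()+1, first.end()-1) into `second`.
std::vector<int> middle_of_sevens() {
    std::vector<int> first;
    std::vector<int> second;
    first.assign(7, 100);                               // fill overload: 7 copies of 100
    std::vector<int>::iterator it = first.begin() + 1;  // points at the 2nd element
    second.assign(it, first.end() - 1);                 // range overload: last element excluded
    return second;
}
```

The returned vector should hold five elements, all equal to 100.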

How to shuffle an array so that all elements change their place

I need to shuffle an array so that all array elements should change their location.
Given an array [0,1,2,3] it would be ok to get [1,0,3,2] or [3,2,0,1] but not [3,1,2,0] (because 1 and 2 are left unchanged).
I suppose algorithm would not be language-specific, but just in case, I need it in C++ program (and I cannot use std::random_shuffle due to the additional requirement).
What about this?
1. Allocate an array which contains numbers from 0 to arrayLength-1
2. Shuffle the array
3. If there is no element in the array whose index equals its value, continue to step 4; otherwise repeat from step 2.
4. Use the shuffled array values as indexes for your array.
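The four steps above can be sketched as follows. This is only one possible reading: it assumes std::shuffle with a std::mt19937 generator is acceptable (the question only rules out std::random_shuffle), and the names has_no_fixed_point, random_derangement and apply_indices are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Step 3's check: true if no index i is mapped to itself.
bool has_no_fixed_point(const std::vector<std::size_t>& idx) {
    for (std::size_t i = 0; i < idx.size(); ++i)
        if (idx[i] == i) return false;
    return true;
}

// Steps 1-3: shuffle 0..n-1 and retry until no index stays in place.
// Assumption: n != 1, because a single element can never move and the
// loop would not terminate.
std::vector<std::size_t> random_derangement(std::size_t n, std::mt19937& rng) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    do {
        std::shuffle(idx.begin(), idx.end(), rng);
    } while (!has_no_fixed_point(idx));
    return idx;
}

// Step 4: use the shuffled indexes to build the reordered array.
template <typename T>
std::vector<T> apply_indices(const std::vector<T>& in,
                             const std::vector<std::size_t>& idx) {
    std::vector<T> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < idx.size(); ++i) out.push_back(in[idx[i]]);
    return out;
}
```

Working on indexes rather than values means duplicates in the input do not confuse the check. The retry loop has no fixed bound, though in practice a small number of attempts is usually enough.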
For each element e
    If there is an element to the left of e
        Select a random element r to the left of e
        swap r and e
This guarantees that each value isn't in the position that it started, but doesn't guarantee that each value changes if there are duplicates.
BeeOnRope notes that though simple, this is flawed. Given the list [0,1,2,3], this algorithm cannot produce the output [1,0,3,2].
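A minimal sketch of the swap-with-a-left-neighbour pseudocode, assuming a std::mt19937 generator; the function name shuffle_no_fixed_points is hypothetical. The procedure appears to match what is usually called Sattolo's algorithm, which only ever produces permutations consisting of one single cycle. That would explain why an output such as [1,0,3,2], which is made of two separate swaps, cannot occur.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// For each element (from the second one on), swap it with a randomly
// chosen element strictly to its left. With distinct values and at least
// two elements, no value ends up at its starting position.
template <typename T>
void shuffle_no_fixed_points(std::vector<T>& v, std::mt19937& rng) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(v[i], v[pick(rng)]);
    }
}
```

This runs in a single pass with no retries, at the cost of not being able to reach every valid arrangement.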
It's not going to be very random, but you can rotate all the elements by at least one position (this needs at least two elements):
std::rotate(v.begin(), v.begin() + (rand() % (v.size() - 1) + 1), v.end());
If v was {1,2,3,4,5,6,7,8,9} at the beginning, then after rotation it will be, for example: {2,3,4,5,6,7,8,9,1}, or {3,4,5,6,7,8,9,1,2}, etc.
All elements of the array will change position.
I have an idea in mind; I hope it fits your application. Keep one more container, a "map(int,vector(int))". The key is an index, and the vector holds the values already used for that index.
For example, for the first element you use the rand function to pick which element of the array to use. Then you check the map to see whether that element has already been used for this index.

Why is erase() function so expensive?

Consider a 2d vector vector < vector <int> > N and let's say its contents are as follows:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
So the size of N here is 4 i.e. N.size() = 4
Now, consider the following code :
int i = 0;
while(N != empty()){
N.erase(i);
++i;
}
I calculated the time just for this piece of code alone with various sizes for N and following are the results:
The size of N is 1000
Execution Time: 0.230000s
The size of N is 10000
Execution Time: 22.900000s
The size of N is 20000
Execution Time: 91.760000s
The size of N is 30000
Execution Time: 206.620000s
The size of N is 47895
Execution Time: 526.540000s
My question is: why is this function so expensive? If it is, then conditional erase statements in many programs could take forever just because of this function. It is the same when I use the erase function of std::map too. Is there any alternative to this function? Do other libraries like Boost offer one?
Please do not say I could do N.erase() as a whole because I'm just trying to analyze this function.
Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)
Because your algorithm is O(n^2). Each call to erase forces the vector to move all elements after the erased element back. So in your loop with the 4 element vector, the first loop causes 3 elements to be shifted, the second iteration causes 1 element to be shifted, and after that you have undefined behavior.
If you had 8 elements, the first iteration would move 7 elements, the next would move 5 elements, the next would move 3 elements, and the final iteration would move 1 element. (And again you have undefined behavior.)
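The shift counts quoted above can be reproduced with a small sketch. It replays the question's loop (erase index i, then increment i) on a real vector, but, unlike the original, it stops as soon as i is out of range instead of running into undefined behavior. The function name shifts_per_iteration is made up for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Records, for every in-bounds iteration of "erase index i, then ++i",
// how many elements sit after index i and therefore have to move down.
std::vector<std::size_t> shifts_per_iteration(std::size_t n) {
    std::vector<int> v(n);
    std::vector<std::size_t> shifts;
    for (std::size_t i = 0; i < v.size(); ++i) {
        shifts.push_back(v.size() - 1 - i);                   // elements after index i
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));  // vector shrinks by one
    }
    return shifts;
}
```

For n = 4 this yields 3 and 1, and for n = 8 it yields 7, 5, 3 and 1, matching the counts in the answer.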
When you encounter situations like this, generally you should use the standard algorithms (i.e. std::remove, std::remove_if) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information see Scott Meyers' "Effective STL" Item 43: Prefer Algorithm Calls to Explicit Loops.
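As a rough sketch of that advice, a conditional erase over the 2d vector can be written with std::remove_if followed by a single erase. The predicate FirstIsEven and the function drop_even_rows are hypothetical; the question does not say which rows should be removed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical condition: a row is dropped when its first value is even.
struct FirstIsEven {
    bool operator()(const std::vector<int>& row) const {
        return !row.empty() && row[0] % 2 == 0;
    }
};

// One pass over the container: remove_if compacts the kept rows to the
// front, then a single erase cuts off the leftover tail.
void drop_even_rows(std::vector<std::vector<int> >& rows) {
    rows.erase(std::remove_if(rows.begin(), rows.end(), FirstIsEven()),
               rows.end());
}
```

Each kept row is moved at most once, instead of once per erased row in front of it.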
A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom operator= that does a lot of work!
If you need erase() to be fast, you should use a std::list - this will use a doubly linked list structure that allows fast erasure from the middle (however, other operations get somewhat slower). If you just need to remove from the start of the container quickly, use std::deque - this is typically implemented as a sequence of fixed-size arrays, and offers most of the speed advantages of std::vector while also allowing fast erasure at the beginning as well as the end.
Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the std::list yourself, using the iterator variant of erase(). Or if you use a vector, you'll find it can be faster to copy into a new vector.
As for std::map (and std::set) - this isn't a problem at all. std::map is capable of both removing elements at random, as well as searching for elements at random, with O(lg n) time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is with std::list and friends.
vector.erase will advance all elements after i forward by 1. This is an O(n) operation.
Additionally, you're passing vectors by value rather than by reference.
Your code also doesn't erase the entire vector.
For example:
i = 0
erase N[0]
N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1
erase N[1]
N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2
erase N[2] nothing happens because the maximum index is N[1]
Lastly, I don't think that's the correct syntax for vector.erase(). You need to pass in an iterator to the element you want to erase.
Try this:
vector<vector<int> > vectors; // still passing by value so it'll be slow, but at least erases everything
for(int i = 0; i < 1000; ++i)
{
    vector<int> temp;
    for(int j = 0; j < 1000; ++j)
    {
        temp.push_back(i);
    }
    vectors.push_back(temp);
}
// erase starting from the beginning
while(!vectors.empty())
{
    vectors.erase(vectors.begin());
}
You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):
// just replace the while-loop at the end
while(!vectors.empty())
{
    vectors.erase(vectors.end()-1);
}
A vector is an array that grows automatically as you add elements to it. As such, elements in a vector are contiguous in memory. This allows constant time access to an element. Because they grow from the end, they also take amortized constant time to add or remove to/from the end.
Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.
If you want to do lots of insertion/removal in the middle, use a linked list such as std::list. (std::deque only makes insertion/removal cheap at its two ends.)
As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.
This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.

Weird behaviour with vector::erase and std::remove_if with end range different from vector.end()

I need to remove elements from the middle of a std::vector.
So I tried:
struct IsEven {
bool operator()(int ele)
{
return ele % 2 == 0;
}
};
int elements[] = {1, 2, 3, 4, 5, 6};
std::vector<int> ints(elements, elements+6);
std::vector<int>::iterator it = std::remove_if(ints.begin() + 2, ints.begin() + 4, IsEven());
ints.erase(it, ints.end());
After this I would expect the ints vector to contain: [1, 2, 3, 5, 6].
In the debugger of Visual studio 2008, after the std::remove_if line, the elements of ints are modified, I'm guessing I'm into some sort of undefined behaviour here.
So, how do I remove elements from a Range of a vector?
Edit: Sorry, the original version of this was incorrect. Fixed.
Here's what's going on. Your input to remove_if is:
1 2 3 4 5 6
    ^   ^
    |   end
    begin
And the remove_if algorithm looks at all numbers between begin and end (including begin, but excluding end), and removes all elements between that match your predicate. So after remove_if runs, your vector looks like this
1 2 3 ? 5 6
    ^ ^
    | new_end
    begin
Where ? is a value that I don't think is deterministic, although if it's guaranteed to be anything it would be 4. And new_end, which points to the new end of the input sequence you gave it, with the matching elements now removed, is what is returned by std::remove_if. Note that std::remove_if doesn't touch anything beyond the subsequence that you gave it. This might make more sense with a more extended example.
Say that this is your input:
1 2 3 4 5 6 7 8 9 10
    ^         ^
    |         end
    begin
After std::remove_if, you get:
1 2 3 5 7 ? ? 8 9 10
    ^     ^
    |     new_end
    begin
Think about this for a moment. What it has done is remove the 4 and the 6 from the subsequence, and then shift everything within the subsequence down to fill in the removed elements, and then moved the end iterator to the new end of the same subsequence. The goal is to satisfy the requirement that the [begin, new_end) sequence that it produces is the same as the [begin, end) subsequence that you passed in, but with certain elements removed. Anything at or beyond the end that you passed in is left untouched.
What you want to get rid of, then, is everything between the end iterator that was returned, and the original end iterator that you gave it. These are the ? "garbage" values. So your erase call should actually be:
ints.erase(it, ints.begin()+4);
The call to erase that you have just erases everything beyond the end of the subsequence that you performed the removal on, which isn't what you want here.
What makes this complicated is that the remove_if algorithm doesn't actually call erase() on the vector, or change the size of the vector at any point. It just shifts elements around and leaves some "garbage" elements after the end of the subsequence that you asked it to process. This seems silly, but the whole reason that the STL does it this way is to avoid the problem with invalidated iterators that doublep brought up (and to be able to run on things that aren't STL containers, like raw arrays).
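Putting the corrected erase call together with the question's setup gives the following sketch. The wrapper erase_even_in_subrange and its index parameters are hypothetical; only IsEven and the remove_if/erase pair come from the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct IsEven {
    bool operator()(int ele) const { return ele % 2 == 0; }
};

// Removes even values from positions [first_idx, last_idx) only.
// Assumes 0 <= first_idx <= last_idx <= ints.size().
std::vector<int> erase_even_in_subrange(std::vector<int> ints,
                                        std::ptrdiff_t first_idx,
                                        std::ptrdiff_t last_idx) {
    std::vector<int>::iterator first = ints.begin() + first_idx;
    std::vector<int>::iterator last = ints.begin() + last_idx;
    // Compact the kept elements inside the subrange only.
    std::vector<int>::iterator new_end = std::remove_if(first, last, IsEven());
    // Erase just the leftovers, up to the END OF THE SUBRANGE, not ints.end().
    ints.erase(new_end, last);
    return ints;
}
```

Called with {1, 2, 3, 4, 5, 6} and the index pair (2, 4), this should return {1, 2, 3, 5, 6}, which is what the question expected.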
Erasing elements in std::vector invalidates iterators past the removed element, so you cannot use "foreign" functions that accept ranges. You need to do that in a different way.
EDIT:
In general, you can use the fact that erasing one element "shifts" all elements at further positions one back. Something like this:
for (size_t scan = 2, end = 4; scan != end; )
{
    if (/* some predicate on ints[scan] */)
    {
        ints.erase (ints.begin () + scan);
        --end;
    }
    else
        ++scan;
}
Note that std::vector isn't suited for erasing elements in the middle. You should consider something else (std::list?) if you do that often.
EDIT 2:
As clarified by comments, first paragraph is not true. In such case std::remove_if should be more efficient than what I suggested in the first edit, so disregard this answer. (Keeping it for the comments.)
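The behavior isn't weird - you're erasing the wrong range. std::remove_if shifts the elements it keeps to the front of the input range and returns the new logical end; the leftovers sit between that iterator and the end of the input range. In this case, what you're looking for would be to do: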
ints.erase(it, ints.begin() + 4 /* your end of range */);
From C++ in a Nutshell:
The remove_if function template "removes" items for which pred returns false [sic: std::remove_if removes the items for which pred returns true] from the range [first, last). The return value is one past the new end of the range. The relative order of items that are not removed is stable.
Nothing is actually erased from the underlying container; instead, items to the right are assigned to new positions so they overwrite the elements for which pred returns false [sic]. See Figure 13-13 (under remove_copy) for an example of the removal process.