Why is the erase() function so expensive? (C++)

Consider a 2D vector vector<vector<int>> N and let's say its contents are as follows:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
So the size of N here is 4, i.e. N.size() == 4.
Now, consider the following code:
int i = 0;
while(N != empty()){
    N.erase(i);
    ++i;
}
I calculated the time just for this piece of code alone, with various sizes for N, and the following are the results:
The size of N is 1000
Execution Time: 0.230000s
The size of N is 10000
Execution Time: 22.900000s
The size of N is 20000
Execution Time: 91.760000s
The size of N is 30000
Execution Time: 206.620000s
The size of N is 47895
Execution Time: 526.540000s
My question is: why is this function so expensive? If it is, then conditional erase statements in many programs could take forever just because of this function. I see the same behaviour when I use the erase function in std::map too. Is there any alternative to this function? Do other libraries like Boost offer one?
Please do not say I could do N.erase() as a whole, because I'm just trying to analyze this function.

Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)

Because your algorithm is O(n^2). Each call to erase forces the vector to shift every element after the erased one back by one position. So in your loop over the 4-element vector, the first iteration shifts 3 elements, the second iteration shifts 1 element, and after that you have undefined behavior.
If you had 8 elements, the first iteration would move 7 elements, the next would move 5, the next would move 3, and the final iteration would move 1. (And again you end in undefined behavior.)
When you encounter situations like this, you should generally use the standard algorithms (e.g. std::remove, std::remove_if) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information, see Scott Meyers' "Effective STL", Item 43: Prefer Algorithm Calls to Explicit Loops.
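For illustration, here is a minimal sketch of that idiom (the container, value, and predicate are made up for the example):
#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3, 4, 2, 5, 2};
    // remove_if compacts the kept elements to the front in a single pass
    // and returns the new logical end; erase then trims the tail once.
    // Overall cost: O(n) instead of the O(n^2) of repeated erases.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x == 2; }),
            v.end());
    return 0;
}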

A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom operator= that does a lot of work!
If you need erase() to be fast, you should use a std::list - this uses a doubly linked list structure that allows fast erasure from the middle (though other operations get somewhat slower). If you only need to remove from the beginning or end quickly, use std::deque - this is typically implemented as an array of fixed-size blocks, and offers most of the speed advantages of std::vector while still allowing fast removal from the beginning and end (but not the middle).
Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the std::list yourself, using the iterator variant of erase(). Or if you use a vector, you'll find it can be faster to copy into a new vector.
As for std::map (and std::set) - this isn't a problem at all. std::map is capable of both removing elements at random, as well as searching for elements at random, with O(lg n) time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is with std::list and friends.
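As a sketch of that one-pass removal for std::map (the keys and predicate here are illustrative; erase() returning the next iterator is C++11):
#include <map>

int main() {
    std::map<int, int> m = {{1, 10}, {2, 20}, {3, 30}, {4, 40}};
    // One-pass conditional erase: erase() hands back the next valid
    // iterator, so the loop never dereferences an invalidated one.
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->first % 2 == 0)
            it = m.erase(it);
        else
            ++it;
    }
    return 0;
}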

vector.erase will shift all elements after position i back by 1. This is an O(n) operation.
Additionally, you're passing vectors by value rather than by reference.
Your code also doesn't erase the entire vector.
For example:
i = 0: erase N[0], leaving N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1: erase N[1], leaving N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2: erase N[2] is out of range (the largest valid index is now 1), so this is undefined behavior rather than a harmless no-op
Lastly, I don't think that's the correct syntax for vector.erase(). You need to pass an iterator to the element you want to erase, not an index.
Try this:
vector<vector<int>> vectors; // still passing by value so it'll be slow, but at least erases everything
for (int i = 0; i < 1000; ++i)
{
    vector<int> temp;
    for (int j = 0; j < 1000; ++j)
    {
        temp.push_back(i);
    }
    vectors.push_back(temp);
}
// erase starting from the beginning
while (!vectors.empty())
{
    vectors.erase(vectors.begin());
}
You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):
// just replace the while-loop at the end
while (!vectors.empty())
{
    vectors.erase(vectors.end() - 1);
}

A vector is an array that grows automatically as you add elements to it. As such, elements in a vector are contiguous in memory. This allows constant-time access to an element. Because vectors grow from the end, they also take amortized constant time to add or remove elements to/from the end.
Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.
If you want to do lots of insertion/removal in the middle, use a linked list such as std::list, or a std::deque.

As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.
This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.
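A small sketch of that pointer-relinking removal (the values are illustrative):
#include <iterator>
#include <list>

int main() {
    std::list<int> l = {1, 2, 3, 4, 5};
    auto it = l.begin();
    std::advance(it, 2); // walking to the third node is O(n)...
    it = l.erase(it);    // ...but unlinking it is O(1): no elements move
    return 0;
}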

Fastest way to remove duplicates from a vector<>

As the title says, I have in mind some methods to do it, but I don't know which is fastest.
So let's say that we have a vector<int> vals with some values.
Method 1. After my vals are added:
sort(vals.begin(), vals.end());
auto last = unique(vals.begin(), vals.end());
vals.erase(last, vals.end());
Method 2. Convert to a set after my vals are added:
set<int> s( vals.begin(), vals.end() );
vals.assign( s.begin(), s.end() );
Method 3. When I add my vals, I check whether the value is already in my vector:
if( find(vals.begin(), vals.end(), myVal) == vals.end() )
    // not found, so add myVal
Method 4. Use a set from the start.
OK, I've got these 4 methods; my questions are:
1. Of methods 1, 2 and 3, which is the fastest?
2. Is 4 faster than the first 3?
3. In method 2, after converting the vector to a set, is it more convenient to use the set to do what I need to do, or should I do the vals.assign(...) and continue with my vector?
Question 1: Both 1 and 2 are O(n log n), 3 is O(n^2). Between 1 and 2, it depends on the data.
Question 2: 4 is also O(n log n) and can be better than 1 and 2 if you have lots of duplicates, because it only stores one copy of each. Imagine a million values that are all equal.
Question 3: Well, that really depends on what you need to do.
The only thing that can be said without knowing more is that your alternative number 3 is asymptotically worse than the others.
If you're using C++11 and don't need ordering, you can use std::unordered_set, which is a hash table and can be significantly faster than std::set.
Option 1 is going to beat all the others. The complexity is just O(N log N) and the contiguous memory of vector keeps the constant factors low.
std::set typically suffers a lot from non-contiguous allocations. It's not just slow to access them; creating them takes significant time as well.
These methods all have their shortcomings, though (1) is worth looking at.
But take a look at this 5th option: bear in mind that you can access the vector's data buffer using the data() function. Then, bearing in mind that no reallocation will take place since the vector will only ever get smaller, apply the algorithm that you learned at school:
vals.resize(unduplicate(vals.data(), vals.size()));
std::size_t unduplicate(int* arr, std::size_t length) /* Reference: Gang of Four, I think */
{
    int* const first = arr;
    int *it, *end = arr + length - 1;
    for (it = arr + 1; arr < end; arr++, it = arr + 1) {
        while (it <= end) {
            if (*it == *arr) {
                *it = *end--; // overwrite the duplicate with the last element
            } else {
                ++it;
            }
        }
    }
    return end - first + 1; // the number of unique elements kept
}
Then resize the vector at the end (as in the usage line above), if that is what's required. This is never worse than O(N^2), so is superior to insertion-sort or sort-then-remove approaches.
Your 4th option might be an idea if you can adopt it. Profile the performance. Otherwise use my algorithm from the 1960s.
I ran into a similar problem recently, and experimented with 1, 2, and 4, as well as with the unordered_set version of 4. It turned out that the best performance was the latter one: 4 with unordered_set in place of set.
BTW, that empirical finding is not too surprising if one considers that both set and sort were a bit of overkill: they guaranteed relative order of unequal elements. For example inputs 4,3,5,2,4,3 would lead to sorted output of unique values 2,3,4,5. This is unnecessary if you can live with unique values in arbitrary order, i.e. 3,4,2,5. When you use unordered_set it doesn't guarantee the order, only uniqueness, and therefore it doesn't have to perform the additional work of ensuring the order of different elements.
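A minimal sketch of that unordered_set variant applied after the fact (the values are illustrative):
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> vals = {4, 3, 5, 2, 4, 3};
    // Hash-based dedup: O(n) on average, but the surviving values
    // come back in arbitrary order, as noted above.
    std::unordered_set<int> s(vals.begin(), vals.end());
    vals.assign(s.begin(), s.end());
    return 0;
}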

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number (millions or even billions) of elements, while the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually, I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two items have the same integer member used for comparison, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0,m), the most efficient way to do it is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) { // note <=, so counts[i] is always in range
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) { // note <=, as above
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially, because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope during the life cycle of count_sorted. If your Ts implement move semantics, you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort, you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
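As a sketch of that simulated iteration (assuming a counts vector built as above over the range [0, m)):
#include <vector>

// Expand the counts array back into the sorted sequence.
std::vector<int> expand(const std::vector<int>& counts) {
    std::vector<int> out;
    for (int v = 0; v < (int)counts.size(); ++v)
        for (int k = 0; k < counts[v]; ++k)
            out.push_back(v); // value v occurs counts[v] times
    return out;
}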
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count-sort. This would be efficient in some Java implementations (I believe the Guava implementation would be efficient) but not in C++, where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. And doing just that would work if there weren't any repetitions, but since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't evict the previous element from its place.
Second, either:
- keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array, or
- keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001); // the whole range [0, 1000], allocated once
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1); // trim the unused tail
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

Inserting into a vector at the front

iterator insert ( iterator position, const T& x );
is the declaration of the insert member function of the std::vector class.
This function's return type is an iterator pointing to the inserted element. My question is: given this return type, what is the most efficient way of inserting at the beginning? (This is part of a larger program I am running where speed is of the essence, so I am looking for the most computationally efficient way.) Is it the following?
//Code 1
vector<int> intvector;
vector<int>::iterator it;
it = intvector.begin();
for (int i = 1; i <= 100000; i++) {
    it = intvector.insert(it, i);
}
Or,
//Code 2
vector<int> intvector;
for (int i = 1; i <= 100000; i++) {
    intvector.insert(intvector.begin(), i);
}
Essentially, in Code 2, is the parameter,
intvector.begin()
"Costly" to evaluate computationally as compared to using the returned iterator in Code 1 or should both be equally cheap/costly?
If one of the critical needs of your program is to insert elements at the beginning of a container, then you should use a std::deque and not a std::vector. std::vector is only good at inserting elements at the end.
Other containers have been introduced in C++11. I should start to find an updated graph with these new containers and insert it here.
The efficiency of obtaining the insertion point won't matter in the least - it will be dwarfed by the inefficiency of constantly shuffling the existing data up every time you do an insertion.
Use std::deque for this, that's what it was designed for.
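A minimal sketch of the deque version of the question's loop (same element count as the question, assuming the order produced by push_front is acceptable):
#include <deque>

int main() {
    std::deque<int> d;
    // push_front on a deque is amortized O(1); the equivalent
    // vector::insert(begin(), ...) costs O(n) per call.
    for (int i = 1; i <= 100000; ++i)
        d.push_front(i);
    return 0;
}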
An old thread, but it showed up at a coworker's desk as the first search result for a Google query.
There is one alternative to using a deque that is worth considering:
std::vector<T> foo;
for (int i = 0; i < 100000; ++i)
    foo.push_back(T());
std::reverse( foo.begin(), foo.end() );
You still use a vector which is significantly more engineered than deque for performance. Also, swaps (which is what reverse uses) are quite efficient. On the other hand, the complexity, while still linear, is increased by 50%.
As always, measure before you decide what to do.
If you're looking for a computationally efficient way of inserting at the front, then you probably want to use a deque instead of a vector.
Most likely deque is the appropriate solution, as suggested by others. But just for completeness, suppose that you need to do this front-insertion just once, that elsewhere in the program you don't need to do other operations on the front, and that otherwise vector provides the interface you need. If all of those are true, you could add the items with the very efficient push_back and then reverse the vector to get everything in order. That would have linear complexity rather than the quadratic complexity of inserting everything at the front.
When you use a vector, you usually know the actual number of elements it is going to have. In this case, sizing the vector up front (resize(100000) in the case you show; note that reserve alone does not permit indexing with []) and filling the elements via the [] operator is the fastest way. If you really need an efficient insert at the front, you can use deque or list, depending on your algorithms.
You may also consider inverting the logic of your algorithm and inserting at the end, that is usually faster for vectors.
I think you should change the type of your container if you really want to insert data at the beginning. That is the reason why vector does not have a push_front() member function.
Intuitively, I agree with @Happy Green Kid Naps and ran a small test showing that for small sizes (1 << 10 elements of a primitive data type) it doesn't matter. For larger container sizes (1 << 20), however, std::deque seems to perform better than reversing an std::vector. So, benchmark before you decide. Another factor might be the element type of the container.
Test 1: push_front (a) 1<<10 or (b) 1<<20 uint64_t into std::deque
Test 2: push_back (a) 1<<10 or (b) 1<<20 uint64_t into std::vector followed by std::reverse
Results:
Test 1 - deque (a) 19 µs
Test 2 - vector (a) 19 µs
Test 1 - deque (b) 6339 µs
Test 2 - vector (b) 10588 µs
You can support:
- insertion at the front,
- insertion at the end,
- changing the value at any position, and
- accessing the value at any index,
all in O(1) time. (std::deque also offers O(1) indexed access, but not in one contiguous buffer.)
Note: you just need to know an upper bound (max_size) on how far the structure can grow to the left and to the right.
#include <iostream>
using namespace std;

class Vector{
public:
    int front, end;
    int arr[100100]; // you should set this according to 2*max_size
    Vector(int initialize){
        arr[100100/2] = initialize; // initializing value
        front = end = 100100/2;
        front--; end++;
    }
    void push_back(int val){
        arr[end] = val; // (a bounds check like push_front's could be added)
        end++;
    }
    void push_front(int val){
        if(front < 0){ return; } // you should set the initial size accordingly
        arr[front] = val;
        front--;
    }
    int value(int idx){
        // front points at the next free slot, so element 0 lives at front+1
        return arr[front + 1 + idx];
    }
    // similarly, create a function to change the value at any index
};

int main(){
    Vector v(2);
    for(int i = 1; i < 100; i++){
        // O(1)
        v.push_front(i);
    }
    for(int i = 0; i < 20; i++){
        // access the value in O(1)
        cout << v.value(i) << " ";
    }
    return 0;
}
This may draw the ire of some because it does not directly answer the question, but it may help to keep in mind that retrieving the items from a std::vector in reverse order is both easy and fast.
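For instance (a tiny sketch using reverse iterators):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};
    // rbegin()/rend() walk the vector back-to-front at no extra cost.
    for (auto it = v.rbegin(); it != v.rend(); ++it)
        std::cout << *it << ' '; // prints: 3 2 1
    return 0;
}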

Problem with invalidation of STL iterators when calling erase

The C++ standard specifies that when an erase occurs on containers such as std::deque, std::list, etc., iterators are invalidated.
My question is as follows: assuming a list of integers contained in a std::deque, and a pair of indices indicating a range of elements in the std::deque, what is the correct way to delete all even elements?
So far I have the following; however, the problem here is that the assumed end is invalidated after an erase:
#include <cstddef>
#include <deque>
int main()
{
    std::deque<int> deq;
    for (int i = 0; i < 100; deq.push_back(i++));
    // range, 11th to 51st element
    std::pair<std::size_t, std::size_t> r(10, 50);
    std::deque<int>::iterator it = deq.begin() + r.first;
    std::deque<int>::iterator end = deq.begin() + r.second;
    while (it != end)
    {
        if (*it % 2 == 0)
        {
            it = deq.erase(it);
        }
        else
            ++it;
    }
    return 0;
}
Examining how std::remove_if is implemented, it seems there is a very costly copy/shift down process going on.
Is there a more efficient way of achieving the above without all the copies/shifts?
In general, is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)?
Note: Answers should assume the sequence size is quite large, +1mil elements and that on average 1/3 of elements would be up for erasure.
I'd use the Erase-Remove Idiom. I think the Wikipedia article linked even shows what you're doing -- removing odd elements.
The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.
Calling .erase() also results in "a very costly copy/shift down process going on.". When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements will move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.
The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.
The pseudocode is like this:
read_location <- beginning of range
write_location <- beginning of range
while read_location != end of range:
    if the element at read_location should be kept in the container:
        copy the element at the read_location to the write_location
        increment the write_location
    increment the read_location
As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.
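Applied to the question's deque and range, the whole loop collapses to something like this sketch (keeping the odd elements):
#include <algorithm>
#include <cstddef>
#include <deque>

void erase_even(std::deque<int>& deq, std::size_t first, std::size_t last) {
    // remove_if compacts the kept (odd) elements into [first, new_end)
    // in one pass; erase then drops the leftover slots in one shot.
    auto new_end = std::remove_if(deq.begin() + first, deq.begin() + last,
                                  [](int x) { return x % 2 == 0; });
    deq.erase(new_end, deq.begin() + last);
}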
Remember that a deque stores its elements in contiguous blocks (it behaves much more like vector than like list here), so removing elements mid-container necessarily means copying subsequent elements over the hole. You just want to make sure you're doing one iteration and copying each not-to-be-deleted object directly to its final position, rather than moving all objects one by one during each delete. remove_if is efficient and appropriate in this regard; your erase loop is not: it does massive amounts of unnecessary copying.
FWIW - alternatives:
- add a "deleted" state to your objects and mark them deleted in place, but then every time you operate on the container you'll need to check for it yourself
- use a list, which is implemented using pointers to previous and next elements, such that removing a list element alters the adjacent pointers to bypass that element: no copying, efficient iteration, just no random access and more small (i.e. inefficient) heap allocations and pointer overheads
What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).
One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.
#include <algorithm> // std::copy
#include <deque>
#include <iterator>  // std::back_inserter
#include <utility>   // std::pair

void filter(std::deque<int>& in, std::pair<std::size_t, std::size_t> range) {
    std::deque<int> out;
    std::deque<int>::const_iterator first = in.begin();
    std::deque<int>::const_iterator curr = first + range.first;
    std::deque<int>::const_iterator last = first + range.second;
    // (std::deque has no reserve(), so we cannot preallocate here)
    std::copy(first, curr, std::back_inserter(out));
    while (curr != last) {
        if (*curr & 1) { // keep the odd elements
            out.push_back(*curr);
        }
        ++curr;
    }
    std::copy(last, in.end(), std::back_inserter(out));
    in.swap(out);
}
I'm not sure whether you have enough memory to create a copy, but it usually is faster and easier to make a copy instead of trying to erase elements from a large collection in place. If you still see memory thrashing, then figure out how many elements you are going to keep by calling std::count_if, and size the destination up front. This way you would have a single memory allocation (note that std::deque has no reserve(), so this applies when copying into a std::vector).
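A hedged sketch of that count-first idea, copying into a std::vector (which, unlike deque, has reserve(); the function name and predicate are illustrative):
#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <vector>

std::vector<int> keep_odds(const std::deque<int>& deq) {
    // Count the keepers first so the copy needs exactly one allocation.
    std::size_t kept = std::count_if(deq.begin(), deq.end(),
                                     [](int x) { return x % 2 != 0; });
    std::vector<int> out;
    out.reserve(kept);
    std::copy_if(deq.begin(), deq.end(), std::back_inserter(out),
                 [](int x) { return x % 2 != 0; });
    return out;
}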