Cache-friendliness std::list vs std::vector - c++

With CPU caches becoming better and better, std::vector usually outperforms std::list even when it comes to testing the strengths of a std::list. For this reason, even in situations where I need to delete/insert in the middle of the container, I usually pick a std::vector, but I realized I had never tested this to make sure my assumptions were correct. So I set up some test code:
#include <iostream>
#include <chrono>
#include <list>
#include <vector>
#include <random>
void TraversedDeletion()
{
    std::random_device dv;
    std::mt19937 mt{ dv() };
    std::uniform_int_distribution<> dis(0, 100000000);

    std::vector<int> vec;
    for (int i = 0; i < 100000; ++i)
    {
        vec.emplace_back(dis(mt));
    }

    std::list<int> lis;
    for (int i = 0; i < 100000; ++i)
    {
        lis.emplace_back(dis(mt));
    }

    {
        std::cout << "Traversed deletion...\n";
        std::cout << "Starting vector measurement...\n";

        auto now = std::chrono::system_clock::now();
        auto index = vec.size() / 2;
        auto itr = vec.begin() + index;
        for (int i = 0; i < 10000; ++i)
        {
            itr = vec.erase(itr);
        }
        std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
    }

    {
        std::cout << "Starting list measurement...\n";

        auto now = std::chrono::system_clock::now();
        auto index = lis.size() / 2;
        auto itr = lis.begin();
        std::advance(itr, index);
        for (int i = 0; i < 10000; ++i)
        {
            auto it = itr;
            std::advance(itr, 1);
            lis.erase(it);
        }
        std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
    }
}

void RandomAccessDeletion()
{
    std::random_device dv;
    std::mt19937 mt{ dv() };
    std::uniform_int_distribution<> dis(0, 100000000);

    std::vector<int> vec;
    for (int i = 0; i < 100000; ++i)
    {
        vec.emplace_back(dis(mt));
    }

    std::list<int> lis;
    for (int i = 0; i < 100000; ++i)
    {
        lis.emplace_back(dis(mt));
    }

    std::cout << "Random access deletion...\n";
    std::cout << "Starting vector measurement...\n";

    std::uniform_int_distribution<> vect_dist(0, vec.size() - 10000);

    auto now = std::chrono::system_clock::now();
    for (int i = 0; i < 10000; ++i)
    {
        auto rand_index = vect_dist(mt);
        auto itr = vec.begin();
        std::advance(itr, rand_index);
        vec.erase(itr);
    }
    std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";

    std::cout << "Starting list measurement...\n";

    now = std::chrono::system_clock::now();
    for (int i = 0; i < 10000; ++i)
    {
        auto rand_index = vect_dist(mt);
        auto itr = lis.begin();
        std::advance(itr, rand_index);
        lis.erase(itr);
    }
    std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
}

int main()
{
    RandomAccessDeletion();
    TraversedDeletion();
    std::cin.get();
}
All results were compiled with /O2 (Maximize Speed).
The first, RandomAccessDeletion(), generates a random index and erases the element at that index, 10,000 times. My assumptions were right and the vector is indeed a lot faster than the list:
Random access deletion...
Starting vector measurement...
Took 240299 μs
Starting list measurement...
Took 1368205 μs
The vector is about 5.6x faster than the list. We can most likely thank our cache overlords for this performance benefit: even though we need to shift the elements in the vector on every deletion, its impact is less than the lookup time of the list, as we can see in the benchmark.
So then I added another test, seen in TraversedDeletion(). It doesn't use randomized positions to delete; instead it picks an index in the middle of the container, uses that as the base iterator, and then traverses the container, erasing 10,000 times.
My assumption was that the list would only slightly outperform the vector, or be about as fast.
The results for the same execution:
Traversed deletion...
Starting vector measurement...
Took 195477 μs
Starting list measurement...
Took 581 μs
Wow. The list is about 336x faster. This was really far off from my expectations. So having a few cache misses in the list doesn't seem to matter at all here, since cutting out the lookup time for the list weighs in far more.
So the list apparently still has a really strong position when it comes to performance for corner/unusual cases, or are my test cases flawed in some way?
Does this mean that the list nowadays is only a reasonable option for lots of insertions/deletions in the middle of a container when traversing or are there other cases?
Is there a way I could change the vector access & erasure in TraversedDeletion() to make it at least a bit more competitive with the list?
In response to @BoPersson's comment:
vec.erase(it, it+10000) would perform a lot better than doing 10000
separate deletes.
Changing:
for (int i = 0; i < 10000; ++i)
{
    itr = vec.erase(itr);
}
To:
vec.erase(itr, itr + 10000);
Gave me:
Starting vector measurement...
Took 19 μs
This is a major improvement already.

In TraversedDeletion you are essentially doing a pop_front, but instead of being at the front you are doing it in the middle. For a linked list this is not an issue: deleting the node is an O(1) operation. Unfortunately, when you do this in the vector it is an O(N) operation, where N is vec.end() - itr. This is because it has to shift every element after the deletion point forward by one position. That is why it is so much more expensive in the vector case.
On the other hand, in RandomAccessDeletion you are constantly changing the deletion point. For the list this means an O(N) operation to traverse to the node plus an O(1) operation to delete it, versus an O(1) traversal to reach the element plus an O(N) operation to shift the remaining elements forward for the vector. The reason these don't come out the same is that traversing from node to node has a much higher constant factor than shifting contiguous elements in the vector.

The long duration for the list in RandomAccessDeletion is due to the time it takes to advance from the beginning of the list to the randomly selected element, an O(N) operation.
TraversedDeletion just increments an iterator, an O(1) operation.

The "fast" part about a vector is "reaching" the element which needs to be accessed (traversing). You don't actually traverse much on the vector in the deletion but only access the first element. ( I would say the adavance-by-one does not make much measurement wise)
The deletion then takes quite a lot of time ( O(n) so when deleting each one by itself it's O(n²) ) due to changing the elements in the memory. Because the deletion changes the memory on the locations after the deleted element you also cannot benefit from prefetching which also is a thing which makes the vector that fast.
I am not sure how much the deletion also would invalidate the caches because the memory beyond the iterator has changed but this can also have a very big impact on the performance.

In the first test, the list had to traverse to the point of deletion, then delete the entry. The time the list took was in traversing for each deletion.
In the second test, the list traversed once, then repeatedly deleted. The time taken was still in the traversal; the deletion was cheap. Except now we don't repeatedly traverse.
For the vector, traversal is free. Deletion takes time. Randomly deleting an element takes less time than it took for the list to traverse to that random element, so vector wins in the first case.
In the second case, the vector does its hard work many, many more times than the list does its hard work.
But, the problem is that isn't how you should traverse-and-delete from a vector. It is an acceptable way to do it for a list.
The way you'd write this for a vector is std::remove_if, followed by erase. Or just one erase:
auto index = vec.size() / 2;
auto itr = vec.begin() + index;
vec.erase(itr, itr+10000);
Or, to emulate a more complex decision making process involving erasing elements:
auto index = vec.size() / 2;
auto itr = vec.begin() + index;
int count = 10000;
auto last = std::remove_if( itr, vec.end(),
    [&count](auto&&){
        if (count <= 0) return false;
        --count;
        return true;
    }
);
vec.erase(last, vec.end());
Almost the only case where list is way faster than vector is when you store an iterator into the list, and you periodically erase at or near that iterator while still traversing the list between such erase actions.
Almost every other use case has a vector use-pattern that matches or exceeds list performance in my experience.
The code cannot always be translated line-for-line, as you have demonstrated.
Every time you erase a single element in a vector, it moves the "tail" of the vector over by 1.
If you erase 10,000 elements in one call, it moves the "tail" of the vector over by 10,000 in one step.
If you remove_if, it moves the tail over efficiently, leaves you with the "wasted" remainder at the end, and you can then erase that waste from the vector.
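To illustrate the "stored iterator" pattern mentioned above, here is a minimal sketch (made-up values, not from the question's benchmark): the iterator is kept while traversing, and each erase through it is O(1) and does not invalidate iterators to the remaining elements.
#include <iostream>
#include <list>

int main()
{
    std::list<int> values{ 1, 2, 3, 4, 5, 6, 7, 8 };

    // Keep traversing with a stored iterator; erasing through it is O(1)
    // and leaves iterators to all other elements valid.
    auto it = values.begin();
    while (it != values.end())
    {
        if (*it % 2 == 0)
            it = values.erase(it); // erase returns the next valid position
        else
            ++it;
    }

    for (int v : values)
        std::cout << v << ' ';     // prints: 1 3 5 7
    std::cout << '\n';
}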

I want to point out something not yet mentioned in this question:
In a std::vector, when you delete an element in the middle, the elements after it are moved rather than copied, thanks to move semantics. That is one of the reasons the first test is this fast: you are not even copying the elements after the deleted iterator. You could reproduce the experiment with a vector and list of a non-copyable (move-only) type and see how the performance of the list, in comparison, looks better.
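A minimal sketch of that variation (my own element counts, for illustration only): with a move-only element type such as std::unique_ptr<int>, erasing from the middle of the vector must move, never copy, everything behind the erased element, while the list only relinks two nodes.
#include <list>
#include <memory>
#include <vector>

int main()
{
    std::vector<std::unique_ptr<int>> vec;
    std::list<std::unique_ptr<int>> lis;
    for (int i = 0; i < 100000; ++i)
    {
        vec.emplace_back(std::make_unique<int>(i));
        lis.emplace_back(std::make_unique<int>(i));
    }

    // The vector erase moves (not copies) every element behind the erased one;
    // the list erase just relinks two nodes.
    vec.erase(vec.begin() + vec.size() / 2);
    lis.erase(std::next(lis.begin(), static_cast<long>(lis.size() / 2)));
}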

I would suggest to run the same tests using a more complex data type in the std::vector: instead of an int, use a structure.
Even better use a static C array as a vector element, and then take measurements with different array sizes.
So, you could swap this line of your code:
std::vector<int> vec;
with something like:
const size_t size = 256;
struct TestType { int a[size]; };
std::vector<TestType> vec;
and test with different values of size. The behavior may depend on this parameter.

Related

Why removing random element from vector and list costs almost the same time?

As cppreference says
Lists are sequence containers that allow constant time insert and erase operations anywhere within the sequence, and iteration in both directions.
Considering the contiguous memory used by std::vector, erase there should take linear time. So it is reasonable to expect that random erase operations on a std::list should be more efficient than on a std::vector.
But the program below shows differently.
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <iterator>
#include <list>
#include <vector>
using namespace std;

int randi(int min, int max) {
    return rand() % (max - min) + min; // Generate the number, assign to variable.
}

int main() {
    srand(time(NULL)); // Seed the time
    const int N = 100000;
    int M = N - 2;
    int arr[N];
    for (int i = 0; i < N; i++) {
        arr[i] = i;
    }
    list<int> ls(arr, arr + N);
    vector<int> vec(arr, arr + N);
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();
    for (int i = 0; i < M; i++) {
        int j = randi(0, N - i);
        ls.erase(next(ls.begin(), j));
    }
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds_1 = end - start;
    cout << "list time cost: " << elapsed_seconds_1.count() << "\n";
    for (int i = 0; i < M; i++) {
        int j = randi(0, N - i);
        vec.erase(vec.begin() + j);
    }
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds_2 = end - start;
    cout << "vector time cost: " << elapsed_seconds_2.count() << "\n";
    return 0;
}
~/cpp_learning/list$ ./demo
list time cost: 8.114993171
vector time cost: 8.306458676
Because it takes a long time to find the element in the list. Insertion or removal from list is O(1) if you already hold an iterator to the desired insertion/deletion location. In this case you don't, and the std::next(ls.begin(), j) call is doing O(n) work, eliminating all savings from the cheap O(1) erase (frankly, I'm a little surprised it didn't lose to vector; I'd expect O(n) pointer-chasing operations to cost more than a O(n) contiguous memmove-like operation, but what do I know?) Update: On checking, you forgot to save a new start point before the vector test, and in fact, once you fix that issue, the vector is much faster, so my intuition was correct there: Try it online!
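A minimal sketch of that fix (everything else unchanged): reset start immediately before the vector loop, so that elapsed_seconds_2 measures only the vector erasures.
start = std::chrono::system_clock::now(); // new start point for the vector test
for (int i = 0; i < M; i++) {
    int j = randi(0, N - i);
    vec.erase(vec.begin() + j);
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds_2 = end - start;
cout << "vector time cost: " << elapsed_seconds_2.count() << "\n";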
With -std=c++17 -O3, output was:
list time cost: 9.63976
vector time cost: 0.191249
Similarly, the vector is cheap to get to the relevant index (O(1)), but (relatively) expensive to delete it (O(n) copy-down operation after).
When you won't be iterating it otherwise, list won't save you anything if you're performing random access insertions and deletions. Situations like that call for using std::unordered_map and related containers.
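For illustration, a minimal sketch (my own keys and values, not from the question) of the access pattern an unordered container is built for: erasure by key is average O(1), with no traversal and no element shifting.
#include <iostream>
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    for (int i = 0; i < 100000; ++i)
        m.emplace(i, 2 * i);

    // Erase by key: average constant time, no traversal, no shifting.
    m.erase(42);
    m.erase(99999);

    std::cout << m.size() << '\n'; // 99998
}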

Why use find() in unordered_map is much faster than directly read?

It's hard to describe, so I will just show the code:
#include <bits/stdc++.h>
using namespace std;
int main()
{
clock_t start, end;
unordered_map<int, int> m;
long test=0;
int size = 9999999;
for (int i=0; i<size/3; i++) {
m[i] = 1;
}
start = clock();
for (int i=0; i<size; i++) {
//if (m.find(i) != m.end())
test += m[i];
}
end = clock();
double time_taken = double(end - start) / double(CLOCKS_PER_SEC);
cout << "Time taken by program is : " << fixed
<< time_taken << setprecision(5);
cout << " sec " << endl;
return 0;
}
The result(3 times):
Without if (m.find(i) != m.end()):
Time taken by program is : 3.508257 sec
Time taken by program is : 3.554726 sec
Time taken by program is : 3.520102 sec
With if (m.find(i) != m.end()):
Time taken by program is : 1.734134 sec
Time taken by program is : 1.663341 sec
Time taken by program is : 1.736100 sec
Can anyone explain why? What really happens inside m[i] when the key is not present?
In this line
test += m[i];
the operator[] does two things: First it tries to find the entry for the given key, then if the entry does not exist it creates a new entry.
On the other hand here:
if (m.find(i) != m.end())
test += m[i];
the operator[] does only one thing: It finds the element with the given key (and because you checked before that it exists, no new entry has to be constructed).
As the map contains only keys up to size/3, your results suggest that creating the missing elements outweighs the overhead of first checking whether an element exists.
In the first case the map ends up holding size elements, while in the second it holds only size/3 elements.
Note that calling operator[] can get more expensive the more elements are in the map: it is constant in the average case and linear in size in the worst case, and the same holds for find. However, when calling the methods many times, the worst case should amortize and you are left with the average constant cost.
Thanks to Aconcagua for pointing out that you did not reserve space in the map. In the first case you add many elements, which requires allocating space, while in the second the size of the map stays constant during the part you measure. Try calling reserve before the loop. Naively, I would expect the two loops to be much more similar in that case.
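A minimal sketch of that suggestion (using the size variable from the question): reserving enough buckets up front means the insertions performed later, inside the timed loop, never trigger a rehash.
unordered_map<int, int> m;
m.reserve(size); // pre-allocate buckets so later insertions do not rehash
for (int i = 0; i < size / 3; i++) {
    m[i] = 1;
}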
The difference with and without the if comes down to you having only populated the first third of the map.
If you do a find, the program goes and looks for the element, and only if it exists does it call operator[], which finds it again (not terribly efficient), sees that it exists, and returns the value.
Without the if, when you call operator[], it tries to find the element, fails, creates the element (with the default value for an int, which is 0), and returns it.
So without the if, you are populating the whole map, which increases the runtime.
If you wanted to be more efficient, you could use the result of the find to fetch the value:
auto iter = m.find(i);
if (iter != m.end())
{
test += iter->second;
}

Why is the complexity of std::unordered_set operator==() N^2?

I have two vectors v1 and v2 of type std::vector<std::string>. Both vectors hold unique values and should compare equal if their values compare equal, independent of the order in which the values appear in the vectors.
I assume two sets of type std::unordered_set would have been a better choice, but I take it as it is, so: two vectors.
Nevertheless, I thought that for the needed order-insensitive comparison I would just use operator== from std::unordered_set by copying into two std::unordered_set instances. Very much like this:
bool oi_compare1(std::vector<std::string> const&v1,
std::vector<std::string> const&v2)
{
std::unordered_set<std::string> tmp1(v1.begin(),v1.end());
std::unordered_set<std::string> tmp2(v2.begin(),v2.end());
return tmp1 == tmp2;
}
While profiling I noticed this function consuming a lot of time, so I checked the documentation and saw the O(n*n) worst-case complexity there. I am confused; I was expecting O(n*log(n)), like e.g. for the following naive solution I came up with:
bool oi_compare2(std::vector<std::string> const&v1,
std::vector<std::string> const&v2)
{
if(v1.size() != v2.size())
return false;
auto tmp = v2;
size_t const size = tmp.size();
for(size_t i = 0; i < size; ++i)
{
bool flag = false;
for(size_t j = i; j < size; ++j)
if(v1[i] == tmp[j]){
flag = true;
std::swap(tmp[i],tmp[j]);
break;
}
if(!flag)
return false;
}
return true;
}
Why the O(n*n) complexity for std::unordered_set, and is there a built-in function I can use for order-insensitive comparison?
EDIT----
BENCHMARK
#include <unordered_set>
#include <chrono>
#include <iostream>
#include <vector>
bool oi_compare1(std::vector<std::string> const&v1,
std::vector<std::string> const&v2)
{
std::unordered_set<std::string> tmp1(v1.begin(),v1.end());
std::unordered_set<std::string> tmp2(v2.begin(),v2.end());
return tmp1 == tmp2;
}
bool oi_compare2(std::vector<std::string> const&v1,
std::vector<std::string> const&v2)
{
if(v1.size() != v2.size())
return false;
auto tmp = v2;
size_t const size = tmp.size();
for(size_t i = 0; i < size; ++i)
{
bool flag = false;
for(size_t j = i; j < size; ++j)
if(v1[i] == tmp[j]){
flag = true;
std::swap(tmp[i],tmp[j]);
break;
}
if(!flag)
return false;
}
return true;
}
int main()
{
std::vector<std::string> s1{"1","2","3"};
std::vector<std::string> s2{"1","3","2"};
std::cout << std::boolalpha;
for(size_t i = 0; i < 15; ++i)
{
auto tmp1 = s1;
for(auto &iter : tmp1)
iter = std::to_string(i)+iter;
s1.insert(s1.end(),tmp1.begin(),tmp1.end());
s2.insert(s2.end(),tmp1.begin(),tmp1.end());
}
std::cout << "size1 " << s1.size() << std::endl;
std::cout << "size2 " << s2.size() << std::endl;
for(auto && c : {oi_compare1,oi_compare2})
{
auto start = std::chrono::steady_clock::now();
bool flag = true;
for(size_t i = 0; i < 10; ++i)
flag = flag && c(s1,s2);
std::cout << "ms=" << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count() << " flag=" << flag << std::endl;
}
return 0;
}
gives
size1 98304
size2 98304
ms=844 flag=true
ms=31 flag=true
--> naive approach way faster.
For all the complexity O(N*N) experts here...
Let me go through this naive approach. I have two loops there. The first loop runs from i=0 to size, which is N. The inner loop starts from j=i (!!!) and runs to N. In plain language, that means I call the inner loop N times. But the complexity of the inner loop is log(n) due to the starting index of j = i. If you still don't believe me, calculate the complexity from the benchmarks and you will see...
EDIT2---
LIVE ON WANDBOX
https://wandbox.org/permlink/v26oxnR2GVDb9M6y
Since unordered_set is built on a hash map, the logic to compare lhs == rhs will be:
Check size of lhs and rhs, if not equal, return false
For each item in lhs, find it in rhs, and compare
For a hash map, a single find of an item in rhs is O(n) in the worst case, so the worst-case time complexity is O(n^2). However, normally you get a time complexity of O(n).
I'm sorry to tell you, but your benchmark of operator== is faulty.
oi_compare1 accepts 2 vectors and needs to build up 2 complete unordered_set instances, then call operator== and destroy the whole lot again.
oi_compare2 also accepts 2 vectors and immediately uses them for the size comparison. It only copies 1 instance (v2 to tmp), which is much cheaper for a vector.
operator==
Looking at the documentation: https://en.cppreference.com/w/cpp/container/unordered_set/operator_cmp we can see the expected complexity:
Proportional to N calls to operator== on value_type, calls to the predicate returned by key_eq, and calls to the hasher returned by hash_function, in the average case, proportional to N² in the worst case where N is the size of the container.
edit
There is a simple algorithm: you can loop over the one unordered_set and do a lookup of each element in the other one. Without hash collisions, it will find each element in its own internal bucket and compare it for equality, since the hash alone is not sufficient.
Assuming you don't have hash collisions, each element of the unordered_set has a stable order in which it is stored. One could loop over the internal buckets and compare the elements 2-by-2 (the 1st of the one with the 1st of the other, the 2nd with the 2nd, ...). This nicely gives O(N). This doesn't work when the buckets you store the values in have different sizes, or when the assignment of buckets uses a different calculation to deal with collisions.
Assuming you are unlucky and every element results in the same hash (known as hash flooding), you end up with a list of elements without order. To compare, you have to check for each element whether it exists in the other one, causing O(N*N).
This last one is easily reproducible if you rig your hash to always return the same number and build the one set in the reverse order of the other.
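A minimal sketch of that rigged worst case (my own names): a hash functor that always returns the same value forces every element into one bucket, so the set comparison degenerates into the quadratic element-by-element search described above.
#include <cstddef>
#include <string>
#include <unordered_set>

// Deliberately terrible hash: every key lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 0; }
};

int main()
{
    std::unordered_set<std::string, ConstantHash> a, b;
    for (int i = 0; i < 1000; ++i)
        a.insert(std::to_string(i));
    for (int i = 999; i >= 0; --i)   // build the second set in reverse order
        b.insert(std::to_string(i));

    // With every element colliding, this comparison is O(N*N).
    bool equal = (a == b);
    (void)equal;                     // equal is true, but reached slowly
}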

Experiment with find algorithm using sentinel

I was experimenting with a known algorithm which aims to reduce the number of comparisons when finding an element in an unsorted array. The algorithm uses a sentinel which is added to the back of the array, which allows us to write a loop that uses only one comparison instead of two. It's important to note that the overall big-O computational complexity is not changed; it is still O(n). However, counting comparisons, the standard find algorithm is, so to speak, O(2n) while the sentinel algorithm is O(n).
Standard find algorithm from the c++ library works like this:
template<class InputIt, class T>
InputIt find(InputIt first, InputIt last, const T& value)
{
for (; first != last; ++first) {
if (*first == value) {
return first;
}
}
return last;
}
We can see two comparisons there and one increment.
In the algorithm with sentinel the loop looks like this:
while (a[i] != key)
++i;
There is only one comparison and one increment.
I made some experiments and measured time, but on every computer the results were different. Unfortunately I didn't have access to any serious machine; I only had my laptop with VirtualBox running Ubuntu, under which I compiled and ran the code. I had a problem with the amount of memory. I tried using online compilers like Wandbox and Ideone, but their time and memory limits didn't allow me to make reliable experiments. Every time I ran my code, changing the number of elements in my vector or the number of executions of my test, I saw different results: sometimes the times were comparable, sometimes std::find was significantly faster, sometimes the sentinel algorithm was significantly faster.
I was surprised, because the logic says that the sentinel version should indeed be faster, every time. Do you have any explanation for this? Do you have any experience with this kind of algorithm? Is it worth the effort to even try to use it in production code when performance is crucial and the array cannot be sorted (and any other mechanism to solve this problem, like a hash map, indexing etc., cannot be used)?
Here's my code of testing this. It's not beautiful, in fact it is ugly, but the beauty wasn't my goal here. Maybe something is wrong with my code?
#include <iostream>
#include <algorithm>
#include <chrono>
#include <vector>
using namespace std::chrono;
using namespace std;
const unsigned long long N = 300000000U;
static void find_with_sentinel()
{
vector<char> a(N);
char key = 1;
a[N - 2] = key; // make sure the searched element is in the array at the last but one index
unsigned long long high = N - 1;
auto tmp = a[high];
// put a sentinel at the end of the array
a[high] = key;
unsigned long long i = 0;
while (a[i] != key)
++i;
// restore original value
a[high] = tmp;
if (i == high && key != tmp)
cout << "find with sentinel, not found" << endl;
else
cout << "find with sentinel, found" << endl;
}
static void find_with_std_find()
{
vector<char> a(N);
int key = 1;
a[N - 2] = key; // make sure the searched element is in the array at the last but one index
auto pos = find(begin(a), end(a), key);
if (pos != end(a))
cout << "find with std::find, found" << endl;
else
cout << "find with sentinel, not found" << endl;
}
int main()
{
const int times = 10;
high_resolution_clock::time_point t1 = high_resolution_clock::now();
for (auto i = 0; i < times; ++i)
find_with_std_find();
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = duration_cast<milliseconds>(t2 - t1).count();
cout << "std::find time = " << duration << endl;
t1 = high_resolution_clock::now();
for (auto i = 0; i < times; ++i)
find_with_sentinel();
t2 = high_resolution_clock::now();
duration = duration_cast<milliseconds>(t2 - t1).count();
cout << "sentinel time = " << duration << endl;
}
Move the memory allocation (vector construction) outside the measured functions (e.g. pass the vector as argument).
Increase times to a few thousand.
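One hedged sketch of that refactoring (my own signature and return convention; N and times chosen as in the question): the vector is built once in main and passed by reference, so the timed region contains only the search itself.
#include <chrono>
#include <iostream>
#include <vector>

static const size_t N = 300000000U;
static const int times = 10;

// Returns the index of the first match, or a.size() if not found.
static size_t find_with_sentinel(std::vector<char>& a, char key)
{
    const size_t high = a.size() - 1;
    const char tmp = a[high];
    a[high] = key;                       // plant the sentinel
    size_t i = 0;
    while (a[i] != key)
        ++i;
    a[high] = tmp;                       // restore the original value
    return (i == high && tmp != key) ? a.size() : i;
}

int main()
{
    std::vector<char> a(N);
    a[N - 2] = 1;                        // the element we will search for

    auto t1 = std::chrono::high_resolution_clock::now();
    size_t pos = 0;
    for (int i = 0; i < times; ++i)
        pos = find_with_sentinel(a, 1);
    auto t2 = std::chrono::high_resolution_clock::now();

    std::cout << "found at " << pos << ", sentinel time = "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
              << " ms\n";
}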
You're doing a whole lot of time-consuming work in your functions. That work is hiding the differences in the timings. Consider your find_with_sentinel function:
static void find_with_sentinel()
{
// ***************************
vector<char> a(N);
char key = 1;
a[N - 2] = key; // make sure the searched element is in the array at the last but one index
// ***************************
unsigned long long high = N - 1;
auto tmp = a[high];
// put a sentinel at the end of the array
a[high] = key;
unsigned long long i = 0;
while (a[i] != key)
++i;
// restore original value
a[high] = tmp;
// ***************************************
if (i == high && key != tmp)
cout << "find with sentinel, not found" << endl;
else
cout << "find with sentinel, found" << endl;
// **************************************
}
The three lines at the top and the four lines at the bottom are identical in both functions, and they're fairly expensive to run. The top contains a memory allocation and the bottom contains an expensive output operation. These are going to mask the time it takes to do the real work of the function.
You need to move the allocation and the output out of the function. Change the function signature to:
static int find_with_sentinel(vector<char> a, char key);
In other words, make it the same as std::find. If you do that, then you don't have to wrap std::find, and you get a more realistic view of how your function will perform in a typical situation.
It's quite possible that the sentinel find function will be faster. However, it comes with some drawbacks. The first is that you can't use it with immutable lists. The second is that it's not safe to use in a multi-threaded program due to the potential of one thread overwriting the sentinel that the other thread is using. It also might not be "faster enough" to justify replacing std::find.

How to delete items from a std::vector given a list of indices

I have a vector of items items, and a vector of indices that should be deleted from items:
std::vector<T> items;
std::vector<size_t> indicesToDelete;
items.push_back(a);
items.push_back(b);
items.push_back(c);
items.push_back(d);
items.push_back(e);
indicesToDelete.push_back(3);
indicesToDelete.push_back(0);
indicesToDelete.push_back(1);
// given these 2 data structures, I want to remove items so it contains
// only c and e (deleting indices 3, 0, and 1)
// ???
What's the best way to perform the deletion, knowing that with each deletion, it affects all other indices in indicesToDelete?
A couple ideas would be to:
Copy items to a new vector one item at a time, skipping if the index is in indicesToDelete
Iterate items and for each deletion, decrement all items in indicesToDelete which have a greater index.
Sort indicesToDelete first, then iterate indicesToDelete, and for each deletion increment an indexCorrection which gets subtracted from subsequent indices.
All seem like I'm over-thinking such a seemingly trivial task. Any better ideas?
Edit: Here is the solution, basically a variation of #1 but using iterators to define blocks to copy to the result.
template<typename T>
inline std::vector<T> erase_indices(const std::vector<T>& data, std::vector<size_t>& indicesToDelete/* can't assume copy elision, don't pass-by-value */)
{
    if(indicesToDelete.empty())
        return data;

    std::vector<T> ret;
    ret.reserve(data.size() - indicesToDelete.size());

    std::sort(indicesToDelete.begin(), indicesToDelete.end());

    // now we can assume there is at least 1 element to delete. copy blocks at a time.
    typename std::vector<T>::const_iterator itBlockBegin = data.begin();
    for(std::vector<size_t>::const_iterator it = indicesToDelete.begin(); it != indicesToDelete.end(); ++it)
    {
        typename std::vector<T>::const_iterator itBlockEnd = data.begin() + *it;
        if(itBlockBegin != itBlockEnd)
        {
            std::copy(itBlockBegin, itBlockEnd, std::back_inserter(ret));
        }
        itBlockBegin = itBlockEnd + 1;
    }

    // copy last block.
    if(itBlockBegin != data.end())
    {
        std::copy(itBlockBegin, data.end(), std::back_inserter(ret));
    }

    return ret;
}
I would go for 1/3, that is: sort the indices vector, then create two iterators into the data vector, one for reading and one for writing. Initialize the writing iterator to the first element to be removed, and the reading iterator to one beyond it. Then, in each step of the loop, advance the iterators to the next value (writing) and the next value not to be skipped (reading) and copy/move the elements. At the end of the loop, call erase to discard the elements beyond the last written-to position.
BTW, this is the approach implemented in the remove/remove_if algorithms of the STL, with the difference that you maintain the condition in a separate ordered vector.
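A minimal sketch of that read/write two-iterator scheme (my own names; the indices are assumed to be free of duplicates):
#include <algorithm>
#include <vector>

// Compact `items` in place, skipping the positions listed in `indicesToDelete`.
template <typename T>
void erase_indices_inplace(std::vector<T>& items, std::vector<size_t> indicesToDelete)
{
    std::sort(indicesToDelete.begin(), indicesToDelete.end());

    size_t write = 0;      // next position to fill
    size_t nextSkip = 0;   // index into indicesToDelete
    for (size_t read = 0; read < items.size(); ++read)
    {
        if (nextSkip < indicesToDelete.size() && read == indicesToDelete[nextSkip])
        {
            ++nextSkip;    // this element is being deleted, skip it
            continue;
        }
        if (write != read)
            items[write] = std::move(items[read]);
        ++write;
    }
    items.erase(items.begin() + write, items.end());
}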
std::sort() the indicesToDelete in descending order and then delete from the items in a normal for loop. No need to adjust indices then.
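A minimal sketch of that approach (my own names):
#include <algorithm>
#include <functional>
#include <vector>

template <typename T>
void erase_descending(std::vector<T>& items, std::vector<size_t> indicesToDelete)
{
    // Highest index first: earlier erasures then never shift
    // the positions that still have to be erased.
    std::sort(indicesToDelete.begin(), indicesToDelete.end(), std::greater<size_t>());
    for (size_t index : indicesToDelete)
        items.erase(items.begin() + index);
}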
It might even be option 4:
If you are deleting a few items from a large number, and know that there will never be a high density of deleted items:
Replace each of the items at indices which should be deleted with 'tombstone' values, indicating that there is nothing valid at those indices, and make sure that whenever you access an item, you check for a tombstone.
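A minimal sketch of that tombstone idea, using std::optional (C++17) as the tombstone value (my own names and data):
#include <optional>
#include <vector>

int main()
{
    // An empty optional marks a deleted ("tombstoned") slot.
    std::vector<std::optional<int>> items{ 1, 2, 3, 4, 5 };
    std::vector<size_t> indicesToDelete{ 3, 0, 1 };

    for (size_t index : indicesToDelete)
        items[index].reset();       // O(1) per deletion, nothing is shifted

    int sum = 0;
    for (const auto& item : items)
        if (item)                   // every access must check for the tombstone
            sum += *item;
    // sum == 3 + 5 == 8
}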
It depends on the numbers you are deleting.
If you are deleting many items, it may make sense to copy the items that are not deleted to a new vector and then replace the old vector with the new vector (after sorting the indicesToDelete). That way, you will avoid compressing the vector after each delete, which is an O(n) operation, possibly making the entire process O(n^2).
If you are deleting a few items, perhaps do the deletion in reverse index order (assuming the indices are sorted), then you do not need to adjust them as items get deleted.
Since the discussion has somewhat transformed into a performance related question, I've written up the following code. It uses remove_if and vector::erase, which should move the elements a minimal number of times. There's a bit of overhead, but for large cases, this should be good.
However, if you don't care about the relative order of elements, then this will not be all that fast.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <set>
using std::vector;
using std::string;
using std::remove_if;
using std::cout;
using std::endl;
using std::set;
struct predicate {
public:
predicate(const vector<string>::iterator & begin, const vector<size_t> & indices) {
m_begin = begin;
m_indices.insert(indices.begin(), indices.end());
}
bool operator()(string & value) {
const int index = distance(&m_begin[0], &value);
set<size_t>::iterator target = m_indices.find(index);
return target != m_indices.end();
}
private:
vector<string>::iterator m_begin;
set<size_t> m_indices;
};
int main() {
vector<string> items;
items.push_back("zeroth");
items.push_back("first");
items.push_back("second");
items.push_back("third");
items.push_back("fourth");
items.push_back("fifth");
vector<size_t> indicesToDelete;
indicesToDelete.push_back(3);
indicesToDelete.push_back(0);
indicesToDelete.push_back(1);
vector<string>::iterator pos = remove_if(items.begin(), items.end(), predicate(items.begin(), indicesToDelete));
items.erase(pos, items.end());
for (int i=0; i< items.size(); ++i)
cout << items[i] << endl;
}
The output for this would be:
second
fourth
fifth
There is a bit of a performance overhead that can still be reduced. In remove_if (at least on gcc), the predicate is copied by value for each element in the vector. This means that we're possibly running the copy constructor on the set m_indices each time. If the compiler is not able to get rid of this, then I would recommend passing the indices in as a set and storing it as a const reference.
We could do that as follows:
struct predicate {
public:
predicate(const vector<string>::iterator & begin, const set<size_t> & indices) : m_begin(begin), m_indices(indices) {
}
bool operator()(string & value) {
const int index = distance(&m_begin[0], &value);
set<size_t>::iterator target = m_indices.find(index);
return target != m_indices.end();
}
private:
const vector<string>::iterator & m_begin;
const set<size_t> & m_indices;
};
int main() {
vector<string> items;
items.push_back("zeroth");
items.push_back("first");
items.push_back("second");
items.push_back("third");
items.push_back("fourth");
items.push_back("fifth");
set<size_t> indicesToDelete;
indicesToDelete.insert(3);
indicesToDelete.insert(0);
indicesToDelete.insert(1);
vector<string>::iterator pos = remove_if(items.begin(), items.end(), predicate(items.begin(), indicesToDelete));
items.erase(pos, items.end());
for (int i=0; i< items.size(); ++i)
cout << items[i] << endl;
}
Basically the key to the problem is remembering that if you delete the object at index i, and don't use a tombstone placeholder, then the vector must make a copy of all of the objects after i. This applies to every possibility you suggested except for #1. Copying to a new list makes one copy no matter how many you delete, making it by far the fastest answer.
And as David Rodríguez said, sorting the list of indices to be deleted allows for some minor optimizations, but it may only be worth it if you're deleting more than 10-20 (please profile first).
Here is my solution for this problem which keeps the order of the original "items":
create a "vector mask" and initialize (fill) it with "false" values.
change the values of mask to "true" for all the indices you want to remove.
loop over all members of "mask" and erase from both vectors "items" and "mask" the elements with "true" values.
Here is the code sample:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<unsigned int> items(12);
vector<unsigned int> indicesToDelete(3);
indicesToDelete[0] = 3;
indicesToDelete[1] = 0;
indicesToDelete[2] = 1;
for(int i=0; i<12; i++) items[i] = i;
for(int i=0; i<items.size(); i++)
cout << "items[" << i << "] = " << items[i] << endl;
// removing indices
vector<bool> mask(items.size());
vector<bool>::iterator mask_it;
vector<unsigned int>::iterator items_it;
for(size_t i = 0; i < mask.size(); i++)
mask[i] = false;
for(size_t i = 0; i < indicesToDelete.size(); i++)
mask[indicesToDelete[i]] = true;
mask_it = mask.begin();
items_it = items.begin();
while(mask_it != mask.end()){
if(*mask_it){
items_it = items.erase(items_it);
mask_it = mask.erase(mask_it);
}
else{
mask_it++;
items_it++;
}
}
for(int i=0; i<items.size(); i++)
cout << "items[" << i << "] = " << items[i] << endl;
return 0;
}
This is not a fast implementation for use with large data sets. The method erase() takes time to rearrange the vector after removing an element.