c++11: Erase multiple occurrences from vector. Which is best practice?

c++11: Erase multiple occurrences from vector. Which is best practice? - c++

I understand that erase moves the iterator forward automatically, so when removing multiple occurrences I need to avoid this so I can compare contiguous elements. That's why I usually do:
auto i = vect.begin();
while (i!=vect.end())
if (*i==someValue)
vect.erase(i);
else
++i;
But I was wondering if I could also do it with a for loop, like this:
for (auto i=vec.begin(); i!=vec.end(); ++i)
if (*i==someValue){
vec.erase(i);
--i;
}
The --i part looks a bit weird but it works. Would that be bad practice? Bad code? Be prone to errors? Or it's just right to use either option?
Thanks.

Use Remove and Erase idiom:
auto new_end = std::remove(v.begin(), v.end(), some_value);
v.erase(new_end, v.end());
That above code has complexity of O(n) and it can be executed in parallel if there're no data race from C++17 with
template< class ExecutionPolicy, class ForwardIt, class T >
ForwardIt remove( ExecutionPolicy&& policy, ForwardIt first, ForwardIt last, const T& value );
or with parallelism TS
Your code has a problem because from vector.modifiers#3
Effects: Invalidates iterators and references at or after the point of the erase
The standard said that iterator is invalidated
However, in reality, most of implementations keep the iterator point to the old node, which is now the end if it was the last element or the next element, then your code has complexity of O(n2) because it will loop n times and take n more for shift the data. It also can't be executed in parallel.

Related

Improvement to remove-erase idiom when order need not be preserved

The popular remove_if-erase idiom preserves the order of the elements kept in the container. I have a case in which I would like to remove some elements, but I don't care about the order of the remaining ones, because they are going to be moved around later on, anyway. I thought, then, that instead of using remove_if-erase, I can scan the vector and - when an element to remove is found - I can swap it with the last valid element of the vector. I call this idiom swap-erase, and it can easily be implemented as follows:
template<typename Object, typename Condition>
void swap_erase(std::vector<Object>& v, const Condition& condition) {
// Keeps track to one past the last element we want to keep.
auto iter_to_last = v.end();
for(auto it = v.begin(); it < iter_to_last; ++it) {
// If the erasure condition is fulfilled...
if(condition(*it)) {
// Increase by one to the left the "tail" of the
// vector, made by elements we want to get rid of;
// Swap the two elements.
// Rewind the current iterator by 1, so at the
// next iteration we test the element we just swapped.
std::iter_swap(it--, --iter_to_last);
}
}
// Erase the elements we pushed at the end of the queue.
v.erase(iter_to_last, v.end());
}
Since there is no need to shift elements, I would expect this code to be consistently faster than remove_if-erase on vectors which are large, or contain large objects.
However, a quick benchmark shows that the two are roughly equivalent when compiled by gcc 7.3.0 with -Ofast, on my i7 at 2.6GHz.
Am I wrong in my assumptions, in my implementation, or in the way I am benchmarking?
Edit: it turns out I was wrong in my assumption. This is a possible implementation of remove_if which clearly shows that it needs not shift any element:
template<typename ForwardIterator, typename Predicate>
ForwardIterator remove_if(ForwardIterator first, ForwardIterator last, Predicate pred) {
first = std::find_if(first, last, pred);
if(first == last) {
return first;
}
ForwardIterator result = first;
++first;
for(; first != last; ++first) {
if(!pred(first)) {
*result = std::move(*first);
++result;
}
}
return result;
}

Your assumption about how remove_if works might be wrong. Perhaps you should state it explicitly.
Basically remove_if moves each not deleted element at most once, so it is particularly fast if most elements are being deleted. (It might optimise by first scanning over the initial part of the array which is not being deleted, in which case it will also be fast if few elements are being deleted and the first deleted element is near the end.)
Your swap algorithm does one swap for each element being deleted, so it is fastest if few elements are being deleted. But swap is unnecessary, and unnecessarily slow in some cases since it requires three moves. You could just move the last element over top of the element being deleted, possibly saving two data copies.

Inplace versions of set_difference, set_intersection and set_union

I implemented versions of set_union, set_intersection and set_difference that take a sorted container and a sorted range (that must not be within the container), and write the result of the operation into the container.
template<class Container, class Iter>
void assign_difference(Container& cont, Iter first, Iter last)
{
auto new_end = std::set_difference( // (1)
cont.begin(), cont.end(), first, last, cont.begin());
cont.erase(new_end, cont.end());
}
template<class Container, class Iter>
void assign_intersection(Container& cont, Iter first, Iter last)
{
auto new_end = std::set_intersection( // (2)
cont.begin(), cont.end(), first, last, cont.begin());
cont.erase(new_end, cont.end());
}
template<class Container, class Iter>
void assign_union(Container& cont, Iter first, Iter last)
{
auto insert_count = last - first;
cont.resize(cont.size() + insert_count); // T must be default-constructible
auto rfirst1 = cont.rbegin() + insert_count, rlast1 = cont.rend();
auto rfirst2 = std::make_reverse_iterator(last);
auto rlast2 = std::make_reverse_iterator(first);
rlast1 = std::set_union( // (3)
rfirst1, rlast1, rfirst2, rlast2, cont.rbegin(), std::greater<>());
cont.erase(std::copy(rlast1.base(), cont.end(), cont.begin()), cont.end());
}
The goal was:
No allocation is performed if the container has enaugh capacity to hold the result.
Otherwise exactly one allocation is performed to give the container the capacity to hold the result.
As you can see in the lines marked (1), (2) and (3), the same container is used as input and output for those STL algorithms. Assuming a usual implementation of those STL algorithms, this code works, since it only writes to parts of the container that have already been processed.
As pointed out in the comments, it's not guaranteed by the standard that this works. set_union, set_intersection and set_difference require that the resulting range doesn't overlap with one of the input ranges.
However, can there be a STL implementation that breaks the code?
If your answer is yes, please provide a conforming implementations of one of the three used STL algorithms that breaks the code.

A conforming implementation could check if argument 1 and 5 of set_intersection are equal, and if they are format your harddrive.
If you violate the requirements, the behaviour of your program is not constrained by the standard; your program is ill formed.
There are situations where UB may be worth the risk and cost (auditing all compiler changes and assembly output). I do not see the point here; write your own. Any fancy optimizations that the std library comes up with could cause problems when you violate requirements as you are doing, and as you have noted the naive implementation is simple.

As rule of thumb I use do not write on a container on which you are iterating. Everything can happen. In general it's odd.
As #Yakk said, it sounds ill. That's it. Something to be removed from your code base an sleep peacefully.
If you really need those functions, I would suggest to write by yourself the inner loop (eg: the inner of std::set_intersection) in order to handle the constraint you need for your algorithm to work.
I don't think that seeking for an STL implementation on which it doesn't work is the right approach. It doesn't sound like a long term solution. For the long term: the standard should be your reference, and as someone already pointed out, your solution doesn't seems to properly deal with it.
My 2 cents

Accessing elements of a list of lists in C++

I have a list of lists like this:
std::list<std::list<double> > list;
I filled it with some lists with doubles in them (actually quite a lot, which is why I am not using a vector. All this copying takes up a lot of time.)
Say I want to access the element that could be accesed like list[3][3] if the list were not a list but a vector or two dimensional array. How would I do that?
I know that accessing elements in a list is accomplished by using an iterator. I couldn't figure out how to get out the double though.

double item = *std::next(std::begin(*std::next(std::begin(list), 3)), 3);
Using a vector would usually have much better performance, though; accessing element n of a list is O(n).
If you're concerned about performance of splicing the interior of the container, you could use deque, which has operator[], amortized constant insertion and deletion from either end, and linear time insertion and deletion from the interior.
For C++03 compilers, you can implement begin and next yourself:
template<typename Container>
typename Container::iterator begin(Container &container)
{
return container.begin();
}
template<typename Container>
typename Container::const_iterator begin(const Container &container)
{
return container.begin();
}
template<typename T, int n>
T *begin(T (&array)[n])
{
return &array[0];
}
template<typename Iterator>
Iterator next(Iterator it, typename std::iterator_traits<Iterator>::difference_type n = 1)
{
std::advance(it, n);
return it;
}

To actually answer your question, you should probably look at std::advance.

To strictly answer your question, Joachim Pileborg's answer is the way to go:
std::list<std::list<double> >::iterator it = list.begin();
std::advance(it, 3);
std::list<double>::iterator it2 = (*it).begin();
std::advance(it2, 3);
double d = *it2;
Now, from your question and further comments it is not clear whether you always add elements to the end of the lists or they can be added anywhere. If you always add to the end, vector<double> will work better. A vector<T> does not need to be copied every time its size increases; only whenever its capacity increases, which is a very different thing.
In addition to this, using reserve(), as others said before, will help a lot with the reallocations. You don't need to reserve for the combined size of all vectors, but only for each individual vector. So:
std::vector<std::vector<double> > v;
v.reserve(512); // If you are inserting 400 vectors, with a little extra just in case
And you would also reserve for each vector<double> inside v. That's all.
Take into account that your list of lists will take much more space. For each double in the internal list, it will have to allocate at least two additional pointers, and also two additional pointers for each list inside the global least. This means that the total memory taken by your container will be roughly three times that of the vector. And all this allocation and management also takes extra runtime.

Obtaining `std::priority_queue` elements in reverse order?

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the query point. This way I know if I should push the new element that is currently being examined (if at a lesser distance than the current farthest neighbor) and can pop() the farthest element when my priority-queue has more than K elements.
So far, all is well. However, when I output the elements, I would like to order them from the closest to the farthest. Currently, I simply pop all the elements from the priority-queue and put them on the output-container (through an iterator), which results in a sequence of points ordered from farthest to closest, so then, I call std::reverse on the output iterator range.
As a simple example, here is a linear-search that uses the priority-queue (obviously, the actual nearest-neighbor query methods I use are far more complicated):
template <typename DistanceValue,
typename ForwardIterator,
typename OutputIterator,
typename GetDistanceFunction,
typename CompareFunction>
inline
OutputIterator min_dist_linear_search(ForwardIterator first,
ForwardIterator last,
OutputIterator output_first,
GetDistanceFunction distance,
CompareFunction compare,
std::size_t max_neighbors = 1,
DistanceValue radius = std::numeric_limits<DistanceValue>::infinity()) {
if(first == last)
return output_first;
typedef std::priority_queue< std::pair<DistanceValue, ForwardIterator>,
std::vector< std::pair<DistanceValue, ForwardIterator> >,
detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction> > PriorityQueue;
PriorityQueue output_queue = PriorityQueue(detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction>(compare));
for(; first != last; ++first) {
DistanceValue d = distance(*first);
if(!compare(d, radius))
continue;
output_queue.push(std::pair<DistanceValue, ForwardIterator>(d, first));
while(output_queue.size() > max_neighbors)
output_queue.pop();
if(output_queue.size() == max_neighbors)
radius = output_queue.top().first;
};
OutputIterator it = output_first;
while( !output_queue.empty() ) {
*it = *(output_queue.top().second);
output_queue.pop(); ++it;
};
std::reverse(output_first, it);
return it;
};
The above is all dandy except for one thing: it requires the output-iterator type to be bidirectional and essentially be pointing to a pre-allocated container. Now, this practice of storing the output in a range prescribed by some output iterator is great and pretty standard too (e.g. std::copy and other STL algorithms are good examples of that). However, in this case I would like to be able to only require a forward output-iterator type, which would make it possible to use back-inserter iterators like those provided for STL containers and iostreams.
So, this boils down to reversing the priority-queue before dumping its content in the output iterator. So, these are the better options I've been able to come up with:
Create a std::vector, dump the priority-queue content in it, and dump the elements into the output-iterator using a reverse-iterator on the vector.
Replace the std::priority_queue with a sorted container (e.g. std::multimap), and then dump the content into the output-iterator using the appropriate traversal order.
Are there any other reasonable option?
I used to employ a std::multimap in a previous implementation of this algorithm and others, as of my second option above. However, when I switched to std::priority_queue, the performance gain was significant. So, I'd rather not use the second option, as it really seems that using a priority-queue for maintaining the list of neighbors is much better than relying on a sorted array. Btw, I also tried a std::vector that I maintain sorted with std::inplace_merge, which was better than multimap, but didn't match up to the priority-queue.
As for the first option, which is my best option at this point, it just seems wasteful to me to have to do this double transfer of data (queue -> vector -> output). I'm just inclined to think that there must be a simpler way to do this... something that I'm missing..
The first option really isn't that bad in this application (considering the complexity of the algorithm that precedes it), but if there is a trick to avoid this double memory transfer, I'd like to know about it.

Problem solved!
I'm such an idiot... I knew I was missing something obvious. In this case, the std::sort_heap() function. The reference page even has an example that does exactly what I need (and since the std::priority_queue is just implemented in terms of a random-access container and the heap-functions (pop_heap, push_heap, make_heap) it makes no real difference to use these functions directly in-place of the std::priority_queue class). I don't know how I could have missed that.
Anyways, I hope this helps anyone who had the same problem.

One dirty idea, which would nevertheless be guaranteed to work, would be the following:
std::priority_queue<int, std::vector<int>, std::less<int> > queue;
queue.push(3);
queue.push(5);
queue.push(9);
queue.push(2);
// Prints in reverse order.
int* front = const_cast<int*>(&queue.top());
int* back = const_cast<int*>(front + queue.size());
std::sort(front, back);
while (front < back) {
printf("%i ", *front);
++front;
}
It may be noted that the in-place sorting will likely break the queue.

why don't you just specify the opposite comparison function in the declaration:
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
int main() {
std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
pq.push(1);
pq.push(10);
pq.push(15);
std::cout << pq.top() << std::endl;
}

What's wrong with my vector<T>::erase here?

I have two vector<T> in my program, called active and non_active respectively. This refers to the objects it contains, as to whether they are in use or not.
I have some code that loops the active vector and checks for any objects that might have gone non active. I add these to a temp_list inside the loop.
Then after the loop, I take my temp_list and do non_active.insert of all elements in the temp_list.
After that, I do call erase on my active vector and pass it the temp_list to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is a erase-remove idiom, however not sure how I'd apply that to a removing a range of elements from a vector<T>
I'm not using C++11.

erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [](const T& x)
{
return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
Container& container;
element_of(Container& container) : container(container) {}
public:
template<typename T>
bool operator()(const T& x) const
{
return std::find(container.begin(), container.end(), x)
!= container.end();
}
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find_if with a find member function call on the set, the performance should be acceptable.

The erase method is intended to accept iterators to the same container object. You're trying to pass in iterators to temp_list to use to erase elements from active which is not allowed for good reasons, as a Sequence's range erase method is intended to specify a range in that Sequence to remove. It's important that the iterators are in that sequence because otherwise we're specifying a range of values to erase rather than a range within the same container which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
if (it->no_longer_active())
{
inactive.push_back(*it);
it = active.erase(it);
}
else
++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.

You can actually make the erase/remove idiom usable for your case. You just need to move the value over to the other container before std::remove_if possibly shuffles it around: in the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
copy_if_predicate(OutIt dest, Pred p)
: dest(dest), pred(p) {}
template<class T>
bool operator()(T const& v){
if(pred(v)){
*dest++ = v;
return true;
}
return false;
}
OutIt dest;
Pred pred;
};
template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I directly used bools to make the code shorter, not bothering with output and the likes.)

The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.

active.erase(temp_list.begin(), temp_list.end());
You try to erase elements from one list, but you use iterators for second list. First list iterators aren't the same, like in second list.

I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another. Look at std::list::splice()for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then when they change state, they can automate their own "moving" from one list to the other, and you don't need to transverse the whole list for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving).
I will write an algorithm here now to run through one collection and if a condition exists, it will effect a std::remove_if but at the same time will copy the element into your "inserter".
//fwd iterator must be writable
template< typename FwdIterator, typename InputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, InputIterator outp, Pred pred )
{
for( FwdIterator test = inp; test != end; ++test )
{
if( pred(*test) ) // insert
{
*outp = *test;
++outp;
}
else // keep
{
if( test != inp )
{
*inp = *test;
}
++inp;
}
}
return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );

The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself and call active.erase() on the result of dereferencing the iterator at each step.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

c++11: Erase multiple occurrences from vector. Which is best practice? - c++

Related

Improvement to remove-erase idiom when order need not be preserved

Inplace versions of set_difference, set_intersection and set_union

Accessing elements of a list of lists in C++

Obtaining `std::priority_queue` elements in reverse order?

What's wrong with my vector<T>::erase here?

Categories

Resources