What's wrong with my vector<T>::erase here? - c++

I have two vector<T>s in my program, called active and non_active. The names refer to whether the objects they contain are currently in use or not.
I have some code that loops over the active vector and checks for any objects that might have gone non-active. I add these to a temp_list inside the loop.
Then, after the loop, I take my temp_list and insert all of its elements into non_active.
After that, I call erase on my active vector and pass it the temp_list range to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is an erase-remove idiom, but I'm not sure how I'd apply it to removing a range of elements from a vector<T>.
I'm not using C++11.

erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [&temp_list](const T& x)
{
    return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
    Container& container;
public:
    element_of(Container& container) : container(container) {}

    template<typename T>
    bool operator()(const T& x) const
    {
        return std::find(container.begin(), container.end(), x)
            != container.end();
    }
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find with the set's find member function, the performance should be acceptable.
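For instance, a rough sketch of that set-based variant (element_of_set and temp_set are illustrative names, not from the original answer; it assumes temp_list is changed to a std::set<T>):

template<typename T>
struct element_of_set
{
    explicit element_of_set(const std::set<T>& s) : s(s) {}
    bool operator()(const T& x) const
    {
        return s.find(x) != s.end(); // O(log N) membership test instead of O(N)
    }
    const std::set<T>& s;
};

// active.erase(std::remove_if(active.begin(), active.end(),
//                             element_of_set<T>(temp_set)),
//              active.end());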

The erase method only accepts iterators into the same container object. You are passing iterators into temp_list to erase elements from active, which is not allowed for good reason: a sequence's range erase is meant to specify a range within that sequence to remove. If the iterators could come from elsewhere, you would effectively be specifying a range of values to erase rather than a range within the container, which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
    if (it->no_longer_active())
    {
        inactive.push_back(*it);
        it = active.erase(it);
    }
    else
        ++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.
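For example, a predicate object carrying external state might look roughly like this (GameState and is_active are hypothetical placeholders, not names from the answer):

struct NotActive
{
    explicit NotActive(const GameState& state) : state(state) {} // hypothetical state type
    bool operator()(const YourObjectType& obj) const
    {
        return !obj.is_active(state); // hypothetical member function
    }
    const GameState& state;
};

// active_list.erase(
//     std::remove_if(active_list.begin(), active_list.end(), NotActive(current_state)),
//     active_list.end());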

You can actually make the erase/remove idiom usable for your case. You just need to copy the value over to the other container before std::remove_if possibly shuffles it around, i.e. inside the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
    copy_if_predicate(OutIt dest, Pred p)
        : dest(dest), pred(p) {}

    template<class T>
    bool operator()(T const& v){
        if(pred(v)){
            *dest++ = v;
            return true;
        }
        return false;
    }

    OutIt dest;
    Pred pred;
};

template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
    return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I used plain bools directly to keep the code shorter, not bothering with output and the like.)
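For reference, a possible way to invoke the helper above for the original active/non_active problem might look like this (is_inactive is an assumed predicate name, not defined in the answer):

active.erase(
    std::remove_if(active.begin(), active.end(),
                   copy_if_pred(std::back_inserter(non_active), is_inactive)),
    active.end());

Any copies of the predicate made inside std::remove_if still all append to the same container, because std::back_insert_iterator only stores a pointer to it, so every removed element ends up in non_active.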

The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.

active.erase(temp_list.begin(), temp_list.end());
You are trying to erase elements from one vector, but you are passing it iterators that belong to a different vector. Iterators from one container cannot be used to erase elements from another.

I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another; look at std::list::splice() for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
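A minimal sketch of that splice-based transfer (assuming both containers are std::list<Object>, where Object and no_longer_active() are placeholders):

for (std::list<Object>::iterator it = active.begin(); it != active.end(); )
{
    std::list<Object>::iterator next = it;
    ++next;                                          // remember the successor first
    if (it->no_longer_active())                      // hypothetical condition
        inactive.splice(inactive.end(), active, it); // O(1): relinks the node, no copy
    it = next;
}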
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then, when they change state, they can automate their own "moving" from one list to the other, and you don't need to traverse the whole list looking for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving.)
I will now write an algorithm that runs through one collection and, where a condition holds, performs the equivalent of std::remove_if while at the same time copying each removed element into your "inserter".
// fwd iterator must be writable
template< typename FwdIterator, typename OutputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, OutputIterator outp, Pred pred )
{
    for( FwdIterator test = inp; test != end; ++test )
    {
        if( pred(*test) ) // insert
        {
            *outp = *test;
            ++outp;
        }
        else // keep
        {
            if( test != inp )
            {
                *inp = *test;
            }
            ++inp;
        }
    }
    return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );

The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself and call active.erase() on the result of dereferencing the iterator at each step.

Related

Iterating over std::set<unique_ptr>, how to keep track which ones to remove?

I need to loop over some objects of class T.
They are stored in an std::set<std::unique_ptr<T>> tees.
The main purpose of the loop's body is to use the objects, but by doing that I will also find out when some of the objects are no longer needed and can be deleted.
I am using a range-based for loop for iterating over the unique_ptrs:
for (std::unique_ptr<T> & tee : tees)
I know I cannot call tees.erase(tee) inside the loop (UB). Therefore I should collect the unique_ptrs that need to be deleted in a helper collection. Problem: The pointers are unique, therefore I cannot copy them into the helper collection.
I could collect the raw pointers in a std::set<T*>, but how would I use these after the loop to delete the matching unique_ptrs from the tees collection? Also, collecting the raw pointers again somehow feels wrong when I made the effort to use smart pointers in this problem.
I could switch to shared_ptr, but the pointers would only ever be shared for the purpose of deleting the objects. Does not feel right.
I could switch from range-based for to something else, like handling the iterators myself, and get the next iterator before deleting the entry. But going back to pre-C++11 techniques also does not feel right.
I could switch to std::remove_if. (EDIT: I can't, actually. Explained in comments below this question and below the accepted answer.) The loop's body would move into the unary_predicate lambda. But the main purpose of the loop is not to determine whether the objects should be deleted, but to make use of them, altering them.
The way of least resistance seems to be to go back to iterator-handling, that way I do not even need a helper collection. But I wonder if you can help me with a C++11-ish (or 14,17) solution?
I don't think you are going to find anything easier than
for(auto it = container.begin(); it != container.end();)
{
    // use *it here
    if(needs_to_be_erased)
        it = container.erase(it);
    else
        ++it;
}
Since std::set does not provide mutable access to its elements, any sort of transform or remove will not work. You would have to build a container of iterators and then, after you have processed the set, go through that container of iterators calling erase for each one.
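A sketch of that iterator-collecting approach (needs_to_be_erased is a placeholder for whatever condition you compute while using the object):

std::vector<std::set<std::unique_ptr<T>>::iterator> doomed;
for (auto it = tees.begin(); it != tees.end(); ++it)
{
    // ... use **it here ...
    if (needs_to_be_erased(**it)) // placeholder predicate
        doomed.push_back(it);     // set iterators stay valid while their element lives
}
for (auto dit : doomed)
    tees.erase(dit);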
I think you can copy the positions into a new data structure and remove these items in another loop by accessing the new data structure in reverse order.
int counter = 0;
vector<int> indices;
for (const unique_ptr<T>& tee : tees)
{
    if (bCondition)
        indices.push_back(counter);
    counter++;
}
reverse(indices.begin(), indices.end());
for (int i : indices)
    tees.erase(std::next(tees.begin(), i)); // set iterators are not random access, so advance step by step
Not exactly a solution but if you have to do this a lot you could make your own algorithm for it. I guess the reason this is not in the standard library is because the algorithm needs to know the container to perform an erase.
So you could do something like this:
template<typename Cont, typename Pred>
void erase_if(Cont& c, decltype(std::begin(c)) b, decltype(std::end(c)) e, Pred p)
{
    while(b != e)
    {
        if(p(*b))
            b = c.erase(b);
        else
            ++b;
    }
}

template<typename Cont, typename Pred>
void erase_if(Cont& c, Pred p)
{
    erase_if(c, std::begin(c), std::end(c), p);
}
Then call it something like:
erase_if(tees, [](std::unique_ptr<int> const& up){
    // use up here...
    return (*up) & 1; // erase if odd number
});
or
erase_if(tees, std::begin(tees), std::end(tees), [](std::unique_ptr<int> const& up){
    // use up here...
    return (*up) & 1; // erase if odd number
});

Extract subvector in constant time

I have a std::vector<int> and I want to throw away the x first and y last elements. Just copying the elements is not an option, since this is O(n).
Is there something like vector.begin()+=x to let the vector just start later and end earlier?
I also tried
items = std::vector<int> (&items[x+1],&items[0]+items.size()-y);
where items is my vector, but this gave me bad_alloc
C++ standard algorithms work on ranges, not on actual containers, so you don't need to extract anything: you just need to adjust the iterator range you're working with.
void foo(const std::vector<T>& vec, const size_t start, const size_t end)
{
    assert(vec.size() >= end - start);
    auto it1 = vec.begin() + start;
    auto it2 = vec.begin() + end;
    std::whatever(it1, it2);
}
I don't see why it needs to be any more complicated than that.
(trivial live demo)
If you only need a range of values, you can represent that as a pair of iterators from first to last element of the range. These can be acquired in constant time.
Edit: According to the description in the comments, this seems like the most sensible solution. If your functions expect a vector reference, then you'll need to refactor a bit.
Other solutions:
If you don't need the original vector, and therefore can modify it, and the order of elements is not relevant, you can swap the first x elements with the elements at positions n-x-y ... n-y and then remove the last x+y elements (see the sketch after this list). This can be done in O(x+y) time.
If appropriate, you could choose to use std::list for which what you're asking can be done in constant time if you have iterators to the first and last node of the sublist. This also requires that you can modify the original list but the order of elements won't change.
If those are not options, then you need to copy and are stuck with O(n).
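Here is a rough sketch of the swap-and-truncate idea from the first option above, for a vector of int (it assumes the kept middle part has at least x elements, otherwise the two ranges would overlap and std::swap_ranges would not be valid):

#include <algorithm>
#include <vector>

void drop_ends_unordered(std::vector<int>& v, std::size_t x, std::size_t y)
{
    // Move the first x (unwanted) elements into the slots just before the last y elements.
    std::swap_ranges(v.begin(), v.begin() + x, v.end() - y - x);
    // Everything to discard now sits at the back; drop it in one go.
    v.resize(v.size() - x - y); // O(x + y) overall, no reallocation
}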
The other answers are correct: usually iterators will do.
Nevertheless, you can also write a vector view. Here is a sketch:
template<typename T>
struct vector_view
{
    vector_view(std::vector<T> const& v, size_t ind_begin, size_t ind_end)
        : _v(v)
        , _size(/* size of range */)
        , _ind_begin(ind_begin) {}

    auto size() const { return _size; }

    auto const& operator[](size_t i) const
    {
        // possibly check for input outside range
        return _v[ i + _ind_begin ];
    }

    // conversion of view to std::vector
    operator std::vector<T>() const
    {
        std::vector<T> ret(_size);
        // fill it
        return ret;
    }

private:
    std::vector<T> const& _v;
    size_t _size;
    size_t _ind_begin;
};
Expose further methods as required (some iterator stuff might be appropriate when you want to use that with the standard library algorithms).
Further, take care regarding the validity of the stored reference std::vector<T> const& _v; if that could be an issue, it may be better to work with shared pointers.
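A rough sketch of that shared-pointer variant (C++11; the names are illustrative only):

template<typename T>
struct shared_vector_view
{
    shared_vector_view(std::shared_ptr<const std::vector<T>> v,
                       std::size_t ind_begin, std::size_t ind_end)
        : _v(std::move(v)), _ind_begin(ind_begin), _size(ind_end - ind_begin) {}

    std::size_t size() const { return _size; }
    const T& operator[](std::size_t i) const { return (*_v)[i + _ind_begin]; }

private:
    std::shared_ptr<const std::vector<T>> _v; // shared ownership keeps the data alive
    std::size_t _ind_begin;
    std::size_t _size;
};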
One can also think of more general approaches here, for example, use strides or similar things.

Why boost filter_iterator has weird make_filter_iterator function?

After some pain I managed to hack together this minimal example of boost's filter_iterator:
using namespace std;

std::function<bool(uint32_t)> stlfunc = [](uint32_t n){ return n % 3 == 0; };

int main()
{
    vector<uint32_t> numbers{11,22,33,44,55,66,77,3,6,9};
    auto start = boost::make_filter_iterator(stlfunc, numbers.begin(), numbers.end());
    auto end   = boost::make_filter_iterator(stlfunc, numbers.end(),   numbers.end());
    auto elem  = std::max_element(start, end);
    cout << *elem;
}
It works nicely, but I wonder why make_filter_iterator takes numbers.end()?
I might be wrong to use it that way; I guesstimated it from the C array example:
http://www.boost.org/doc/libs/1_53_0/libs/iterator/example/filter_iterator_example.cpp
That is explained in the docs:
When skipping over elements, it is necessary for the filter adaptor to
know when to stop so as to avoid going past the end of the underlying
range. A filter iterator is therefore constructed with pair of
iterators indicating the range of elements in the unfiltered sequence
to be traversed.
From the source below you can see that they always check whether they have reached the end in satisfy_predicate:
void increment()
{
    ++(this->base_reference());
    satisfy_predicate();
}

void satisfy_predicate()
{
    while (this->base() != this->m_end && !this->m_predicate(*this->base()))
        ++(this->base_reference());
}
Also, as pointed out by Alex Chamberlain, the constructors make passing the end iterator optional, for example: filter_iterator(Iterator x, Iterator end = Iterator()); (provided the iterator type is default constructible). So you could omit numbers.end() from your code when constructing the end iterator.
If you look at the declaration of your make_filter_iterator template, you see it looks like this:
template <class Predicate, class Iterator>
filter_iterator<Predicate,Iterator>
make_filter_iterator(Predicate f, Iterator x, Iterator end = Iterator());
Specifically, you see that the last parameter is a default parameter, set to Iterator(), which means it is default constructed. Only for some iterator types does such a value behave like an actual end() iterator; for others it is a singular iterator that points to garbage.
Most container types do require an actual end() iterator to be passed.

std::insert_iterator and iterator invalidation

I tried writing a generic, in-place intersperse function. The function should intersperse a given element into a sequence of elements.
#include <vector>
#include <list>
#include <algorithm>
#include <iostream>
template<typename ForwardIterator, typename InserterFunc>
void intersperse(ForwardIterator begin, ForwardIterator end, InserterFunc ins,
                 // we cannot use rvalue references here,
                 // maybe taking by value and letting users feed in std::ref would be smarter
                 const typename ForwardIterator::value_type& elem) {
    if(begin == end) return;
    while(++begin != end) {
        // bugfix would be something like:
        // begin = (ins(begin) = elem); // insert_iterator is convertible to a normal iterator
        // or
        // begin = (ins(begin) = elem).iterator(); // get the iterator to the last inserted element
        // begin now points to the inserted element and we need to
        // increment the iterator once again, which is safe
        // ++begin;
        ins(begin) = elem;
    }
}
int main()
{
    typedef std::list<int> container;
    // as expected tumbles, falls over and goes up in flames with:
    // typedef std::vector<int> container;
    typedef container::iterator iterator;
    container v{1,2,3,4};
    intersperse(v.begin(), v.end(),
                [&v](iterator it) { return std::inserter(v, it); },
                23);
    for(auto x : v)
        std::cout << x << std::endl;
    return 0;
}
The example works only for containers that do not invalidate their
iterators on insertion. Should I simply get rid of the iterators and
accept a container as the argument or am I missing something about
insert_iterator that makes this kind of usage possible?
The example works only for containers that do not invalidate their iterators on insertion.
Exactly.
Should I simply get rid of the iterators and accept a container as the argument
That would be one possibility. Another would be not making the algorithm in-place (ie. output to a different container/output-iterator).
am I missing something about insert_iterator that makes this kind of usage possible?
No. insert_iterator is meant for repeated inserts at a single place in a container, e.g. by a transform algorithm.
The problems with your implementation have absolutely nothing to do with the properties of insert_iterator. All kinds of insert iterators in the C++ standard library are guaranteed to remain valid, even if you perform insertion into a container that potentially invalidates iterators on insert. This is, of course, true only if all insertions are performed through the insert iterator alone.
In other words, the implementation of insert iterators guarantees that the iterator will automatically "heal" itself, even if the insertion led to a potentially iterator-invalidating event in the container.
The problem with your code is that begin and end iterators can potentially get invalidated by insertion into certain container types. It is begin and end that you need to worry about in your code, not the insert iterator.
Meanwhile, you do it completely backwards for some reason. You seem to care about refreshing the insert iterator (which is completely unnecessary), while completely ignoring begin and end.
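To make the point concrete, here is one possible sketch (not from the answer) that sidesteps the invalidation problem for random-access containers such as std::vector by recomputing the insertion position from an index, which stays meaningful across insertions:

#include <vector>

template<typename Container>
void intersperse_by_index(Container& c, const typename Container::value_type& elem)
{
    // Insert elem between every pair of neighbouring elements.
    for (typename Container::size_type i = 1; i < c.size(); i += 2)
        c.insert(c.begin() + i, elem); // the iterator is rebuilt from the index each time
}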

Determining if an unordered vector<T> has all unique elements

Profiling my CPU-bound code has suggested that I spend a long time checking whether a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and == defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
    set<T> Y(X.begin(), X.end());
    return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
    typename vector<T>::iterator i,j;
    for(i = X.begin(); i != X.end(); ++i) {
        for(j = i+1; j != X.end(); ++j) {
            if(*i == *j) return 0;
        }
    }
    return 1;
}
I've tested them as best I can, and from what I can gather from reading the documentation about the STL, the answer is (as usual) it depends. I think that in the first case, if all the elements are unique, it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true: it is lightning fast if X[0]==X[1], but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps an STL algorithm built for this very purpose? If not, are there any suggestions to eke out a bit more efficiency?
Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
    { return *l < *r; }

template <class T>
bool is_unique(vector<T> const &x) {
    vector< T const * > vp;
    vp.reserve( x.size() );
    for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
    sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
    return adjacent_find( vp.begin(), vp.end(),
                          not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
           == vp.end(); // true if no adjacent pair (vp_n, vp_n+1) violates *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
typedef typename iterator_traits<I>::value_type T;
…
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
    sort( x.begin(), x.end() ); // O(N log N)
    return adjacent_find( x.begin(), x.end() ) == x.end();
}
You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(Fwd first, Fwd last)
{
    return adjacent_find(first, last) == last;
}
then have the client sort the vector, or make a sorted copy of the vector. This opens a door for dynamic programming: if the client sorted the vector in the past, they have the option to keep and refer to that sorted vector so they can repeat this operation in O(n) runtime.
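For example, the calling code might do something like this (a usage sketch, assuming v is the original vector):

std::vector<int> sorted_copy(v.begin(), v.end());
std::sort(sorted_copy.begin(), sorted_copy.end());                    // O(n log n), done once
bool all_unique = is_unique(sorted_copy.begin(), sorted_copy.end());  // O(n) for each subsequent check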
The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
    std::sort(vec.begin(), vec.end());
    return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).
Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...
For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
    std::set<T> test;
    for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
        if (!test.insert(*it).second) {
            return false;
        }
    }
    return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g if you had a vector of integers of known range, you could just mark in an array (or even bitset) if an element exists.
You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
    std::sort(X.begin(), X.end());
    return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in nlog(n); the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++0x std::unordered_set instead of std::set would do it in expected linear time - but that requires that your elements be hashable as well as having operator == defined, which might not be so easy.
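A sketch of that hash-based check, assuming T is hashable and equality-comparable (names are illustrative):

#include <unordered_set>
#include <vector>

template <class T>
bool is_unique_hashed(const std::vector<T>& x)
{
    std::unordered_set<T> seen;
    seen.reserve(x.size()); // avoid rehashing
    for (typename std::vector<T>::const_iterator it = x.begin(); it != x.end(); ++it)
        if (!seen.insert(*it).second) // insert reports whether the value was new
            return false;             // duplicate found, stop early
    return true;                      // expected O(n) overall
}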
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.
If I may add my own 2 cents.
First of all, as #Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure there is no duplicate inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as attribute).
Whenever the property is needed, check for duplicates
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to let the elements in the vector in a particular order or can we "mess" with them ?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue ?
Thirdly: as you remarked yourself, a doubly nested linear search yields O(N^2) complexity on average, which is no good.
If < is already defined, then sorting is obvious, with its O(N log N) complexity.
It might also be worth it to make T hashable, because a std::tr1::hash_set could yield a better time (I know, you need a RandomAccessIterator, but if T is hashable then it's easy to make T* hashable too ;) )
But in the end the real issue here is that our advice is necessarily generic because we lack data.
What is T? Do you intend the algorithm to be generic?
What is the number of elements? 10, 100, 10,000, 1,000,000? Asymptotic complexity is kind of moot when dealing with a few hundred elements....
And of course: can you ensure uniqueness at insertion time? Can you modify the vector itself?
Well, your first one should only take O(N log N), so it's clearly the better worst-case scenario for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
    set<T> Y;
    typename vector<T>::const_iterator i;
    for(i = X.begin(); i != X.end(); ++i) {
        if (Y.find(*i) != Y.end()) {
            return false;
        }
        Y.insert(*i);
    }
    return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.
If the type T you store in your vector is large and copying it is costly, consider creating a vector of pointers or iterators to your vector's elements. Sort it based on the elements pointed to and then check for uniqueness.
You can also use std::set for that. The template looks like this:
template <class Key, class Traits = less<Key>, class Allocator = allocator<Key> > class set
I think you can provide an appropriate Traits parameter and insert raw pointers for speed, or implement a simple wrapper class for pointers with an operator<.
Don't use the constructor for inserting into the set; use the insert method. The method (one of its overloads) has the signature
pair<iterator, bool> insert(const value_type& _Val);
By checking the result (the second member) you can often detect a duplicate much more quickly than if you inserted all the elements first.
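Putting those pieces together, a sketch using a set of raw pointers with a dereferencing comparator (reusing the dereference_less function from the earlier answer; the function name is borrowed, the rest is illustrative):

template <class T>
bool is_unique_via_pointer_set(const std::vector<T>& x)
{
    typedef bool (*cmp_t)(const T*, const T*);
    std::set<const T*, cmp_t> seen(&dereference_less<T>); // compares the pointed-to values
    for (typename std::vector<T>::const_iterator it = x.begin(); it != x.end(); ++it)
        if (!seen.insert(&*it).second) // false: an equal element was already seen
            return false;
    return true;
}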
In the (very) special case of discrete values with a known, not-too-big maximum value N, you should be able to do a bucket count and simply check that the number of values in each bucket stays below 2.
bool is_unique(const vector<int>& X, int N)
{
    vector<int> buckets(N+1, 0); // values assumed to lie in 0..N
    vector<int>::const_iterator i;
    for(i = X.begin(); i != X.end(); ++i)
        if(++buckets[*i] > 1)
            return false;
    return true;
}
The complexity of this would be O(n).
Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as the hash set will cost n·O(1) instead of n·O(log n) for a standard set. Of course, everything will depend on the size of n and your particular library implementation.