Determining if an unordered vector<T> has all unique elements

Determining if an unordered vector<T> has all unique elements - c++

Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
set<T> Y(X.begin(), X.end());
return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
typename vector<T>::iterator i,j;
for(i=X.begin();i!=X.end();++i) {
for(j=i+1;j!=X.end();++j) {
if(*i == *j) return 0;
}
}
return 1;
}
I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?

Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
{ return *l < *r; }
template <class T>
bool is_unique(vector<T> const &x) {
vector< T const * > vp;
vp.reserve( x.size() );
for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
return adjacent_find( vp.begin(), vp.end(),
not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
== vp.end(); // if no adjacent pair (vp_n,vp_n+1) has *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
typedef typename iterator_traits<I>::value_type T;
…
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
sort( x.begin(), x.end() ); // O(N log N)
return adjacent_find( x.begin(), x.end() ) == x.end();
}

You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(In first, In last)
{
return adjacent_find(first, last) == last;
}
then have the client sort the vector, or a make a sorted copy of the vector. This will open a door for dynamic programming. That is, if the client sorted the vector in the past then they have the option to keep and refer to that sorted vector so they can repeat this operation for O(n) runtime.

The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
std::sort(vec.begin(), vec.end());
return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).

Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...

For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
std::set<T> test;
for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
if (!test.insert(*it).second) {
return false;
}
}
return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g if you had a vector of integers of known range, you could just mark in an array (or even bitset) if an element exists.

You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
std::sort(X.begin(), X.end());
return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in nlog(n); the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++0x std::unordered_set instead of std::set would do it in expected linear time - but that requires that your elements be hashable as well as having operator == defined, which might not be so easy.
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.

If I may add my own 2 cents.
First of all, as #Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure there is no duplicate inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as attribute).
Whenever the property is needed, check for duplicates
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to let the elements in the vector in a particular order or can we "mess" with them ?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue ?
Thirdly: as you remarked yourself a simple linear search doubled yield a O(N2) complexity in average which is no good.
If < is already defined, then sorting is obvious, with its O(N log N) complexity.
It might also be worth it to make T Hashable, because a std::tr1::hash_set could yield a better time (I know, you need a RandomAccessIterator, but if T is Hashable then it's easy to have T* Hashable to ;) )
But in the end the real issue here is that our advises are necessary generic because we lack data.
What is T, do you intend the algorithm to be generic ?
What is the number of elements ? 10, 100, 10.000, 1.000.000 ? Because asymptotic complexity is kind of moot when dealing with a few hundreds....
And of course: can you ensure unicity at insertion time ? Can you modify the vector itself ?

Well, your first one should only take N log(N), so it's clearly the better worse case scenario for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
set<T> Y;
typename vector<T>::const_iterator i;
for(i=X.begin(); i!=X.end(); ++i) {
if (Y.find(*i) != Y.end()) {
return false;
}
Y.insert(*i);
}
return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.

If the type T You store in Your vector is large and copying it is costly, consider creating a vector of pointers or iterators to Your vector elements. Sort it based on the element pointed to and then check for uniqueness.
You can also use the std::set for that. The template looks like this
template <class Key,class Traits=less<Key>,class Allocator=allocator<Key> > class set
I think You can provide appropriate Traits parameter and insert raw pointers for speed or implement a simple wrapper class for pointers with < operator.
Don't use the constructor for inserting into the set. Use insert method. The method (one of overloads) has a signature
pair <iterator, bool> insert(const value_type& _Val);
By checking the result (second member) You can often detect the duplicate much quicker, than if You inserted all elements.

In the (very) special case of sorting discrete values with a known, not too big, maximum value N.
You should be able to start a bucket sort and simply check that the number of values in each bucket is below 2.
bool is_unique(const vector<int>& X, int N)
{
vector<int> buckets(N,0);
typename vector<int>::const_iterator i;
for(i = X.begin(); i != X.end(); ++i)
if(++buckets[*i] > 1)
return false;
return true;
}
The complexity of this would be O(n).

Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as the hash set will be nO(1) instead of nO(log n) for a standard set. Of course everything will depend on the size of n and your particular library implementation.

Related

Extract subvector in constant time

I have a std::vector<int> and I want to throw away the x first and y last elements. Just copying the elements is not an option, since this is O(n).
Is there something like vector.begin()+=x to let the vector just start later and end earlier?
I also tried
items = std::vector<int> (&items[x+1],&items[0]+items.size()-y);
where items is my vector, but this gave me bad_alloc

C++ standard algorithms work on ranges, not on actual containers, so you don't need to extract anything: you just need to adjust the iterator range you're working with.
void foo(const std::vector<T>& vec, const size_t start, const size_t end)
{
assert(vec.size() >= end-start);
auto it1 = vec.begin() + start;
auto it2 = vec.begin() + end;
std::whatever(it1, it2);
}
I don't see why it needs to be any more complicated than that.
(trivial live demo)

If you only need a range of values, you can represent that as a pair of iterators from first to last element of the range. These can be acquired in constant time.
Edit: According to the description in the comments, this seems like the most sensible solution. If your functions expect a vector reference, then you'll need to refactor a bit.
Other solutions:
If you don't need the original vector, and therefore can modify it, and the order of elements is not relevant, you can swap the first x elements with the n-x-y...n-y elements and then remove the last x+y elements. This can be done in O(x+y) time.
If appropriate, you could choose to use std::list for which what you're asking can be done in constant time if you have iterators to the first and last node of the sublist. This also requires that you can modify the original list but the order of elements won't change.
If those are not options, then you need to copy and are stuck with O(n).

The other answers are correct: usually iterators will do.
Nevertheless, you can also write a vector view. Here is a sketch:
template<typename T>
struct vector_view
{
vector_view(std::vector<T> const& v, size_t ind_begin, size_t ind_end)
: _v(v)
, _size(/* size of range */)
, _ind_begin(ind_begin) {}
auto size() const { return _size; }
auto const& operator[](size_t i) const
{
//possibly check for input outside range
return _v[ i + _ind_begin ];
}
//conversion of view to std::vector
operator std::vector<T>() const
{
std::vector<T> ret(_size);
//fill it
return ret;
}
private:
std::vector<T> const& _v;
size_t _size;
size_t _ind_begin;
}
Expose further methods as required (some iterator stuff might be appropriate when you want to use that with the standard library algorithms).
Further, take care on the validity of the const reference std::vector<T> const& v; -- if that could be an issue, one should better work with shared-pointers.
One can also think of more general approaches here, for example, use strides or similar things.

How to efficiently compare vectors with C++?

I need advice for micro optimization in C++ for a vector comparison function,
it compares two vectors for equality and order of elements does not matter.
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
int n = a.size();
std::vector<bool> free(n, true);
for (int i = 0; i < n; i++) {
bool matchFound = false;
for (int j = 0; j < n; j++) {
if (free[j] && a[i] == b[j]) {
matchFound = true;
free[j] = false;
break;
}
}
if (!matchFound) return false;
}
return true;
}
This function is used heavily and I am thinking of possible way to optimize it.
Can you please give me some suggestions? By the way I use C++11.
Thanks

It just realized that this code only does kind of a "set equivalency" check (and now I see that you actually did say that, what a lousy reader I am!). This can be achieved much simpler
template <class T>
static bool compareVectors(vector<T> a, vector<T> b)
{
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
return (a == b);
}
You'll need to include the header algorithm.
If your vectors are always of same size, you may want to add an assertion at the beginning of the method:
assert(a.size() == b.size());
This will be handy in debugging your program if you once perform this operation for unequal lengths by mistake.
Otherwise, the vectors can't be the same if they have unequal length, so just add
if ( a.size() != b.size() )
{
return false;
}
before the sort instructions. This will save you lots of time.
The complexity of this technically is O(n*log(n)) because it's mainly dependent on the sorting which (usually) is of that complexity. This is better than your O(n^2) approach, but might be worse due to the needed copies. This is irrelevant if your original vectors may be sorted.
If you want to stick with your approach, but tweak it, here are my thoughts on this:
You can use std::find for this:
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
const size_t n = a.size(); // make it const and unsigned!
std::vector<bool> free(n, true);
for ( size_t i = 0; i < n; ++i )
{
bool matchFound = false;
auto start = b.cbegin();
while ( true )
{
const auto position = std::find(start, b.cend(), a[i]);
if ( position == b.cend() )
{
break; // nothing found
}
const auto index = position - b.cbegin();
if ( free[index] )
{
// free pair found
free[index] = false;
matchFound = true;
break;
}
else
{
start = position + 1; // search in the rest
}
}
if ( !matchFound )
{
return false;
}
}
return true;
}
Another possibility is replacing the structure to store free positions. You may try a std::bitset or just store the used indices in a vector and check if a match isn't in that index-vector. If the outcome of this function is very often the same (so either mostly true or mostly false) you can optimize your data structures to reflect that. E.g. I'd use the list of used indices if the outcome is usually false since only a handful of indices might needed to be stored.
This method has the same complexity as your approach. Using std::find to search for things is sometimes better than a manual search. (E.g. if the data is sorted and the compiler knows about it, this can be a binary search).

Your can probabilistically compare two unsorted vectors (u,v) in O(n):
Calculate:
U= xor(h(u[0]), h(u[1]), ..., h(u[n-1]))
V= xor(h(v[0]), h(v[1]), ..., h(v[n-1]))
If U==V then the vectors are probably equal.
h(x) is any non-cryptographic hash function - such as MurmurHash. (Cryptographic functions would work as well but would usually be slower).
(This would work even without hashing, but it would be much less robust when the values have a relatively small range).
A 128-bit hash function would be good enough for many practical applications.

I am noticing that most proposed solution involved sorting booth of the input vectors.I think sorting the arrays compute more that what is strictly necessary for the evaluation the equality of the two vector ( and if the inputs vectors are constant, a copy needs to be made).
One other way would be to build an associative container to count the element in each vector... It's also possible to do the reduction of the two vector in parrallel.In the case of very large vector that could give a nice speed up.
template <typename T> bool compareVector(const std::vector<T> & vec1, const std::vector<T> & vec2) {
if (vec1.size() != vec2.size())
return false ;
//Here we assuame that T is hashable ...
auto count_set = std::unordered_map<T,int>();
//We count the element in each vector...
for (unsigned int count = 0 ; count < vec1.size();++count)
{
count_set[vec1[count]]++;
count_set[vec2[count]]--;
} ;
// If everything balance out we should have zero everywhere
return std::all_of(count_set.begin(),count_set.end(),[](const std::pair<T,int> p) { return p.second == 0 ;});
}
That way depend on the performance of your hashsing function , we might get linear complexity in the the length of booth vector (vs n*logn with the sorting).
NB the code might have some bug , did have time to check it ...
Benchmarking this way of comparing two vector to sort based comparison i get on ubuntu 13.10,vmware core i7 gen 3 :
Comparing 200 vectors of 500 elements by counting takes 0.184113 seconds
Comparing 200 vectors of 500 elements by sorting takes 0.276409 seconds
Comparing 200 vectors of 1000 elements by counting takes 0.359848 seconds
Comparing 200 vectors of 1000 elements by sorting takes 0.559436 seconds
Comparing 200 vectors of 5000 elements by counting takes 1.78584 seconds
Comparing 200 vectors of 5000 elements by sorting takes 2.97983 seconds

As others suggested, sorting your vectors beforehand will improve performance.
As an additional optimization you can make heaps out of the vectors to compare (with complexity O(n) instead of sorting with O(n*log(n)).
Afterwards you can pop elements from both heaps (complexity O(log(n))) until you get a mismatch.
This has the advantage that you only heapify instead of sort your vectors if they are not equal.
Below is a code sample. To know what is really fastest, you will have to measure with some sample data for your usecase.
#include <algorithm>
typedef std::vector<int> myvector;
bool compare(myvector& l, myvector& r)
{
bool possibly_equal=l.size()==r.size();
if(possibly_equal)
{
std::make_heap(l.begin(),l.end());
std::make_heap(r.begin(),r.end());
for(int i=l.size();i!=0;--i)
{
possibly_equal=l.front()==r.front();
if(!possibly_equal)
break;
std::pop_heap(l.begin(),l.begin()+i);
std::pop_heap(r.begin(),r.begin()+i);
}
}
return possibly_equal;
}

If you use this function a lot on the same vectors, it might be better to keep sorted copies for comparison.
In theory it might even be better to sort the vectors and compare sorted vectors if each one is compared just once, (sorting is O(n*log(n)), comparing sorted vector O(n), while your function is O(n^2).
But I suppose the time spent allocating memory for the sorted vectors will dwarf any theoretical gains if you don't compare the same vectors often.
As with all optimisations, profiling is the only way to make sure, I'd try some std::sort / std::equal combo.

Like stefan says you need to sort to get better complexity.
Then you can use
== operator (tnx for the correction in the comments - ste equal will also work but it is more appropriate for comparing ranges not entire containers)
If that is not fast enough only then bother with microoptimization.
Also are vectors guaranteed to be of the same size?
If not put that check at the begining.

Another possible solution (viable only if all elements are unique), which should improve somewhat the solution of #stefan (although the complexity would remain in O(NlogN)) is this:
template <class T>
static bool compareVectors(vector<T> a, const vector<T> & b)
{
// You should probably check this outside as it can
// avoid you the copy of a
if (a.size() != b.size()) return false;
std::sort(a.begin(), a.end());
for (const auto & v : b)
if ( !std::binary_search(a.begin(), a.end(), v) ) return false;
return true;
}
This should be faster since it performs the search directly as an O(NlogN) operation, instead of sorting b (O(NlogN)) and then searching both vectors (O(N)).

Accessing elements of a list of lists in C++

I have a list of lists like this:
std::list<std::list<double> > list;
I filled it with some lists with doubles in them (actually quite a lot, which is why I am not using a vector. All this copying takes up a lot of time.)
Say I want to access the element that could be accesed like list[3][3] if the list were not a list but a vector or two dimensional array. How would I do that?
I know that accessing elements in a list is accomplished by using an iterator. I couldn't figure out how to get out the double though.

double item = *std::next(std::begin(*std::next(std::begin(list), 3)), 3);
Using a vector would usually have much better performance, though; accessing element n of a list is O(n).
If you're concerned about performance of splicing the interior of the container, you could use deque, which has operator[], amortized constant insertion and deletion from either end, and linear time insertion and deletion from the interior.
For C++03 compilers, you can implement begin and next yourself:
template<typename Container>
typename Container::iterator begin(Container &container)
{
return container.begin();
}
template<typename Container>
typename Container::const_iterator begin(const Container &container)
{
return container.begin();
}
template<typename T, int n>
T *begin(T (&array)[n])
{
return &array[0];
}
template<typename Iterator>
Iterator next(Iterator it, typename std::iterator_traits<Iterator>::difference_type n = 1)
{
std::advance(it, n);
return it;
}

To actually answer your question, you should probably look at std::advance.

To strictly answer your question, Joachim Pileborg's answer is the way to go:
std::list<std::list<double> >::iterator it = list.begin();
std::advance(it, 3);
std::list<double>::iterator it2 = (*it).begin();
std::advance(it2, 3);
double d = *it2;
Now, from your question and further comments it is not clear whether you always add elements to the end of the lists or they can be added anywhere. If you always add to the end, vector<double> will work better. A vector<T> does not need to be copied every time its size increases; only whenever its capacity increases, which is a very different thing.
In addition to this, using reserve(), as others said before, will help a lot with the reallocations. You don't need to reserve for the combined size of all vectors, but only for each individual vector. So:
std::vector<std::vector<double> > v;
v.reserve(512); // If you are inserting 400 vectors, with a little extra just in case
And you would also reserve for each vector<double> inside v. That's all.
Take into account that your list of lists will take much more space. For each double in the internal list, it will have to allocate at least two additional pointers, and also two additional pointers for each list inside the global least. This means that the total memory taken by your container will be roughly three times that of the vector. And all this allocation and management also takes extra runtime.

Obtaining `std::priority_queue` elements in reverse order?

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the query point. This way I know if I should push the new element that is currently being examined (if at a lesser distance than the current farthest neighbor) and can pop() the farthest element when my priority-queue has more than K elements.
So far, all is well. However, when I output the elements, I would like to order them from the closest to the farthest. Currently, I simply pop all the elements from the priority-queue and put them on the output-container (through an iterator), which results in a sequence of points ordered from farthest to closest, so then, I call std::reverse on the output iterator range.
As a simple example, here is a linear-search that uses the priority-queue (obviously, the actual nearest-neighbor query methods I use are far more complicated):
template <typename DistanceValue,
typename ForwardIterator,
typename OutputIterator,
typename GetDistanceFunction,
typename CompareFunction>
inline
OutputIterator min_dist_linear_search(ForwardIterator first,
ForwardIterator last,
OutputIterator output_first,
GetDistanceFunction distance,
CompareFunction compare,
std::size_t max_neighbors = 1,
DistanceValue radius = std::numeric_limits<DistanceValue>::infinity()) {
if(first == last)
return output_first;
typedef std::priority_queue< std::pair<DistanceValue, ForwardIterator>,
std::vector< std::pair<DistanceValue, ForwardIterator> >,
detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction> > PriorityQueue;
PriorityQueue output_queue = PriorityQueue(detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction>(compare));
for(; first != last; ++first) {
DistanceValue d = distance(*first);
if(!compare(d, radius))
continue;
output_queue.push(std::pair<DistanceValue, ForwardIterator>(d, first));
while(output_queue.size() > max_neighbors)
output_queue.pop();
if(output_queue.size() == max_neighbors)
radius = output_queue.top().first;
};
OutputIterator it = output_first;
while( !output_queue.empty() ) {
*it = *(output_queue.top().second);
output_queue.pop(); ++it;
};
std::reverse(output_first, it);
return it;
};
The above is all dandy except for one thing: it requires the output-iterator type to be bidirectional and essentially be pointing to a pre-allocated container. Now, this practice of storing the output in a range prescribed by some output iterator is great and pretty standard too (e.g. std::copy and other STL algorithms are good examples of that). However, in this case I would like to be able to only require a forward output-iterator type, which would make it possible to use back-inserter iterators like those provided for STL containers and iostreams.
So, this boils down to reversing the priority-queue before dumping its content in the output iterator. So, these are the better options I've been able to come up with:
Create a std::vector, dump the priority-queue content in it, and dump the elements into the output-iterator using a reverse-iterator on the vector.
Replace the std::priority_queue with a sorted container (e.g. std::multimap), and then dump the content into the output-iterator using the appropriate traversal order.
Are there any other reasonable option?
I used to employ a std::multimap in a previous implementation of this algorithm and others, as of my second option above. However, when I switched to std::priority_queue, the performance gain was significant. So, I'd rather not use the second option, as it really seems that using a priority-queue for maintaining the list of neighbors is much better than relying on a sorted array. Btw, I also tried a std::vector that I maintain sorted with std::inplace_merge, which was better than multimap, but didn't match up to the priority-queue.
As for the first option, which is my best option at this point, it just seems wasteful to me to have to do this double transfer of data (queue -> vector -> output). I'm just inclined to think that there must be a simpler way to do this... something that I'm missing..
The first option really isn't that bad in this application (considering the complexity of the algorithm that precedes it), but if there is a trick to avoid this double memory transfer, I'd like to know about it.

Problem solved!
I'm such an idiot... I knew I was missing something obvious. In this case, the std::sort_heap() function. The reference page even has an example that does exactly what I need (and since the std::priority_queue is just implemented in terms of a random-access container and the heap-functions (pop_heap, push_heap, make_heap) it makes no real difference to use these functions directly in-place of the std::priority_queue class). I don't know how I could have missed that.
Anyways, I hope this helps anyone who had the same problem.

One dirty idea, which would nevertheless be guaranteed to work, would be the following:
std::priority_queue<int, std::vector<int>, std::less<int> > queue;
queue.push(3);
queue.push(5);
queue.push(9);
queue.push(2);
// Prints in reverse order.
int* front = const_cast<int*>(&queue.top());
int* back = const_cast<int*>(front + queue.size());
std::sort(front, back);
while (front < back) {
printf("%i ", *front);
++front;
}
It may be noted that the in-place sorting will likely break the queue.

why don't you just specify the opposite comparison function in the declaration:
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
int main() {
std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
pq.push(1);
pq.push(10);
pq.push(15);
std::cout << pq.top() << std::endl;
}

What's wrong with my vector<T>::erase here?

I have two vector<T> in my program, called active and non_active respectively. This refers to the objects it contains, as to whether they are in use or not.
I have some code that loops the active vector and checks for any objects that might have gone non active. I add these to a temp_list inside the loop.
Then after the loop, I take my temp_list and do non_active.insert of all elements in the temp_list.
After that, I do call erase on my active vector and pass it the temp_list to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is a erase-remove idiom, however not sure how I'd apply that to a removing a range of elements from a vector<T>
I'm not using C++11.

erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [](const T& x)
{
return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
Container& container;
element_of(Container& container) : container(container) {}
public:
template<typename T>
bool operator()(const T& x) const
{
return std::find(container.begin(), container.end(), x)
!= container.end();
}
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find_if with a find member function call on the set, the performance should be acceptable.

The erase method is intended to accept iterators to the same container object. You're trying to pass in iterators to temp_list to use to erase elements from active which is not allowed for good reasons, as a Sequence's range erase method is intended to specify a range in that Sequence to remove. It's important that the iterators are in that sequence because otherwise we're specifying a range of values to erase rather than a range within the same container which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
if (it->no_longer_active())
{
inactive.push_back(*it);
it = active.erase(it);
}
else
++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.

You can actually make the erase/remove idiom usable for your case. You just need to move the value over to the other container before std::remove_if possibly shuffles it around: in the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
copy_if_predicate(OutIt dest, Pred p)
: dest(dest), pred(p) {}
template<class T>
bool operator()(T const& v){
if(pred(v)){
*dest++ = v;
return true;
}
return false;
}
OutIt dest;
Pred pred;
};
template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I directly used bools to make the code shorter, not bothering with output and the likes.)

The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.

active.erase(temp_list.begin(), temp_list.end());
You try to erase elements from one list, but you use iterators for second list. First list iterators aren't the same, like in second list.

I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another. Look at std::list::splice()for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then when they change state, they can automate their own "moving" from one list to the other, and you don't need to transverse the whole list for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving).
I will write an algorithm here now to run through one collection and if a condition exists, it will effect a std::remove_if but at the same time will copy the element into your "inserter".
//fwd iterator must be writable
template< typename FwdIterator, typename InputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, InputIterator outp, Pred pred )
{
for( FwdIterator test = inp; test != end; ++test )
{
if( pred(*test) ) // insert
{
*outp = *test;
++outp;
}
else // keep
{
if( test != inp )
{
*inp = *test;
}
++inp;
}
}
return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );

The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself and call active.erase() on the result of dereferencing the iterator at each step.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Determining if an unordered vector<T> has all unique elements - c++

Related

Extract subvector in constant time

How to efficiently compare vectors with C++?

Accessing elements of a list of lists in C++

Obtaining `std::priority_queue` elements in reverse order?

What's wrong with my vector<T>::erase here?

Categories

Resources