Obtaining `std::priority_queue` elements in reverse order? - c++

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the query point. This way I know if I should push the new element that is currently being examined (if at a lesser distance than the current farthest neighbor) and can pop() the farthest element when my priority-queue has more than K elements.
So far, all is well. However, when I output the elements, I would like to order them from the closest to the farthest. Currently, I simply pop all the elements from the priority-queue and put them on the output-container (through an iterator), which results in a sequence of points ordered from farthest to closest, so then, I call std::reverse on the output iterator range.
As a simple example, here is a linear-search that uses the priority-queue (obviously, the actual nearest-neighbor query methods I use are far more complicated):
template <typename DistanceValue,
          typename ForwardIterator,
          typename OutputIterator,
          typename GetDistanceFunction,
          typename CompareFunction>
inline
OutputIterator min_dist_linear_search(ForwardIterator first,
                                      ForwardIterator last,
                                      OutputIterator output_first,
                                      GetDistanceFunction distance,
                                      CompareFunction compare,
                                      std::size_t max_neighbors = 1,
                                      DistanceValue radius = std::numeric_limits<DistanceValue>::infinity()) {
  if (first == last)
    return output_first;
  typedef std::priority_queue< std::pair<DistanceValue, ForwardIterator>,
                               std::vector< std::pair<DistanceValue, ForwardIterator> >,
                               detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction> > PriorityQueue;
  PriorityQueue output_queue = PriorityQueue(detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction>(compare));
  for (; first != last; ++first) {
    DistanceValue d = distance(*first);
    if (!compare(d, radius))
      continue;
    output_queue.push(std::pair<DistanceValue, ForwardIterator>(d, first));
    while (output_queue.size() > max_neighbors)
      output_queue.pop();
    if (output_queue.size() == max_neighbors)
      radius = output_queue.top().first;
  }
  OutputIterator it = output_first;
  while (!output_queue.empty()) {
    *it = *(output_queue.top().second);
    output_queue.pop();
    ++it;
  }
  std::reverse(output_first, it);
  return it;
}
The above is all dandy except for one thing: it requires the output-iterator type to be bidirectional and essentially to point into a pre-allocated container. Now, this practice of storing the output in a range prescribed by some output iterator is great and pretty standard too (e.g. std::copy and other STL algorithms are good examples of that). However, in this case I would like to require only a single-pass output-iterator type, which would make it possible to use back-inserter iterators like those provided for STL containers and iostreams.
So, this boils down to reversing the content of the priority-queue before dumping it into the output iterator. These are the best options I've been able to come up with:
Create a std::vector, dump the priority-queue content in it, and dump the elements into the output-iterator using a reverse-iterator on the vector.
Replace the std::priority_queue with a sorted container (e.g. std::multimap), and then dump the content into the output-iterator using the appropriate traversal order.
Are there any other reasonable options?
I used to employ a std::multimap in a previous implementation of this algorithm and others, as in my second option above. However, when I switched to std::priority_queue, the performance gain was significant. So, I'd rather not use the second option, as it really seems that using a priority-queue for maintaining the list of neighbors is much better than relying on a sorted container. Btw, I also tried a std::vector that I kept sorted with std::inplace_merge, which was better than the multimap, but didn't match up to the priority-queue.
As for the first option, which is my best option at this point, it just seems wasteful to have to do this double transfer of data (queue -> vector -> output). I'm inclined to think that there must be a simpler way to do this... something that I'm missing.
The first option really isn't that bad in this application (considering the complexity of the algorithm that precedes it), but if there is a trick to avoid this double memory transfer, I'd like to know about it.

Problem solved!
I'm such an idiot... I knew I was missing something obvious. In this case, the std::sort_heap() function. The reference page even has an example that does exactly what I need (and since std::priority_queue is just implemented in terms of a random-access container and the heap functions (pop_heap, push_heap, make_heap), it makes no real difference to use those functions directly in place of the std::priority_queue class). I don't know how I could have missed that.
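For illustration, here is a minimal sketch of the same bookkeeping done directly on a std::vector with the heap functions, finishing with std::sort_heap so the results come out closest-first (the distance values and K = 3 are made up for the example):
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const std::vector<int> distances = {9, 3, 5, 2, 7};  // made-up distance values
    const unsigned K = 3;                                // keep the 3 nearest

    std::vector<int> heap;  // underlying storage, maintained as a max-heap
    for (int d : distances) {
        heap.push_back(d);
        std::push_heap(heap.begin(), heap.end());    // like priority_queue::push
        if (heap.size() > K) {
            std::pop_heap(heap.begin(), heap.end()); // moves the largest to the back
            heap.pop_back();                         // like priority_queue::pop
        }
    }

    // sort_heap turns the heap into an ascending sequence (closest to farthest),
    // so no extra copy or std::reverse is needed before writing the output.
    std::sort_heap(heap.begin(), heap.end());
    for (int d : heap)
        std::printf("%d ", d);  // prints: 2 3 5
}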
Anyways, I hope this helps anyone who had the same problem.

One dirty idea, which would nevertheless be guaranteed to work, would be the following:
#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

int main() {
    std::priority_queue<int, std::vector<int>, std::less<int> > queue;
    queue.push(3);
    queue.push(5);
    queue.push(9);
    queue.push(2);

    // Sort the underlying storage in place and print in ascending order
    // (the reverse of the order in which the queue would pop them).
    int* front = const_cast<int*>(&queue.top());
    int* back = front + queue.size();
    std::sort(front, back);
    while (front < back) {
        std::printf("%i ", *front);
        ++front;
    }
}
It should be noted that sorting the storage in place breaks the heap invariant, so the queue must not be used afterwards.

Why don't you just specify the opposite comparison function in the declaration?
#include <iostream>
#include <queue>
#include <vector>
#include <functional>

int main() {
    std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
    pq.push(1);
    pq.push(10);
    pq.push(15);
    std::cout << pq.top() << std::endl;
}

Related

How to implement something like std::copy_if but apply a function before inserting into a different container

Full disclosure, this may be a hammer-and-nail situation where I'm trying to use STL algorithms when none are needed. I have seen a recurring pattern in some C++14 code I am working with. We have a container that we iterate through, and if the current element matches some condition, then we copy one of the element's fields to another container.
The pattern is something like:
for (auto it = std::begin(foo); it != std::end(foo); ++it) {
    auto x = it->Some_member;
    // Note: the check usually uses the field we would add to the new container.
    if (f(x) && g(x)) {
        bar.emplace_back(x);
    }
}
The idea is almost an accumulate where the function being applied does not always return a value. I can only think of solutions that either:
Require a function for accessing the member you want to accumulate and another function for checking the condition, i.e. How to combine std::copy_if and std::transform?
Are worse than the thing I want to replace.
Is this even a good idea?
A quite general solution to your issue would be the following (working example):
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

template<typename It, typename MemberType, typename Cond, typename Do>
void process_filtered(It begin, It end, MemberType iterator_traits<It>::value_type::*ptr, Cond condition, Do process)
{
    for (It it = begin; it != end; ++it)
    {
        if (condition((*it).*ptr))
        {
            process((*it).*ptr);
        }
    }
}

struct Data
{
    int x;
    int y;
};

int main()
{
    // thanks to iterator_traits, vector could also be an array;
    // kudos to #Yakk-AdamNevraumont
    vector<Data> lines{{1,2},{4,3},{5,6}};
    // filter even numbers from Data::x and output them
    process_filtered(std::begin(lines), std::end(lines), &Data::x, [](int n){return n % 2 == 0;}, [](int n){cout << n;});
    // output is 4, the only x value that is even
    return 0;
}
It does not use an STL algorithm, that is true, but you merely pass an iterator pair, the member to look up and two lambdas/functions to it, which first filter and then consume the filtered output, respectively.
I like your general solution, but here you do not need a lambda that extracts the corresponding attribute.
Clearly, the code can be refined to work with const_iterator, but as a general idea I think it should be helpful. You could also extend it to take a member function that returns a member attribute instead of a direct member-attribute pointer, if you'd like to use this method for encapsulated classes.
Sure. There are a bunch of approaches.
Find a library with transform_if, like boost.
Find a library with transform_range, which takes a transformation and range or container and returns a range with the value transformed. Compose this with copy_if.
Find a library with filter_range like the above. Now, use std::transform with your filtered range.
Find one with both, and compose filtering and transforming in the appropriate order. Now your problem is just copying (std::copy or whatever).
Write your own back-inserter wrapper that transforms while inserting. Use that with std::copy_if (see the sketch after this list).
Write your own range adapters, like 2, 3 and/or 4 above.
Write transform_if.
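For illustration, here is a minimal C++14 sketch of option 5: a small transforming back-inserter used with std::copy_if. The transforming_back_inserter name, the Data struct and the lambdas are made up for this example; it is a sketch, not a library facility.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

struct Data { int x; int y; };

// Minimal output-iterator wrapper: every value assigned through it is passed
// through Transform and the result is appended to the target container.
template <class Container, class Transform>
struct transforming_back_inserter {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = void;
    using pointer = void;
    using reference = void;

    Container* out;
    Transform  transform;

    transforming_back_inserter& operator*()     { return *this; }
    transforming_back_inserter& operator++()    { return *this; }
    transforming_back_inserter& operator++(int) { return *this; }

    template <class T>
    transforming_back_inserter& operator=(const T& value) {
        out->emplace_back(transform(value));
        return *this;
    }
};

int main() {
    std::vector<Data> foo{{1, 2}, {4, 3}, {6, 7}};
    std::vector<int> bar;

    auto get_x   = [](const Data& d) { return d.x; };
    auto is_even = [](const Data& d) { return d.x % 2 == 0; };

    // copy_if does the filtering; the wrapper extracts Data::x while inserting.
    std::copy_if(foo.begin(), foo.end(),
                 transforming_back_inserter<std::vector<int>, decltype(get_x)>{&bar, get_x},
                 is_even);

    for (int x : bar) std::cout << x << ' ';  // prints: 4 6
}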

Inplace versions of set_difference, set_intersection and set_union

I implemented versions of set_union, set_intersection and set_difference that take a sorted container and a sorted range (that must not be within the container), and write the result of the operation into the container.
template<class Container, class Iter>
void assign_difference(Container& cont, Iter first, Iter last)
{
    auto new_end = std::set_difference( // (1)
        cont.begin(), cont.end(), first, last, cont.begin());
    cont.erase(new_end, cont.end());
}

template<class Container, class Iter>
void assign_intersection(Container& cont, Iter first, Iter last)
{
    auto new_end = std::set_intersection( // (2)
        cont.begin(), cont.end(), first, last, cont.begin());
    cont.erase(new_end, cont.end());
}

template<class Container, class Iter>
void assign_union(Container& cont, Iter first, Iter last)
{
    auto insert_count = last - first;
    cont.resize(cont.size() + insert_count); // T must be default-constructible
    auto rfirst1 = cont.rbegin() + insert_count, rlast1 = cont.rend();
    auto rfirst2 = std::make_reverse_iterator(last);
    auto rlast2 = std::make_reverse_iterator(first);
    rlast1 = std::set_union( // (3)
        rfirst1, rlast1, rfirst2, rlast2, cont.rbegin(), std::greater<>());
    cont.erase(std::copy(rlast1.base(), cont.end(), cont.begin()), cont.end());
}
The goal was:
No allocation is performed if the container has enough capacity to hold the result.
Otherwise exactly one allocation is performed to give the container the capacity to hold the result.
As you can see in the lines marked (1), (2) and (3), the same container is used as input and output for those STL algorithms. Assuming a usual implementation of those STL algorithms, this code works, since it only writes to parts of the container that have already been processed.
As pointed out in the comments, it's not guaranteed by the standard that this works. set_union, set_intersection and set_difference require that the resulting range doesn't overlap with one of the input ranges.
However, can there be an STL implementation that breaks the code?
If your answer is yes, please provide a conforming implementation of one of the three STL algorithms used that breaks the code.
A conforming implementation could check whether arguments 1 and 5 of set_intersection are equal and, if they are, format your hard drive.
If you violate the requirements, the behaviour of your program is not constrained by the standard; it is undefined.
There are situations where UB may be worth the risk and cost (auditing all compiler changes and assembly output). I do not see the point here; write your own. Any fancy optimizations that the std library comes up with could cause problems when you violate requirements as you are doing, and as you have noted the naive implementation is simple.
As a rule of thumb, I do not write to a container that I am iterating over as input. Anything can happen; in general it's asking for trouble.
As #Yakk said, it sounds ill. That's it. Something to be removed from your code base so you can sleep peacefully.
If you really need those functions, I would suggest writing the inner loop yourself (e.g. the core of std::set_intersection; see the sketch below) so that you can handle the constraints your algorithm needs.
I don't think that looking for an STL implementation on which it doesn't work is the right approach; it doesn't sound like a long-term solution. For the long term, the standard should be your reference, and as someone already pointed out, your solution doesn't properly respect it.
My 2 cents
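To illustrate that suggestion, here is a rough sketch of a hand-rolled in-place intersection (assign_intersection_manual is a made-up name). It follows the usual set_intersection logic, but because the loop is written out, the overlap between the read and write positions is under our control instead of relying on unspecified library behaviour:
#include <vector>

template <class Container, class Iter>
void assign_intersection_manual(Container& cont, Iter first, Iter last)
{
    auto read  = cont.begin();
    auto write = cont.begin();   // never ahead of 'read', so the overlap is safe
    while (read != cont.end() && first != last) {
        if (*read < *first) {
            ++read;              // only in the container: drop it
        } else if (*first < *read) {
            ++first;             // only in the other range: skip it
        } else {
            *write++ = *read++;  // in both: keep it
            ++first;
        }
    }
    cont.erase(write, cont.end());
}
Called as assign_intersection_manual(v, std::begin(other), std::end(other)), it leaves only the common elements in v in a single pass with no allocation, using the same erase-at-the-end trick as above.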

Accessing elements of a list of lists in C++

I have a list of lists like this:
std::list<std::list<double> > list;
I filled it with some lists with doubles in them (actually quite a lot, which is why I am not using a vector; all this copying takes up a lot of time).
Say I want to access the element that could be accessed like list[3][3] if the list were not a list but a vector or a two-dimensional array. How would I do that?
I know that accessing elements in a list is accomplished by using an iterator. I couldn't figure out how to get out the double though.
double item = *std::next(std::begin(*std::next(std::begin(list), 3)), 3);
Using a vector would usually have much better performance, though; accessing element n of a list is O(n).
If you're concerned about performance of splicing the interior of the container, you could use deque, which has operator[], amortized constant insertion and deletion from either end, and linear time insertion and deletion from the interior.
For C++03 compilers, you can implement begin and next yourself:
#include <iterator>  // std::advance, std::iterator_traits

template<typename Container>
typename Container::iterator begin(Container &container)
{
    return container.begin();
}

template<typename Container>
typename Container::const_iterator begin(const Container &container)
{
    return container.begin();
}

template<typename T, int n>
T *begin(T (&array)[n])
{
    return &array[0];
}

template<typename Iterator>
Iterator next(Iterator it, typename std::iterator_traits<Iterator>::difference_type n = 1)
{
    std::advance(it, n);
    return it;
}
To actually answer your question, you should probably look at std::advance.
To strictly answer your question, Joachim Pileborg's answer is the way to go:
std::list<std::list<double> >::iterator it = list.begin();
std::advance(it, 3);
std::list<double>::iterator it2 = (*it).begin();
std::advance(it2, 3);
double d = *it2;
Now, from your question and further comments it is not clear whether you always add elements to the end of the lists or they can be added anywhere. If you always add to the end, vector<double> will work better. A vector<T> does not need to be copied every time its size increases; only whenever its capacity increases, which is a very different thing.
In addition to this, using reserve(), as others said before, will help a lot with the reallocations. You don't need to reserve for the combined size of all vectors, but only for each individual vector. So:
std::vector<std::vector<double> > v;
v.reserve(512); // If you are inserting 400 vectors, with a little extra just in case
And you would also reserve for each vector<double> inside v. That's all.
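For instance, a small sketch of that reservation pattern (the 400 and 1000 sizes are made up for illustration):
#include <vector>

int main() {
    std::vector<std::vector<double> > v;
    v.reserve(512);                        // room for the outer vectors

    for (int i = 0; i < 400; ++i) {
        v.push_back(std::vector<double>());
        v.back().reserve(1000);            // room for each inner vector
        for (int j = 0; j < 1000; ++j)
            v.back().push_back(j * 0.5);   // no reallocation happens here
    }
}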
Take into account that your list of lists will take up much more space. For each double in an inner list, the list has to store at least two additional pointers, and there are also two additional pointers for each list inside the outer list. This means that the total memory taken by your container will be roughly three times that of the vector, and all this allocation and bookkeeping also costs extra run time.

Skipping iterator

I have a sequence of values that I'd like to pass to a function that takes a (iterator begin, iterator end) pair. However, I only want every second element in the original sequence to be processed.
Is there a nice way using Standard-Lib/Boost to create an iterator facade that will allow me to pass in the original sequence? I figured something simple like this would already be in the boost iterators or range libraries, but I didn't find anything.
Or am I missing another completely obvious way to do this? Of course, I know I always have the option of copying the values to another sequence, but that's not what I want to do.
Edit: I know about filter_iterator, but that filters on values - it doesn't change the way the iteration advances.
I think you want boost::adaptors::strided
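For reference, a possible usage sketch of the strided adaptor (assuming Boost.Range is available; the container and values are made up):
#include <iostream>
#include <iterator>
#include <vector>
#include <boost/range/adaptor/strided.hpp>
#include <boost/range/algorithm/copy.hpp>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6, 7};
    std::vector<int> every_second;

    // Take every second element of v, starting with the first one.
    boost::copy(v | boost::adaptors::strided(2), std::back_inserter(every_second));

    for (int x : every_second) std::cout << x << ' ';  // prints: 1 3 5 7
}
If the function you are calling insists on a (begin, end) pair rather than a range, boost::begin and boost::end on the adapted range should give you exactly that.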
#include <algorithm>
#include <iterator>
#include <vector>
#include <boost/iterator/filter_iterator.hpp>

struct TrueOnEven {
    template< typename T >
    bool operator()(const T&) { return mCount++ % 2 == 0; }
    TrueOnEven() : mCount(0) {}
private:
    int mCount;
};

int main() {
    std::vector< int > tVec, tOtherVec;
    ...
    // Note: filter_iterator's second template parameter is the underlying iterator type.
    typedef boost::filter_iterator< TrueOnEven, std::vector< int >::iterator > TakeEvenFilterType;
    std::copy(
        TakeEvenFilterType(tVec.begin(), tVec.end()),
        TakeEvenFilterType(tVec.end(), tVec.end()),
        std::back_inserter(tOtherVec));
}
To be honest, this is anything but nice and intuitive. I wrote a simple "Enumerator" library, including lazily evaluated integrated queries, to avoid hotchpotch like the above. It allows you to write:
Query::From(tVec.begin(), tVec.end())
.Skip<2>()
.ToStlSequence(std::back_inserter(tOtherVec));
where Skip<2> basically instantiates a generalized "Filter" which skips every N-th (in this case every second) element.
Here's Boost's filter iterator. It is exactly what you want.
UPDATE: Sorry, I misread the question. Here's a list of all the iterator funkiness in Boost:
http://www.boost.org/doc/libs/1_46_1/libs/iterator/doc/#specialized-adaptors
I think a plain iterator_adaptor with an overloaded operator++ that increments the underlying iterator value twice is all you need.
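A rough sketch of that idea using boost::iterator_adaptor (the skipping_iterator name and the end-guard handling are my own; take it as a starting point rather than a finished solution):
#include <iostream>
#include <vector>
#include <boost/iterator/iterator_adaptor.hpp>

// Adaptor whose operator++ advances the underlying iterator by two,
// taking care not to step past the end of the range.
template <class BaseIt>
class skipping_iterator
    : public boost::iterator_adaptor<skipping_iterator<BaseIt>, BaseIt>
{
public:
    skipping_iterator() {}
    skipping_iterator(BaseIt it, BaseIt end)
        : skipping_iterator::iterator_adaptor_(it), m_end(end) {}

private:
    friend class boost::iterator_core_access;

    void increment()
    {
        BaseIt& base = this->base_reference();
        ++base;
        if (base != m_end)
            ++base;
    }

    BaseIt m_end;
};

int main()
{
    std::vector<int> v{1, 2, 3, 4, 5, 6, 7};

    skipping_iterator<std::vector<int>::iterator> first(v.begin(), v.end());
    skipping_iterator<std::vector<int>::iterator> last(v.end(), v.end());

    for (; first != last; ++first)
        std::cout << *first << ' ';  // prints: 1 3 5 7
}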

Determining if an unordered vector<T> has all unique elements

Profiling my CPU-bound code has suggested that I spend a long time checking whether a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and == defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
    set<T> Y(X.begin(), X.end());
    return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
    typename vector<T>::iterator i, j;
    for (i = X.begin(); i != X.end(); ++i) {
        for (j = i + 1; j != X.end(); ++j) {
            if (*i == *j) return 0;
        }
    }
    return 1;
}
I've tested them the best I can, and from what I can gather from reading the documentation about the STL, the answer is (as usual) that it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested-iterator approach the opposite seems to be true: it is lightning fast if X[0]==X[1], but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps an STL algorithm built for this very purpose? If not, are there any suggestions to eke out a bit more efficiency?
Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
{ return *l < *r; }

template <class T>
bool is_unique(vector<T> const &x) {
    vector< T const * > vp;
    vp.reserve( x.size() );
    for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
    sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
    return adjacent_find( vp.begin(), vp.end(),
            not2( ptr_fun( &dereference_less<T> ) ) )  // the "opposite" functor
        == vp.end();  // unique iff every adjacent pair (vp_n, vp_n+1) satisfies *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
    typedef typename iterator_traits<I>::value_type T;
    …
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
    sort( x.begin(), x.end() ); // O(N log N)
    return adjacent_find( x.begin(), x.end() ) == x.end();
}
You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(Fwd first, Fwd last)
{
    return adjacent_find(first, last) == last;
}
then have the client sort the vector, or make a sorted copy of the vector. This opens a door for reusing work: if the client sorted the vector in the past, then they have the option to keep and refer to that sorted vector, so they can repeat this operation in O(n) runtime.
The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
    std::sort(vec.begin(), vec.end());
    return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).
Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...
For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
    std::set<T> test;
    for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
        if (!test.insert(*it).second) {
            return false;
        }
    }
    return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g if you had a vector of integers of known range, you could just mark in an array (or even bitset) if an element exists.
You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
    std::sort(X.begin(), X.end());
    return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in O(n log n), the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++0x std::unordered_set instead of std::set would do it in expected linear time - but that requires that your elements be hashable as well as having operator== defined, which might not be so easy.
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.
If I may add my own 2 cents.
First of all, as #Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure there is no duplicate inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as attribute).
Whenever the property is needed, check for duplicates
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to leave the elements of the vector in a particular order, or can we "mess" with them?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue ?
Thirdly: as you remarked yourself, a simple nested linear search yields O(N^2) complexity on average, which is no good.
If < is already defined, then sorting is obvious, with its O(N log N) complexity.
It might also be worth it to make T Hashable, because a std::tr1::hash_set could yield a better time (I know, you need a RandomAccessIterator, but if T is Hashable then it's easy to have T* Hashable too ;) )
But in the end the real issue here is that our advice is necessarily generic because we lack data.
What is T? Do you intend the algorithm to be generic?
What is the number of elements? 10, 100, 10,000, 1,000,000? Because asymptotic complexity is kind of moot when dealing with a few hundred elements.
And of course: can you ensure uniqueness at insertion time? Can you modify the vector itself?
Well, your first one should only take O(N log N), so it's clearly the better worst case for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
    set<T> Y;
    typename vector<T>::const_iterator i;
    for (i = X.begin(); i != X.end(); ++i) {
        if (Y.find(*i) != Y.end()) {
            return false;
        }
        Y.insert(*i);
    }
    return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.
If the type T you store in your vector is large and copying it is costly, consider creating a vector of pointers or iterators to your vector's elements. Sort it based on the elements pointed to and then check for uniqueness.
You can also use std::set for that. The template looks like this:
template <class Key, class Traits=less<Key>, class Allocator=allocator<Key> > class set
I think you can provide an appropriate Traits parameter and insert raw pointers for speed, or implement a simple wrapper class for pointers with an < operator.
Don't use the constructor for inserting into the set; use the insert method. The method (one of its overloads) has the signature
pair <iterator, bool> insert(const value_type& _Val);
By checking the result (the second member) you can often detect a duplicate much more quickly than if you had inserted all the elements.
In the (very) special case of discrete values with a known, not-too-big maximum value N,
you can start a bucket sort and simply check that the number of values in each bucket stays below 2:
bool is_unique(const vector<int>& X, int N)
{
    vector<int> buckets(N, 0);
    vector<int>::const_iterator i;
    for (i = X.begin(); i != X.end(); ++i)
        if (++buckets[*i] > 1)
            return false;
    return true;
}
The complexity of this would be O(n).
Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as a hash set is expected O(1) per insertion instead of O(log n) for a standard set, i.e. O(n) overall instead of O(n log n). Of course everything will depend on the size of n and your particular library implementation.
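For reference, a C++11 sketch of that hash-based variant, combined with the early-exit idea from the std::set answer above (is_unique_hashed is a made-up name):
#include <unordered_set>
#include <vector>

// Expected O(n): requires std::hash<T> (or a custom hasher) and operator== for T.
template <class T>
bool is_unique_hashed(const std::vector<T>& x)
{
    std::unordered_set<T> seen;
    seen.reserve(x.size());
    for (const T& v : x)
        if (!seen.insert(v).second)   // insert() reports whether v was actually new
            return false;
    return true;
}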