Boost R-tree: counting elements satisfying a query - C++

So far, when I want to count how many elements in my R-tree satisfy a specific spatial query, it boils down to running the query, collecting the matches and then counting them, roughly as follows:
std::vector<my_type> results;
rtree_ptr->query(bgi::intersects(query_box), std::back_inserter(results));
int nbElements = results.size();
Is there a better way, i.e. a way to directly count without retrieving the actual elements? I haven't found anything to do that but who knows. (I'm building my tree with the packing algorithm, in case it has any relevance.)
My motivation is that I noticed that the speed of my queries depends on the number of matches. If there are 0 matches, the query is more or less instantaneous; if there are 10,000 matches, it takes several seconds. Since it is possible to determine very fast whether there are any matches, traversing the tree seems to be extremely fast (at least in the index I built); it is collecting all the results that makes the queries slow when there are many matches. Since I'm not interested in collecting but simply in counting (at least for some queries), it would be awesome if I could just skip the collecting.

I had a late brainwave: even better than using a function_output_iterator would be to use the boost::geometry::index query iterators.
In principle this leads to exactly the same behaviour, with slightly simpler code:
box query_box;
auto r = boost::make_iterator_range(bgi::qbegin(tree, bgi::intersects(query_box)), {});
// in c++03, spell out the end iterator: bgi::qend(tree)
size_t nbElements = boost::distance(r);
NOTE: size() is not available because the query_const_iterators are not of the random-access category.
But it may be slightly more comfortable to combine with standard library algorithms. Say, if you wanted an additional check per item, you could write:
size_t matching = std::count_if(r.begin(), r.end(), some_predicate);
I think the range-based solution is somewhat more flexible (the same code can be used with other algorithms like std::partial_sort_copy or std::transform, which would be hard to fit into the output-iterator idiom from my earlier answer).
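For instance, a minimal sketch of driving std::transform from the query iterators (using the same point/box/Tree aliases as in my other answer below; computing the area of each match is just an arbitrary example of a per-match derived value):
#include <algorithm>
#include <iterator>
#include <vector>
#include <boost/geometry/core/cs.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/algorithms/area.hpp>
#include <boost/geometry/index/rtree.hpp>

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;
using point = bg::model::d2::point_xy<int>;
using box   = bg::model::box<point>;
using Tree  = bgi::rtree<box, bgi::rstar<32> >;

int main()
{
    Tree tree; // populate as usual
    box query_box(point(0, 0), point(10, 10));

    // Feed the matches straight into std::transform, computing a derived
    // value per match without ever materializing a results vector.
    std::vector<double> areas;
    std::transform(bgi::qbegin(tree, bgi::intersects(query_box)), bgi::qend(tree),
                   std::back_inserter(areas),
                   [](box const& b) { return bg::area(b); });
}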

You can use a function output iterator:
size_t cardinality = 0; // number of matches in set
auto count_only = boost::make_function_output_iterator([&cardinality] (Tree::value_type const&) { ++cardinality; });
Use it like this:
C++11 using a lambda
#include <boost/function_output_iterator.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/core/cs.hpp>
#include <boost/geometry/index/rtree.hpp>
namespace bgi = boost::geometry::index;
using point = boost::geometry::model::d2::point_xy<int, boost::geometry::cs::cartesian>;
using box = boost::geometry::model::box<point>;
int main()
{
    using Tree = bgi::rtree<box, bgi::rstar<32> >;
    Tree tree;

    size_t cardinality = 0; // number of matches in set
    auto count_only = boost::make_function_output_iterator([&cardinality] (Tree::value_type const&) { ++cardinality; });

    box query_box;
    tree.query(bgi::intersects(query_box), count_only);

    int nbElements = cardinality;
    return nbElements;
}
C++03 using a function object
For C++03 you can replace the lambda with a (polymorphic!) function object:
struct count_only_f {
    count_only_f(size_t& card) : _cardinality(&card) { }

    template <typename X>
    void operator()(X) const {
        ++(*_cardinality);
    }

private:
    size_t *_cardinality;
};
// .... later:
boost::function_output_iterator<count_only_f> count_only(cardinality);
C++03 using Boost Phoenix
I would consider this a good place to use Boost Phoenix:
#include <boost/phoenix.hpp>
// ...
size_t cardinality = 0; // number of matches in set
tree.query(bgi::intersects(query_box), boost::make_function_output_iterator(++boost::phoenix::ref(cardinality)));
Or, more typically with namespace aliases:
#include <boost/phoenix.hpp>
namespace phx = boost::phoenix;
using boost::make_function_output_iterator;
// ...
size_t cardinality = 0; // number of matches in set
tree.query(bgi::intersects(query_box), make_function_output_iterator(++phx::ref(cardinality)));

Related

How do I insert into boost::unordered_set<boost::unordered_set<int> >?

The following code fails to compile, but if I remove the commented line, it compiles and runs correctly. I was only intending to use boost because C++ doesn't provide a hash function for std::unordered_set<int> by default.
#include <iostream>
#include <boost/unordered_set.hpp>
int main() {
    boost::unordered_set<boost::unordered_set<int> > fam;
    boost::unordered_set<int> s;
    s.insert(5);
    s.insert(6);
    s.insert(7);
    std::cout << s.size() << std::endl;
    fam.insert(s); // this is the line causing the problem
    return 0;
}
Edit 1:
I want to be clearer than I was in the OP. First, I know that the idea of boost::unordered_set<> is that it is implemented with a hash table rather than a BST. I know that anything used as the element type of a boost::unordered_set<> needs to have a hash function and an equality function provided. I also know that by default std::unordered_set<> does not have a hash function defined for it, which is why the following code does not compile:
#include <iostream>
#include <unordered_set>
int main() {
    std::unordered_set<std::unordered_set<int> > fam;
    return 0;
}
However, I thought that Boost provides hash functions for all of its containers, which is why I believed the following code does compile:
#include <iostream>
#include <boost/unordered_set.hpp>
int main() {
    boost::unordered_set<boost::unordered_set<int> > fam;
    return 0;
}
So now, I'm not sure why the Boost code just above compiles, but the code in the OP does not. Was I wrong that Boost provides a hash function for all of its containers? I would really like to avoid having to define a new hash function, especially since my actual intended use is a much more complicated data structure: boost::unordered_map<std::pair<boost::unordered_map<int, int>, boost::unordered_map<int, int> >, int>. It seems like this should be a solved problem that I shouldn't have to solve myself, since IIRC Python can handle sets of sets no problem.
An unordered_set (or _map) uses hashing, and requires a hash function to be defined for its elements. There is no hash function defined for boost::unordered_set<int>, therefore you cannot put elements of that type into your set. (The declaration in your second snippet compiles because the hash is only required once a member function that actually uses it, such as insert, gets instantiated.)
You may write your own hash function for this. Note that because the iteration order of an unordered_set is unspecified, the element hashes have to be combined in an order-independent way (otherwise two equal sets could hash differently); you may still want to customize this for your particular data. If you drop this code into your example, it should work:
namespace boost {
    std::size_t hash_value(boost::unordered_set<int> const& arg) {
        std::size_t hashCode = 0;
        for (int e : arg)
            hashCode += hash<int>{}(e);   // order-independent combination
        return hashCode;
    }
}
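For illustration (not part of the original answer), a small check that, with an order-independent hash like the one above, two sets built from the same elements in different insertion orders end up as a single element of the outer set:
#include <cassert>
#include <boost/unordered_set.hpp>

namespace boost {
    // Order-independent hash, as discussed above.
    std::size_t hash_value(boost::unordered_set<int> const& arg) {
        std::size_t hashCode = 0;
        for (int e : arg)
            hashCode += hash<int>{}(e);
        return hashCode;
    }
}

int main() {
    boost::unordered_set<boost::unordered_set<int> > fam;

    boost::unordered_set<int> a, b;
    a.insert(5); a.insert(6); a.insert(7);
    b.insert(7); b.insert(5); b.insert(6);   // same elements, different order

    fam.insert(a);
    fam.insert(b);                           // equal to a, so not inserted again
    assert(fam.size() == 1);
}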

Why does += work on std::map keys that don't have values?

I was looking at the ways people solved the problem proposed here:
Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],...] (si < ei), find the minimum number of conference rooms required.
One of the solutions was to do the following:
#include <map>
#include <vector>
#include <algorithm>
using std::max;
using std::map;
using std::vector;
struct Interval {
    int start;
    int end;
    Interval() : start(0), end(0) {}
    Interval(int s, int e) : start(s), end(e) {}
};

int minMeetingRooms(vector<Interval>& intervals) {
    map<int, int> changes;
    for (auto i : intervals) {
        changes[i.start] += 1;
        changes[i.end] -= 1;
    }
    int rooms = 0, maxrooms = 0;
    for (auto change : changes)
        maxrooms = max(maxrooms, rooms += change.second);
    return maxrooms;
}
Which increments a counter every time a new meeting starts and decrements a counter every time a meeting ends, taking the max of that counter and the previous max on every iteration.
What I'm curious about, though, is the part where the map is initialized:
for (auto i : intervals) {
    changes[i.start] += 1;
    changes[i.end] -= 1;
}
The values in the map have never been set, yet you can still use the += operator. I assume this causes the map to create a 0 in that place, which you then increment, but is this undefined behavior? Also, is there a default value for every type? For example, if I had a map of <int, string>, what would it put as the default value? Does it just call the default constructor?
Basically, what I'd like to know is the internals of std::map that allow one to add to a key that doesn't yet exist, and how this varies from type to type.
As a side note:
If I were trying to write more idiomatic code, I would have put
if (changes.find(i.start) == changes.end()) changes[i.start] = 0;
if (changes.find(i.end) == changes.end()) changes[i.end] = 0;
but I'm guessing it's a performance hit or something?
Read the docs:
Returns a reference to the value that is mapped to a key equivalent to key, performing an insertion if such key does not already exist.
The insertion value-initializes the mapped value, and value-initialization of an int produces 0.
Your "more idiomatic" code is exactly the opposite of idiomatic. You use operator[] when you want autovivification; you only use count/find when you are trying to avoid autovivification.
If you come from a Python background this may seem backwards, but it's idiomatic C++. std::map behaves more like a Python defaultdict than a dict: operator[] lookup auto-vivifies, and avoiding autovivification requires explicit member-function calls.
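For example, a minimal illustration of that behaviour (the contents are arbitrary):
#include <cassert>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> m;

    // operator[] inserts a value-initialized mapped value when the key is absent:
    // an empty string for std::string, 0 for int.
    m[42] += "x";
    assert(m.at(42) == "x");

    // find() does not auto-vivify; the map is left untouched.
    assert(m.find(7) == m.end());
    assert(m.size() == 1);
}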

Why is `boost::hana::range_c` not a Sequence?

#include <string>
#include <utility>
#include <vector>
#include <boost/hana.hpp>
namespace hana = boost::hana;
template <typename ...T>
void indexed_T_work(T&& ...args)
{
    auto indices = hana::range_c<std::size_t, 0, sizeof...(T)>;
    auto types   = hana::make_tuple(std::forward<T>(args)...);
    hana::for_each(
        hana::zip(indices, types)
      , [](auto&& pair_) { /* Do index-dependent work with each `T` */ }
    );
}

int main()
{
    indexed_T_work(5, 13, std::vector<std::string>{}, 32.f, 42, "foo");
}
I'd like to use hana::zip on a hana::tuple and hana::range_c, but hana::range_c is not considered a Sequence, which is a requirement for hana::zip. What is the reasoning behind this decision? How can I (idiomatically) accomplish my goal while respecting that decision?
First, there are several solutions:
Solution 1
auto indices = hana::to<hana::tuple_tag>(hana::range_c<std::size_t, 0, sizeof...(T)>);
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::for_each(hana::zip(indices, types), hana::fuse([](auto i, auto&& x) {
    // ...
}));
Solution 2
auto indices = hana::range_c<std::size_t, 0, sizeof...(T)>;
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::for_each(indices, [&](auto i) {
    auto& x = types[i];
    // ...
});
Solution 3
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::size_c<sizeof...(T)>.times.with_index([&](auto i) {
    auto& x = types[i];
    // ...
});
Solution (1) has the disadvantage of making a copy of each of the args, because zip returns a sequence of sequences and everything in Hana is by value. Since this is probably not what you want, you should pick whichever you prefer between solutions (2) and (3), which are really equivalent.
Now, the reason why ranges do not model the Sequence concept is that it wouldn't make sense. The Sequence concept requires that we be able to create an arbitrary Sequence using the hana::make function. Hence, for any Sequence tag S, hana::make<S>(...) must create a Sequence of tag S that contains .... However, a range must contain contiguous integral_constants in some interval. Hence, if range were a Sequence, hana::make<hana::range_tag>(...) would have to contain whatever ... is, which breaks the invariant of a range when ... are not contiguous integral_constants. Consider for example
hana::make<hana::range_tag>(hana::int_c<8>, hana::int_c<3>,
hana::int_c<5>, hana::int_c<10>)
This should be a range containing integral_constants 8,3,5,10, which does not make sense. Another similar example showing why a range can't be a Sequence is the permutations algorithm. The permutations algorithm takes a Sequence and returns a Sequence of Sequences containing all the permutations. Clearly, since a range can only hold integral_constants, it does not make sense to try and create a range of ranges. Examples like this abound.
In other words, ranges are too specialized to model the Sequence concept. The upside of having such a specialized structure is that it's very compile-time efficient. The downside is that it's not a general-purpose container and some operations can't be done on it (like zip). However, you can totally take a range and convert it to a full-blown sequence, if you know what the tradeoff is.
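As a small illustration of that conversion (essentially the first line of Solution 1 in isolation; the letters tuple is arbitrary):
#include <cstddef>
#include <boost/hana.hpp>
namespace hana = boost::hana;

int main()
{
    // range_c itself cannot be zipped, but converting it to a tuple
    // (a genuine Sequence) makes the zip-based approach work.
    auto indices = hana::to<hana::tuple_tag>(hana::range_c<std::size_t, 0, 3>);
    auto letters = hana::make_tuple('a', 'b', 'c');

    hana::for_each(hana::zip(indices, letters), hana::fuse([](auto i, auto c) {
        // i is an integral_constant index, c the element at that index
        (void)i; (void)c;
    }));
}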

Sorting a range (with no duplicates) in C++, is std::vector and std::sort faster than std::set?

I have a sequence of double (with no duplicates) and I need to sort them. Is filling a vector and then sorting it faster than inserting the values in a set?
Is this question answerable without a knowledge of the implementation of the standard library (and without a knowledge of the hardware on which the program will run) but just with the information provided by the C++ standard?
#include <vector>
#include <set>
#include <algorithm>
#include <random>
#include <iostream>
std::uniform_real_distribution<double> unif(0,10000);
std::default_random_engine re;
int main()
{
    std::vector< double > v;
    std::set< double > s;
    std::vector< double > r;

    size_t sz = 10;
    for (size_t i = 0; i < sz; i++) {
        r.push_back(unif(re));
    }

    for (size_t i = 0; i < sz; i++) {
        v.push_back(r[i]);
    }
    std::sort(v.begin(), v.end());

    for (size_t i = 0; i < sz; i++) {
        s.insert(r[i]);
    }
    return 0;
}
From the C++ standard, all we can say is that they both have the same asymptotic complexity, O(n log n).
The set may be faster for large objects that can't be efficiently moved or swapped, since the objects don't need to be moved more than once. The vector may be faster for small objects, since sorting it involves no pointer updates and less indirection.
Which is faster in any given situation can only be determined by measuring (or a thorough knowledge of both the implementation and the target platform).
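For example, a rough measurement along these lines (the size and the timing harness are arbitrary; this is a sketch, not a rigorous benchmark) settles it for one particular implementation and machine:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <set>
#include <vector>

int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> unif(0, 10000);

    std::vector<double> data(1000000);
    for (double& d : data) d = unif(gen);

    auto t0 = std::chrono::steady_clock::now();
    std::vector<double> v = data;
    std::sort(v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();
    std::set<double> s(data.begin(), data.end());
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "vector + sort: " << ms(t1 - t0).count() << " ms\n"
              << "set insert   : " << ms(t2 - t1).count() << " ms\n";
}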
The use of vector may be faster because of data cache factors as the data operated upon will be in a more coherent memory region (probably).
The vector will also have less memory overhead per-value.
If you can, reserve the vector size before inserting data to minimize effort during filling the vector with values.
In terms of complexity both should be the same, i.e. O(n log n).
The answer is not trivial. If your software has two main phases, first setup and then lookup, and lookup is used more than setup, the sorted vector could be faster, for two reasons:
lower_bound from <algorithm> is faster than the usual tree implementation behind <set>;
std::vector memory is allocated in fewer heap pages, so there will be fewer page faults while you are looking for an element.
If the usage is mixed, or lookup does not dominate setup, then <set> will be faster. More info: Scott Meyers, Effective STL, Item 23.
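A minimal sketch of that "sorted vector used as a set" lookup pattern (the contents are arbitrary):
#include <algorithm>
#include <cassert>
#include <vector>

int main()
{
    // Setup phase: fill once, then sort once.
    std::vector<double> v = { 3.5, 1.25, 9.0, 4.75 };
    std::sort(v.begin(), v.end());

    // Lookup phase: binary search via lower_bound, no per-node pointer chasing.
    double key = 4.75;
    auto it = std::lower_bound(v.begin(), v.end(), key);
    bool found = (it != v.end() && *it == key);
    assert(found);
}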
Since you said sorting in a range, you could use partial_sort instead of sorting the entire collection.
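If only the first few elements are needed in sorted order, a small sketch of that idea (the cutoff of 3 is arbitrary):
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> v = { 7.5, 2.0, 9.25, 1.5, 4.0, 8.75 };

    // Put only the 3 smallest elements, sorted, at the front;
    // the rest of the range is left in an unspecified order.
    std::partial_sort(v.begin(), v.begin() + 3, v.end());

    for (int i = 0; i < 3; ++i)
        std::cout << v[i] << '\n'; // 1.5 2 4
}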
If we don't want to disturb the existing collection and want a new collection with sorted data and no duplicates, then std::set gives us a straightforward solution.
#include <vector>
#include <set>
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
    int arr[] = { 1, 3, 4, 1, 6, 7, 9, 6, 3, 4, 9 };
    vector<int> ints(begin(arr), end(arr));

    const int ulimit = 5;
    auto last = ints.begin();
    advance(last, ulimit);

    set<int> sortedset;
    sortedset.insert(ints.begin(), last);

    for_each(sortedset.begin(), sortedset.end(), [](int x) { cout << x << "\n"; });
}

Compile-Time container of functors for controlling an algorithm?

Suppose I want something simple like the following:
I have a core algorithm, which randomly selects one of the specialized algorithms (specialized at compile time) and runs it. These specialized algorithms are implemented through functors.
The question is now: how to implement a container, which is built at compile time, where the core algorithm can first check the size of this container ("I got 4 algorithms -> need to randomly select algorithm 0-3") and can then execute the functor in this container ("randomly chose 2 -> process the third functor in the container").
How would one implement this as simply as possible? I suppose it is possible.
Is there any connection to the curiously recurring template idiom? (wiki link)
Is there a simple way with the use of Boost::Fusion? (official doc)
Edit: All the algorithms will be used in the core algorithm. The usage pattern (random numbers) is a runtime decision (so I don't need compile-time rands). The algorithm just has to know the container of functors and the size of this container for safe access.
If you want your core-algorithm to execute a specialized algorithm, there should be some kind of contract between the core-algorithm and the specialized algorithm.
If you define this contract as an interface, your container is simply a container containing pointers to these interfaces, e.g.:
class IAlgorithm
{
public:
    virtual double operator()(double d) = 0;
};

typedef std::vector<IAlgorithm *> Algorithms;
Calling a random algorithm is then simply a matter of taking the size of the vector, picking a random value between zero and the size of the list (0..size-1), taking the entry at that position and calling it through the interface.
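A minimal sketch of that selection step (the concrete Doubler algorithm and the run_random helper are made up for illustration):
#include <cstddef>
#include <cstdlib>
#include <vector>

class IAlgorithm
{
public:
    virtual ~IAlgorithm() {}
    virtual double operator()(double d) = 0;
};

// A made-up specialized algorithm, just for the example.
class Doubler : public IAlgorithm
{
public:
    double operator()(double d) { return d + d; }
};

typedef std::vector<IAlgorithm *> Algorithms;

double run_random(const Algorithms& algorithms, double input)
{
    // Pick an index in [0, size-1] and call the algorithm stored there.
    std::size_t index = std::rand() % algorithms.size();
    return (*algorithms[index])(input);
}

int main()
{
    Doubler d;
    Algorithms algorithms;
    algorithms.push_back(&d);

    return static_cast<int>(run_random(algorithms, 21.0));
}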
Alternatively, you can also use the new C++0x std::function construction, like this:
#include <functional>
typedef std::function<double(double)> Algorithm;
typedef std::vector<Algorithm> Algorithms;
Taking an algorithm is similar; you should be able to call one like this:
Algorithms myAlgorithms;
...
double myresult = myAlgorithms[2](mydouble);
This approach has the advantage that you can also use lambdas.
EDIT: This is an example that uses lambda's. It compiles and works as expected with Visual Studio 2010 (just tested this myself):
#include <iostream>
#include <vector>
#include <functional>
typedef std::function<double(double)> Algorithm;
typedef std::vector<Algorithm> Algorithms;
int main()
{
    Algorithms algorithms;
    algorithms.push_back([](double d) -> double { return d + d; });
    algorithms.push_back([](double d) -> double { return d * d; });

    std::cout << algorithms[0](5) << std::endl;
    std::cout << algorithms[1](5) << std::endl;
}
I'm not a specialist but I think that indeed boost::fusion and/or boost::mpl are the tools you're looking for.
Your class would take an mpl container as a parameter, being the list of algorithm functor types, and would then work with it at compile time.
I think an interesting subproblem is how to generate random numbers at compile-time.
Perhaps something like this :)
//compiletime_rand.h
#ifndef COMPILETIME_RAND_GENERATOR_H
#define COMPILETIME_RAND_GENERATOR_H
template <unsigned N, unsigned Seed, unsigned Modulo>
struct rand_c_impl
{
    static const unsigned value_impl = (1664525 * rand_c_impl<N - 1, Seed, Modulo>::value + 1013904223) % (1ull << 32);
    static const unsigned value = value_impl % Modulo;
};

template <unsigned Seed, unsigned Modulo>
struct rand_c_impl<0, Seed, Modulo>
{
    static const unsigned value_impl = Seed;
    static const unsigned value = value_impl;
};
#endif
//next_c_rand.h
#include BOOST_PP_UPDATE_COUNTER()
rand_c_impl<BOOST_PP_COUNTER, 0, MAX_C_RAND>::value
//main.cpp
#include <boost/preprocessor/slot/counter.hpp>
#include "compiletime_rand.h"
#include <iostream>
#define MAX_C_RAND 16
template <unsigned N>
void output_compiletime_value()
{
    std::cout << N << '\n';
}

int main()
{
    output_compiletime_value<
#include "next_c_rand.h"
    >();
    output_compiletime_value<
#include "next_c_rand.h"
    >();
}
Output: 15 2