Why is `boost::hana::range_c` not a Sequence? - c++

#include <string>
#include <utility>
#include <vector>
#include <boost/hana.hpp>
namespace hana = boost::hana;
template <typename ...T>
void indexed_T_work(T&& ...args)
{
auto indices = hana::range_c<std::size_t, 0, sizeof...(T)>;
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::for_each(
hana::zip(indices, types)
, [](auto&& pair_) { /* Do index-dependent work with each `T` */ }
);
}
int main()
{
indexed_T_work(5, 13, std::vector<std::string>{}, 32.f, 42, "foo");
}
I'd like to use hana::zip on a hana::tuple and hana::range_c, but hana::range_c is not considered a Sequence, which is a requirement for hana::zip. What is the reasoning behind this decision? How can I (idiomatically) accomplish my goal while respecting that decision?

First, there are several solutions:
Solution 1
auto indices = hana::to<hana::tuple_tag>(hana::range_c<std::size_t, 0, sizeof...(T)>);
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::for_each(hana::zip(indices, types), hana::fuse([](auto i, auto&& x) {
// ...
}));
Solution 2
auto indices = hana::range_c<std::size_t, 0, sizeof...(T)>;
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::for_each(indices, [&](auto i) {
auto& x = types[i];
// ...
});
Solution 3
auto types = hana::make_tuple(std::forward<T>(args)...);
hana::size_c<sizeof...(T)>.times.with_index([&](auto i) {
auto& x = types[i];
// ...
});
Solution (1) has the disadvantage of copying each argument, because zip returns a sequence of sequences and everything in Hana is by value. Since this is probably not what you want, pick whichever you prefer between solutions (2) and (3), which are really equivalent.
Now, the reason why ranges do not model the Sequence concept is because that wouldn't make sense. The Sequence concept requires that we be able to create an arbitrary Sequence using the hana::make function. Hence, for any Sequence tag S, hana::make<S>(...) must create a Sequence of tag S that contains .... However, a range must contain contiguous integral_constants in some interval. Hence, if range was a Sequence, hana::make<hana::range_tag>(...) should contain whatever ... is, which breaks the invariant of a range if ... are not contiguous integral_constants. Consider for example
hana::make<hana::range_tag>(hana::int_c<8>, hana::int_c<3>,
hana::int_c<5>, hana::int_c<10>)
This should be a range containing integral_constants 8,3,5,10, which does not make sense. Another similar example showing why a range can't be a Sequence is the permutations algorithm. The permutations algorithm takes a Sequence and returns a Sequence of Sequences containing all the permutations. Clearly, since a range can only hold integral_constants, it does not make sense to try and create a range of ranges. Examples like this abound.
In other words, ranges are too specialized to model the Sequence concept. The upside of having such a specialized structure is that it's very compile-time efficient. The downside is that it's not a general-purpose container and some operations can't be done on it (like zip). However, you can totally take a range and convert it to a full-blown sequence, if you know what the tradeoff is.
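For comparison, solutions (2) and (3) amount to iterating over compile-time indices and indexing the tuple with each one. A minimal sketch of that idea using only the standard library (std::tuple and std::index_sequence swapped in for Hana, so this is not Hana's API) might look like the following. The helper names for_each_indexed and for_each_indexed_impl are hypothetical, and C++17 is assumed for the fold expression.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: calls f(index, element) once per tuple element, where
// index is a std::integral_constant, so its value is usable at compile time.
template <typename Tuple, typename F, std::size_t... Is>
void for_each_indexed_impl(Tuple& t, F& f, std::index_sequence<Is...>)
{
    // C++17 fold over the comma operator: calls happen in index order.
    (f(std::integral_constant<std::size_t, Is>{}, std::get<Is>(t)), ...);
}

template <typename... Ts, typename F>
void for_each_indexed(std::tuple<Ts...>& t, F f)
{
    for_each_indexed_impl(t, f, std::index_sequence_for<Ts...>{});
}

// Same shape as the question's function; it returns the visited indices
// only so that the behaviour can be observed.
template <typename... T>
std::vector<std::size_t> indexed_T_work(T&&... args)
{
    auto types = std::make_tuple(std::forward<T>(args)...);
    std::vector<std::size_t> visited;
    for_each_indexed(types, [&](auto i, auto& x) {
        static_assert(decltype(i)::value < sizeof...(T), "index is a constant");
        (void)x;                 // index-dependent work with x would go here
        visited.push_back(i);    // integral_constant converts to its value
    });
    return visited;
}
```

Calling indexed_T_work(5, 13.0, "foo") visits indices 0, 1 and 2 in that order. As in solutions (2) and (3), the elements are accessed by reference inside the tuple rather than copied into pairs.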

Related

What does the vertical pipe | mean in the context of c++20 and ranges

There are usages of | which look more like function pipe-lining or chaining rather than a bitwise or, seen in combination with the c++20 ranges. Things like:
#include <ranges>
#include <vector>
template<typename T>
std::vector<T> square_vector(const std::vector<T> &some_vector) {
auto result = some_vector | std::views::transform([](T x){ return x*x; });
return {result.begin(), result.end()};
}
where clearly the | operator is not a bitwise or. Since when does it work, and on what sort of functions/objects? Are these like regular views? What are some caveats?
This sort of function chaining was introduced with C++20 ranges, whose biggest feature is lazy evaluation of operations on views (more precisely, viewable ranges). This means the operation transforming the view only acts on it as it is iterated.
This semantic allows for the pipeline syntax sugar, putting in a readable way what will happen when the result is iterated. The functions this is used with are based on range adaptors, which take a view (and possibly additional arguments after it) and transform it as they are iterated (essentially returning another view).
The pipeline syntax is reserved for a special subgroup of these called range adaptor closures, which take a single view and no additional parameters. These can be adaptors with no additional arguments, adaptors with the excess arguments bound, or the result of some library functions such as the std::views::transform in the OP. (Since C++23 you can also define these yourself.) Once we have some of these, the syntax:
some_viewable_range | std::views::some_adaptor_closure | some_other_adaptor_closure
is equivalent to
some_other_adaptor_closure(std::views::some_adaptor_closure(some_viewable_range))
which will evaluate the pipeline as the returned view is iterated. Similarly,
some_vector | std::views::transform([](T x){ return x*x; });
is the same as
// The first call returns the adaptor closure, i.e. std::views::transform(some_vector, [](T x){ return x*x; }) with the second argument bound.
std::views::transform([](T x){ return x*x; })(some_vector);
but more readable.
Like any view, you can iterate the result directly. Since evaluation is lazy, bad things can happen, such as:
template<typename T>
auto square_vector(const std::vector<T> &some_vector) {
return some_vector | std::views::transform([](T x){ return x*x; });
}
int main () {
for(auto val : square_vector(std::vector<int>{1, 2, 3, 4, 5}))
std::cout << val << '\n';
}
By the time you get to print val, the original vector no longer exists, so the input to the chain is gone, and it goes downhill from there.
To delve further into the world of ranges and adaptors you can check https://en.cppreference.com/w/cpp/ranges, and the original library these were based on, https://ericniebler.github.io/range-v3/.

Is it possible / advisable to return a range?

I'm using the ranges library to help filter data in my classes, like this:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(v) {}
std::vector<int> getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
return std::vector<int>(evens.begin(), evens.end());
}
private:
std::vector<int> vec;
};
In this case, a new vector is constructed in the getEvens() function. To save on this overhead, I'm wondering if it is possible / advisable to return the range directly from the function?
class MyClass
{
public:
using RangeReturnType = ???;
MyClass(std::vector<int> v) : vec(v) {}
RangeReturnType getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
// ...
return evens;
}
private:
std::vector<int> vec;
};
If it is possible, are there any lifetime considerations that I need to take into account?
I am also interested to know if it is possible / advisable to pass a range in as an argument, or to store it as a member variable. Or is the ranges library more intended for use within the scope of a single function?
This was asked in the OP's comment section, but I think I will respond to it in the answer section:
The Ranges library seems promising, but I'm a little apprehensive about this returning auto.
Remember that even with the addition of auto, C++ is a strongly typed language. In your case, since you are returning evens, then the return type will be the same type of evens. (technically it will be the value type of evens, but evens was a value type anyways)
In fact, you probably really don't want to type out the return type manually: std::ranges::filter_view<std::ranges::ref_view<const std::vector<int>>, MyClass::getEvens() const::<decltype([](int i) {return ! (i % 2);})>> (141 characters)
As mentioned by @Caleth in the comments, this wouldn't actually work either: the predicate is a lambda defined inside the function, and two different lambdas have different types even if they are otherwise identical, so there is literally no way to spell out the full return type here.
While there might be debates on whether to use auto in different cases, I believe most people would just use auto here. Plus, your evens was declared with auto too; typing the type out would just make it less readable.
So what are my options if I want to access a subset (for instance even numbers)? Are there any other approaches I should be considering, with or without the Ranges library?
Depending on how you would access the returned data and on the type of the data, you might consider returning std::vector<T*>.
Views are really supposed to be viewed from start to end. While you could use views::drop and views::take to limit a view to a single element, it doesn't provide a subscript operator (yet).
There will also be computational differences. A vector needs to be computed beforehand, whereas views are computed while iterating. So when you do:
for(auto i : myObject.getEvens())
{
std::cout << i;
}
Under the hood, it is basically doing:
for(auto i : myObject.vec)
{
if(!(i % 2)) std::cout << i;
}
Depending on the amount of data and the complexity of the computations, views might be a lot faster, or about the same as the vector method. Plus, you can easily apply multiple filters to the same range without iterating through the data multiple times.
In the end, you can always store the view in a vector:
std::vector<int> vec2(evens.begin(), evens.end());
So my suggestion is: if you have the ranges library, then you should use it.
If not, then vector<T>, vector<T*>, vector<index> depending on the size and copiability of T.
There are no restrictions on the usage of components of the STL in the standard. Of course, there are best practices (e.g., string_view instead of string const &).
In this case, I can foresee no problems with handling the view return type directly. That said, the best practices are yet to be decided on since the standard is so new and no compiler has a complete implementation yet.
You're fine to go with the following, in my opinion:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(std::move(v)) {}
auto getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
private:
std::vector<int> vec;
};
As you can see here, a range is just something on which you can call begin and end. Nothing more than that.
For instance, you can use the result of begin(range), which is an iterator, to traverse the range, using the ++ operator to advance it.
In general, looking back at the concept I linked above, you can use a range whenever the surrounding code only requires being able to call begin and end on it.
Whether this is advisable or enough depends on what you need to do with it. Clearly, if your intention is to pass evens to a function which expects a std::vector (for instance it's a function you cannot change, and it calls .push_back on the entity we are talking about), you clearly have to make a std::vector out of filter's output, which I'd do via
auto evens = vec | ranges::views::filter(whatever) | ranges::to_vector;
but if all the function which you pass evens to does is to loop on it, then
return vec | ranges::views::filter(whatever);
is just fine.
As regards lifetime considerations, a view is to a range of values what a pointer is to the pointed-to entity: if the latter is destroyed, the former dangles, and making improper use of it is undefined behavior. This is an erroneous program:
#include <iostream>
#include <range/v3/view/filter.hpp>
#include <string>
#include <vector>
using namespace ranges;
using namespace ranges::views;
auto f() {
// a local vector here
std::vector<std::string> vec{"zero","one","two","three","four","five"};
// return a view on the local vector
return vec | filter([](auto){ return true; });
} // vec is gone ---> the view returned is dangling
int main()
{
// the following throws std::bad_alloc for me
for (auto i : f()) {
std::cout << i << std::endl;
}
}
You can use ranges::any_view as a type erasure mechanism for any range or combination of ranges.
ranges::any_view<int> getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
I cannot see any equivalent of this in the STL ranges library; please edit the answer if you can.
EDIT: The problem with ranges::any_view is that it is very slow and inefficient. See https://github.com/ericniebler/range-v3/issues/714.
It is desirable to declare a function returning a range in a header and define it in a cpp file
for compilation firewalls (compilation speed)
stop the language server from going crazy
for better factoring of the code
However, there are complications that make it not advisable:
How to get type of a view?
If defining it in a header is fine, use auto
If performance is not an issue, I would recommend ranges::any_view
Otherwise I'd say it is not advisable.

Convert from Boost Hana Tuple to Std Vector

I am trying to create a std::vector from a boost::hana::tuple at compile-time like so:
boost::hana::tuple<std::string> namesString{ "Hello", "World" };
std::vector<std::string> namesStringVector{};
while(!(hana::is_empty(namesString)))
{
namesStringVector.emplace_back(hana::front(namesString));
hana::drop_front(namesString);
}
This clearly doesn't work because the while loop is not run at compile-time.
How do we achieve this effect in Boost::Hana? I.e. what compile-time Hana construct would allow us to perform this cast? I tried doing
namesStringVector = (std::vector<std::string>)namesString;
and
hana::to < std::vector < std::string > >(namesString);
But it tells me there does not exist such a cast in both cases.
In addition to the concerns that Louis addressed there are some other issues with how you are trying to use a tuple. Note that a tuple can not have its length or the types within it changed as types are completely immutable in C++ so you have to think in terms of pure functional programming.
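The result of is_empty is fixed by the tuple's type, so it will always be the same (false for your one-element tuple, which makes the loop condition always true), and drop_front cannot change the input value; it only returns a new tuple with the front element removed (in your case that would be tuple<>, which is empty).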
You might want to rethink your use case for wanting to convert a tuple to vector, but here is an example to hopefully get you started.
#include <boost/hana.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace hana = boost::hana;
template <typename T>
constexpr auto to_vector = [](auto&& ...x) {
return std::vector<T>{std::forward<decltype(x)>(x)...};
};
int main() {
auto xs = hana::make_tuple("Hello", "World");
auto vec = hana::unpack(xs, to_vector<std::string>);
for (auto const& x : vec) {
std::cout << x << ' ';
}
std::cout << '\n';
}
Notes about list types:
std::vector has a run-time length and a single type for all
elements.
std::array has a compile-time length and a single type for all
elements.
hana::tuple has a compile-time length and any element can be any
type.
For starters, boost::hana::tuple<std::string> namesString{ "Hello", "World" }; doesn't make sense because your tuple only has one element but you try to initialize it with two.
Second, it doesn't really make sense to initialize a std::vector from a hana::tuple, since it implies that all the elements of the tuple have the same type. Instead, you should probably be using std::array. Hana is useful when you need heterogeneity (elements with different types), which you don't seem to need here. When you don't need heterogeneity, using std::array or similar tools will be much easier and natural than using Hana.
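If the elements really do share a common target type, the unpack idea from the earlier answer can also be sketched with the standard library alone, using std::tuple and std::apply in place of hana::tuple and hana::unpack. This is a swapped-in technique under the assumption of C++17; tuple_to_vector is a hypothetical name.

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper: converts every tuple element to T and collects the
// results in a std::vector<T>, preserving element order.
template <typename T, typename Tuple>
std::vector<T> tuple_to_vector(Tuple&& t)
{
    return std::apply(
        [](auto&&... xs) {
            std::vector<T> out;
            out.reserve(sizeof...(xs));
            // Fold over the comma operator: one emplace_back per element.
            (out.emplace_back(std::forward<decltype(xs)>(xs)), ...);
            return out;
        },
        std::forward<Tuple>(t));
}
```

For example, tuple_to_vector<std::string>(std::make_tuple("Hello", "World")) yields a vector holding the two strings. Compilation fails if any element cannot be used to construct a T, which is the compile-time check the question was after.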

How to construct a tuple from an array

I am designing a C++ library that reads a CSV file of reported data from some experiment and does some aggregation and outputs a pgfplots code. I want to make the library as generic and easy to use as possible. I also want to isolate it from the data types that are represented in the CSV file and leave the option to user to parse each column as she desires. I also want to avoid Boost Spirit Qi or other heavy duty parser.
The simple solution I have is for the user to create a type for each column, with a constructor that takes "char *". The constructor does its own parsing for the value it is given, which is one cell from the data. The user then passes me a list of types; the schema, representing the types in a line of data. I use this type list to create a tuple, in which every member of the tuple is responsible for parsing itself.
The problem now is how to initialise (construct) this tuple. Dealing with tuples is of course not straightforward since iterating over their elements is mostly a compile-time operation. I used Boost Fusion at first to achieve this task. However, the function I used (transform) although might take a tuple as input (with the appropriate adapter), it does not seem to return a tuple. I need the return value to be a tuple so some other code can use it as an associative type-to-value container (access it by type via std::get<T>), while using only standard tools, that is, without using Boost. So I had to convert whatever Fusion's transform returned into std::tuple.
My question is how to avoid this conversion, and better yet how to avoid Boost Fusion completely.
A simple solution that comes to mind is to use the constructor of std::tuple, and somehow pass each element the respective "char const *" that it needs to construct itself. However, while this is possible using some complicated template-based enumeration techniques, I am wondering if there is a straightforward "parameter-pack"-like approach, or an even simpler way to pass the values to the constructors of the individual elements of a tuple.
To clarify what I am seeking, kindly take a look at this following code.
#include <cstdio>
#include <array>
template <typename...> struct format {};
template <typename...> struct file_loader {};
template <typename... Format>
struct
file_loader<format<Format...> > {
void load_file() {
size_t strsize = 500u;
char *str = new char[strsize]();
auto is = fopen("RESULT","r");
/* example of RESULT:
dataset2,0.1004,524288
dataset1,0.3253,4194304
*/
while(getline(&str, &strsize, is) >= 0) {
std::array<char*, 3> toks{};
auto s = str;
int i = 2;
while(i --> 0)
toks[i] = strsep (&s, ",");
toks[2] = strsep (&s, ",\n");
std::tuple<Format...> the_line{ /* toks */ } ; // <-- HERE
//// current solution:
// auto the_line{
// as_std_tuple( // <-- unnecessary conversion I'd like to avoid
// boost::fusion::transform(boost::fusion::zip(types, toks), boost::fusion::make_fused( CAST() ))
// )};
// do something with the_line
}
}
};
#include <string>
class double_type {
public:
double_type() {}
double_type(char const *token) { } // strtod
};
class int_type {
public:
int_type() {}
int_type(char const *token) { } // strtoul
};
int main(int argc, char *argv[]) {
file_loader< format< std::string,
double_type,
int_type > >
{}.load_file();
return 0;
}
I've highlighted the interesting line as "HERE" in a comment.
My question precisely is:
Is there a way to construct a std::tuple instance (of heterogeneous
types, each of which is implicitly convertible from "char *") with
automatic storage duration (on the stack) from a std::array<char *, N>,
where N equals the size of that tuple?
The answer I am seeking should
Avoid Boost Fusion
(Simplicity condition) Avoid using more than 5 lines of boilerplate template-based enumeration code
Alternatively, shows why this is not possible to do in the C++14 standard
The answer can use C++17 constructs, I wouldn't mind.
Thank you,
As with all questions involving std::tuple, use index_sequence to give you a parameter pack to index the array with:
template <class... Formats, size_t N, size_t... Is>
std::tuple<Formats...> as_tuple(std::array<char*, N> const& arr,
std::index_sequence<Is...>)
{
return std::make_tuple(Formats{arr[Is]}...);
}
template <class... Formats, size_t N,
class = std::enable_if_t<(N == sizeof...(Formats))>>
std::tuple<Formats...> as_tuple(std::array<char*, N> const& arr)
{
return as_tuple<Formats...>(arr, std::make_index_sequence<N>{});
}
Which you would use as:
std::tuple<Format...> the_line = as_tuple<Format...>(toks);
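Below is a self-contained variation on the same index_sequence idea, offered as a sketch and not as the answer's exact code. It constructs each tuple element in place from its token instead of going through std::make_tuple, and it replaces the enable_if with a static_assert. The value members on the column types and the name as_tuple_impl are assumptions added for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <tuple>
#include <utility>

// Column types in the spirit of the question; the `value` members are
// assumptions added so the parsed result can be inspected.
struct double_type {
    double value = 0.0;
    double_type() = default;
    double_type(char const* token) : value(std::strtod(token, nullptr)) {}
};

struct int_type {
    unsigned long value = 0;
    int_type() = default;
    int_type(char const* token) : value(std::strtoul(token, nullptr, 10)) {}
};

template <class... Formats, std::size_t N, std::size_t... Is>
std::tuple<Formats...> as_tuple_impl(std::array<char*, N> const& arr,
                                     std::index_sequence<Is...>)
{
    // Expands to tuple(arr[0], arr[1], ...): each element is constructed
    // directly from its own token.
    return std::tuple<Formats...>(arr[Is]...);
}

template <class... Formats, std::size_t N>
std::tuple<Formats...> as_tuple(std::array<char*, N> const& arr)
{
    static_assert(N == sizeof...(Formats), "one token per column type");
    return as_tuple_impl<Formats...>(arr, std::make_index_sequence<N>{});
}
```

With tokens "dataset2", "0.5" and "524288" in a std::array<char*, 3>, as_tuple<std::string, double_type, int_type>(toks) produces a tuple whose elements can then be read with std::get<T>, as the question requires.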

Boost R-tree : counting elements satisfying a query

So far, when I want to count how many elements in my R-tree satisfy a specific spatial query, it boils down to running the query, collecting the matches and then counting them, roughly as follows:
std::vector<my_type> results;
rtree_ptr->query(bgi::intersects(query_box), std::back_inserter(results));
int nbElements = results.size();
Is there a better way, i.e. a way to directly count without retrieving the actual elements? I haven't found anything to do that but who knows. (I'm building my tree with the packing algorithm, in case it has any relevance.)
My motivation is that I noticed that the speed of my queries depends on the number of matches. If there are 0 matches, the query is more or less instantaneous; if there are 10 000 matches, it takes several seconds. Since it's possible to determine very fast whether there are any matches, it seems that traversing the tree is extremely fast (at least in the index I made); it is collecting all the results that makes the queries slower in case of many matches. Since I'm not interested in collecting but simply counting (at least for some queries), it would be awesome if I could just skip the collecting.
I had a late brainwave. Even better than using function_output_iterator could be using the boost::geometry::index query_iterators.
In principle, it will lead to exactly the same behaviour with slightly simpler code:
box query_box;
auto r = boost::make_iterator_range(bgi::qbegin(tree, bgi::intersects(query_box)), {});
// in c++03, spell out the end iterator: bgi::qend(tree)
size_t nbElements = boost::distance(r);
NOTE: size() is not available because the query_const_iterators are not of the random-access category.
But it may be slightly more comfortable to combine. Say, if you wanted an additional check per item, you'd use standard library algorithms like:
size_t matching = std::count_if(r.begin(), r.end(), some_predicate);
I think the range-based solution is somewhat more flexible (the same code can be used to achieve other algorithms like partial_sort_copy or std::transform which would be hard to fit into the output-iterator idiom from my earlier answer).
You can use a function output iterator:
size_t cardinality = 0; // number of matches in set
auto count_only = boost::make_function_output_iterator([&cardinality] (Tree::value_type const&) { ++cardinality; });
Use it like this:
C++11 using a lambda
#include <boost/function_output_iterator.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/core/cs.hpp>
#include <boost/geometry/index/rtree.hpp>
namespace bgi = boost::geometry::index;
using point = boost::geometry::model::d2::point_xy<int, boost::geometry::cs::cartesian>;
using box = boost::geometry::model::box<point>;
int main()
{
using Tree = bgi::rtree<box, bgi::rstar<32> >;
Tree tree;
size_t cardinality = 0; // number of matches in set
auto count_only = boost::make_function_output_iterator([&cardinality] (Tree::value_type const&) { ++cardinality; });
box query_box;
tree.query(bgi::intersects(query_box), count_only);
int nbElements = cardinality;
return nbElements;
}
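To illustrate the counting-sink idea without Boost, here is a standard-library-only sketch: a hand-written output iterator that increments a counter on every assignment and discards the value. It is a hypothetical stand-in, not Boost's implementation, and std::copy_if over a vector stands in for the R-tree query.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical output iterator: every `*it = value` bumps a counter and
// throws the value away, so nothing is stored.
class counting_sink
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit counting_sink(std::size_t& count) : count_(&count) {}

    template <typename T>
    counting_sink& operator=(const T&) { ++*count_; return *this; }

    // Dereference and increment are no-ops, as usual for sink iterators.
    counting_sink& operator*() { return *this; }
    counting_sink& operator++() { return *this; }
    counting_sink operator++(int) { return *this; }

private:
    std::size_t* count_;
};

// Counts the matches of a predicate without materialising them.
std::size_t count_even(const std::vector<int>& v)
{
    std::size_t n = 0;
    std::copy_if(v.begin(), v.end(), counting_sink(n),
                 [](int x) { return x % 2 == 0; });
    return n;
}
```

The counter is held by pointer so that copies of the iterator made inside the algorithm all update the same variable, which is also why the C++03 function object below stores a pointer.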
C++03 using a function object
For C++03 you can replace the lambda with a (polymorphic!) function object:
struct count_only_f {
count_only_f(size_t& card) : _cardinality(&card) { }
template <typename X>
void operator()(X) const {
++(*_cardinality);
}
private:
size_t *_cardinality;
};
// .... later:
boost::function_output_iterator<count_only_f> count_only(cardinality);
C++03 using Boost Phoenix
I would consider this a good place to use Boost Phoenix:
#include <boost/phoenix.hpp>
// ...
size_t cardinality = 0; // number of matches in set
tree.query(bgi::intersects(query_box), boost::make_function_output_iterator(++boost::phoenix::ref(cardinality)));
Or, more typically, with namespace aliases (phx is assumed here to alias boost::phoenix):
#include <boost/phoenix.hpp>
// ...
size_t cardinality = 0; // number of matches in set
tree.query(bgi::intersects(query_box), make_function_output_iterator(++phx::ref(cardinality)));