Why do range algorithms have projection argument after comparator argument? - c++

C++20 range algorithms support projections and, like the classic STL algorithms, custom comparators.
But what I found puzzling is the order of the projection and the comparator.
My "problem" (more of an annoyance; I will quickly learn to use this order of arguments) is that it breaks the left-to-right flow of the code.
Consider the following code (apologies for it not being short, but that is kind of the point: to show that in realistic code, where variable names are not two letters long, this order of arguments makes the code harder to read):
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>
struct PointyPoint {
    int x;
    int y;
};
struct Item {
    std::string name;
    PointyPoint location;
};
//...
std::ranges::sort(
    items,
    [](const PointyPoint &a, const PointyPoint &b) {
        return std::tie(a.x, a.y) < std::tie(b.x, b.y);
    },
    &Item::location);
The issue I have with this code is that I think it would read much better if the projection came before the lambda (the comparator).
Full code on godbolt.
Reasons I can think of for why this order was picked:
STL algorithms usually have the comparator as the 3rd argument (the first after the iterators), so this order matches that
maybe custom comparators are much more common than projections, so this order avoids the need to supply a projection argument

I believe you have already mentioned the reasoning behind this yourself, but let me strengthen your points:
Porting pre-ranges code to <ranges> should be straightforward. Imagine you have an existing
std::sort(data.begin(), data.end(), std::greater<>{});
and would like to turn that into a <ranges> algorithm call. Imagine you had to go with
std::ranges::sort(data, std::identity{}, std::greater<>{});
This would impose a burden that hinders easy migration. With the actual ordering, you can just change the first version to
std::ranges::sort(data, std::greater<>{});
Using the standard comparison is very low friction, so if you need a projection, it's easy to put a std::less<>{} in front of it. Admittedly, you could say the same thing about std::identity{}, but that doesn't invalidate the first point.
Lastly, the rationale in this document might also be helpful on that subject:
For algorithms that optionally accept functions/predicates (e.g. transform, sort), projection arguments follow functions/predicates. There are no algorithm overloads that allow the user to specify the projection without also specifying a predicate, even if the default would suffice. This is to reduce the number of overloads and also to avoid any potential for ambiguity.


Why do C++20 ranges not provide only pipe syntax?

I understand that question sounds weird, so here is a bit of context.
Recently I was disappointed to learn that map-reduce in C++20 ranges does not work as one would expect, i.e.
const double val = data | transform(...) | accumulate (...);
does not work; you must write it in this unnatural way:
const double val = accumulate(data | transform(...));
Details can be found here and here, but it boils down to the fact that accumulate cannot disambiguate between 2 different use cases.
So this got me thinking:
If C++20 required that you use the pipe for range algorithms, i.e. you could not write
vector<int> v;
sort(v);
but had to write
vector<int> v;
v | sort();
would that solve the problem of ambiguity?
And if so, although it would be unnatural to people used to std::sort and other STL algorithms, I wonder whether in the long run that would have been a better design choice.
Note:
If this question is too vague, feel free to vote to close, but I feel that this is a legitimate design question that can be answered in a relatively unbiased way, especially if my understanding of the problem is wrong.
You need to differentiate between range algorithms and range adaptors. Algorithms are functions that perform a generic operation on a range of values. Adaptors are functions which create range views that modify the presentation of a range. Adaptors are chained by the | operator; algorithms are just regular functions.
Sometimes, the same conceptual thing can have an algorithm and adapter form. transform exists as both an algorithm and an adapter. The former stores the transformation into an output range; the latter creates a view range of the input that lazily computes the transformation as requested.
These are different tasks for different needs and uses.
Also, note that there is no sort adapter in C++20. A sort adapter would have to create a view range that somehow mixed around the elements in the source range. It would have to allocate storage for the new sequence of values (even if it's just sorting iterators/pointers/indices to the values). And the sorting would have to be done at construction time, so there would be no lazy operation taking place.
This is also why accumulate doesn't work that way. It's not a matter of "ambiguity"; it's a matter of the fundamental nature of the operation. Accumulation computes a value from a range; it does not compute a new range from an existing one. That's the work of an algorithm, not an adapter.
Some tasks are useful in algorithm form. Some tasks are useful in adapter form (you find very few zip-like algorithms). Some tasks are useful in both. But because these are two separate concepts for different purposes, they have different ways of invoking them.
would that solve the problem of ambiguity?
Yes.
If there's only one way to write something, that one way must be the only possible interpretation. If an algorithm "call" can only ever be a partial call to the algorithm that must be completed with a | operation with a range on the left hand side, then you'd never even have the question of if the algorithm call is partial or total. It's just always partial.
No ambiguity in that sense.
But if you went that route, you would end up with things like:
auto sum = accumulate("hello"s);
which doesn't actually sum the chars in that string; it is actually a placeholder waiting for a range to accumulate over, with the initial value "hello"s.

Intel MKL OOP Wrapper Design and Operator Overloading

I started writing an OOP wrapper for Intel's MKL library and came across some design issues. I hope you can help me find the "best" way to handle them. The issues mainly concern operator overloading and are not critical to the wrapper, but they affect readability and/or performance.
The first issue is overloading operators given how the BLAS functions are defined. As an example, matrix multiplication (gemm) is defined as
$C \leftarrow \alpha A B + \beta C$
(with $A$, $B$, $C$ being matrices and $\alpha$, $\beta$ scalars).
Now I can overload * and + individually, but to match the single BLAS call I would need 4 function calls using overloaded operators instead of one. Or I could use a normal function call (which will be implemented anyway), but lose the "natural" way of writing the equation with overloaded operators, making it less readable (though still more readable than those horrible BLAS names).
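One common way to keep the operator syntax while still issuing a single BLAS-style call is expression templates: the operators return lightweight objects that record the expression, and assignment evaluates it in one pass. A minimal sketch under stated assumptions: Matrix, Scaled, ScaledProduct and GemmExpr are hypothetical names, only the exact shape alpha * A * B + beta * C is recognized, there are no dimension checks, and a naive triple loop stands in for the cblas_dgemm call a real MKL wrapper would make.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct GemmExpr;

// Minimal row-major matrix standing in for the wrapper's own class.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
    Matrix& operator=(const GemmExpr& e);  // evaluates alpha*A*B + beta*C
};

struct Scaled { double s; const Matrix& m; };                          // s * m
struct ScaledProduct { double alpha; const Matrix& a; const Matrix& b; };  // alpha * A * B
struct GemmExpr { ScaledProduct p; Scaled c; };                         // alpha*A*B + beta*C

// The operators only build the expression; nothing is computed here.
inline Scaled operator*(double s, const Matrix& m) { return {s, m}; }
inline ScaledProduct operator*(const Scaled& sa, const Matrix& b) { return {sa.s, sa.m, b}; }
inline GemmExpr operator+(const ScaledProduct& p, const Scaled& c) { return {p, c}; }

// One fused evaluation, in place of a single cblas_dgemm call.
inline Matrix& Matrix::operator=(const GemmExpr& e) {
    const Matrix& A = e.p.a;
    const Matrix& B = e.p.b;
    const Matrix& C = e.c.m;
    Matrix result(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < B.cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
            result(i, j) = e.p.alpha * sum + e.c.s * C(i, j);
        }
    // Writing into a temporary first keeps this correct when A, B or C is *this.
    *this = std::move(result);
    return *this;
}
```

Usage: C = alpha * A * B + beta * C; reads like the formula but performs one evaluation, with no intermediate matrices for alpha * A or beta * C.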
The second issue is read and write access to the matrices. As an example, consider an upper triangular matrix, in which every entry below the diagonal is zero.
Such a matrix would be stored efficiently in a 1D array that holds only the upper triangle (the order may vary depending on row/column major order).
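For reference, the index arithmetic for one such layout is short. This sketch assumes column-major packing of the upper triangle with 0-based indices (the layout BLAS/LAPACK packed routines use for uplo = 'U'); a row-major wrapper would need a different formula. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of stored entries for an n x n upper triangular matrix.
constexpr std::size_t packed_size(std::size_t n) { return n * (n + 1) / 2; }

// 0-based position of element (i, j), with i <= j, when the upper triangle is
// packed column by column. Columns 0..j-1 occupy 1 + 2 + ... + j = j*(j+1)/2
// slots, and i is the offset inside column j. For a 3 x 3 matrix:
//   [a b d]
//   [0 c e]   is stored as   [a, b, c, d, e, f]
//   [0 0 f]
constexpr std::size_t packed_upper_index(std::size_t i, std::size_t j) {
    return i + j * (j + 1) / 2;
}
```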
Since a matrix has two indices, the easiest way to overload reading would be using
<TYPE> & operator() (size_t row, size_t column);
instead of some workaround with subscript operators. The problem is handling the zeros. They may not be stored in the array, but mathematically they exist. If I want to read these values in another function (not MKL), I may need to be able to return the zero (aside from storing the matrix type, which is done for BLAS anyway).
Since () returns a reference, I can't return 0. I could make a dummy variable, but if I were to write to that value, I wouldn't have an upper triangular matrix anymore. So I would have to either change the matrix type, forbid writing to these elements, or ignore it (bad idea).
To change the matrix type I would need to detect writing, which would require explicitly using some kind of proxy object.
To prevent writing, I would probably have to do the same since I can't return a const value because the overload doesn't fit that definition. Alternatively I could forbid writing this way in general, but then I couldn't change the existing matrix itself, which I don't want.
I hope you can give me some pointers on how to handle these issues and what design principles I may be forgetting/should take into account. As I said, they are not critical (I can write appropriate functions for everything instead of operators).
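A sketch of the proxy-object approach mentioned above, assuming packed column-major storage of the upper triangle (UpperTriangular and Ref are hypothetical names). The const overload returns by value, so it can return 0 for the implicit lower triangle; the non-const overload returns a proxy that can tell reads from writes. This version implements the "forbid writing" option by throwing on a nonzero write below the diagonal; changing the matrix type instead would be handled at the same spot.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class UpperTriangular {
public:
    explicit UpperTriangular(std::size_t n) : n_(n), packed_(n * (n + 1) / 2, 0.0) {}

    std::size_t size() const { return n_; }

    // Reads return by value, so the implicit zeros need no storage.
    double operator()(std::size_t i, std::size_t j) const {
        return i <= j ? packed_[index(i, j)] : 0.0;
    }

    // Proxy reference returned by the non-const overload.
    class Ref {
    public:
        Ref(UpperTriangular& m, std::size_t i, std::size_t j) : m_(m), i_(i), j_(j) {}

        // Reading through the proxy forwards to the const overload.
        operator double() const { return static_cast<const UpperTriangular&>(m_)(i_, j_); }

        // Writing is where the matrix can react: nonzero writes below the
        // diagonal are rejected, writing 0 there is a harmless no-op.
        Ref& operator=(double v) {
            if (i_ > j_) {
                if (v != 0.0) throw std::logic_error("write below the diagonal");
                return *this;
            }
            m_.packed_[index(i_, j_)] = v;
            return *this;
        }
        // Without this, m(1, 2) = m(0, 1) would pick the implicitly deleted
        // copy assignment (Ref has a reference member).
        Ref& operator=(const Ref& other) { return *this = static_cast<double>(other); }

    private:
        UpperTriangular& m_;
        std::size_t i_, j_;
    };

    Ref operator()(std::size_t i, std::size_t j) { return Ref(*this, i, j); }

private:
    // Column-major packing of the upper triangle, 0-based, i <= j.
    static std::size_t index(std::size_t i, std::size_t j) { return i + j * (j + 1) / 2; }

    std::size_t n_;
    std::vector<double> packed_;
};
```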
I wrote a library for medical image reconstruction, https://github.com/kvahed/codeare. The matrix object there has a lot of overloaded operators and convenience functions that let you write MATLAB-like code efficiently in C++.
What you want to do for passing data between MKL and other libraries/algorithms is, in my view, impossible. How would you distinguish 0 from 1e-18? What happens when you want to move on to some numerical optimisation, etc.? What you are looking at is premature optimisation. Even if you wanted to exploit sparsity, you could only do it, say, column-wise or row-wise, or, as above, note down that you have an upper triangular form. But skipping individual 0s? Crazy. Of course copying 0s around doesn't feel right, but I would get the algorithms optimised first and only then worry about the above.
Also don't forget that a lot of libraries out there cannot handle sparse matrices, at which point you would have to recopy the non-zero part or write some badass, expensive iterator to deliver the results.
Btw you would not only need the operator you noted down in your question but also the const variant; in other words:
template <typename T> class Matrix {
...
T& operator()(size_t n, size_t m);
const T& operator()(size_t n, size_t m) const;
...
};
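A short sketch of why the const overload matters (Matrix and trace are illustrative, with row-major storage assumed): any function that takes the matrix by const reference can only read elements through the const-qualified operator().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class Matrix {
public:
    Matrix(std::size_t n, std::size_t m) : n_(n), m_(m), data_(n * m) {}
    // Non-const overload: read/write access on mutable matrices.
    T& operator()(std::size_t n, std::size_t m) { return data_[n * m_ + m]; }
    // Const overload: the only one callable on a const Matrix.
    const T& operator()(std::size_t n, std::size_t m) const { return data_[n * m_ + m]; }
    std::size_t rows() const { return n_; }
    std::size_t cols() const { return m_; }
private:
    std::size_t n_, m_;
    std::vector<T> data_;
};

// Compiles only because of the const overload: 'a' is a const reference.
template <typename T>
T trace(const Matrix<T>& a) {
    T sum{};
    for (std::size_t i = 0; i < a.rows() && i < a.cols(); ++i) sum += a(i, i);
    return sum;
}
```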
There is so much more expensive stuff to optimise than std::copying stuff. For example SIMD intrinsics ...
https://github.com/kvahed/codeare/blob/master/src/matrix/SIMDTraits.hpp

GLM + STL: operator == missing

I am trying to use GLM vector classes in STL containers. No big deal as long as I don't try to use <algorithm>. Most algorithms rely on the == operator, which is not implemented for GLM classes.
Does anyone know an easy way to work around this? Without (re-)implementing STL algorithms :(
GLM is a great math library implementing GLSL functions in c++
Update
I just found out that GLM actually implements comparison operators in an extension (here). But how do I use them in the STL?
Update 2
This question has been superseded by this one: how to use glm's operator== in stl algorithms?
Many STL algorithms accept a functor for object comparison (of course, you need to exercise special care when comparing two vectors containing floating point values for equality).
Example:
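To sort a std::list<glm::vec3> (it's up to you whether sorting vectors that way would make any practical sense), you could use
myVec3List.sort(MyVec3ComparisonFunc);
(std::sort needs random-access iterators, so it works for a std::vector<glm::vec3> but not for a std::list.)
with
bool MyVec3ComparisonFunc(const glm::vec3 &vecA, const glm::vec3 &vecB)
{
    // Lexicographic order: compare x, then y, then z. The comparator must be a
    // strict weak ordering, which "every component is smaller" is not.
    if (vecA[0] != vecB[0]) return vecA[0] < vecB[0];
    if (vecA[1] != vecB[1]) return vecA[1] < vecB[1];
    return vecA[2] < vecB[2];
}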
So, thankfully, there is no need to modify GLM or even reinvent the wheel.
You should be able to implement an operator== as a stand-alone function:
// (Actually more Greg S's code than mine.....)
// Note: std algorithms find this operator via argument-dependent lookup, so it
// may need to be declared inside namespace glm to be picked up from <algorithm>.
bool operator==(const glm::vec3 &vecA, const glm::vec3 &vecB)
{
    const double epsilon = 0.0001; // choose something appropriate.
    return fabs(vecA[0] - vecB[0]) < epsilon
        && fabs(vecA[1] - vecB[1]) < epsilon
        && fabs(vecA[2] - vecB[2]) < epsilon;
}
James Curran and Greg S have already shown you the two major approaches to solving the problem.
define a functor to be used explicitly in the STL algorithms that need it, or
define the actual operators == and < which STL algorithms use if no functor is specified.
Both solutions are perfectly fine and idiomatic, but a thing to remember when defining operators is that they effectively extend the type. Once you've defined operator< for a glm::vec3, these vectors are extended to define a "less than" relationship, which means that any time someone wants to test if one vector is "less than" another, they'll use your operator. So operators should only be used if they're universally applicable. If this is always the one and only way to define a less than relationship between 3D vectors, go ahead and make it an operator.
The problem is, it probably isn't. We could order vectors in several different ways, and none of them is obviously the "right one". For example, you might order vectors by length. Or by magnitude of the x component specifically, ignoring the y and z ones. Or you could define some relationship using all three components (say, if a.x == b.x, check the y coordinates. If those are equal, check the z coordinates)
There is no obvious way to define whether one vector is "less than" another, so an operator is probably a bad way to go.
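To illustrate that there is no single natural ordering, here are three comparators, each a valid strict weak ordering; Vec3 is a hypothetical stand-in for glm::vec3 so the sketch compiles without GLM. Passing one explicitly to std::sort keeps the choice visible at the call site instead of baking it into operator<.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for glm::vec3

// Order by length (squared length gives the same order without a sqrt).
struct ByLength {
    bool operator()(const Vec3& a, const Vec3& b) const {
        return a.x * a.x + a.y * a.y + a.z * a.z < b.x * b.x + b.y * b.y + b.z * b.z;
    }
};

// Order by the x component only, ignoring y and z.
struct ByX {
    bool operator()(const Vec3& a, const Vec3& b) const { return a.x < b.x; }
};

// Lexicographic: x first, then y on ties, then z.
struct Lexicographic {
    bool operator()(const Vec3& a, const Vec3& b) const {
        return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
    }
};
```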
For equality, an operator might work better. We do have a single definition of equality for vectors: two vectors are equal if every component is equal.
The only problem here is that the vectors consist of floating-point values, so you may want some kind of epsilon comparison, making them equal if all members are nearly equal. But then you may also want the epsilon to be variable, and that can't be done in operator==, as it only takes two parameters.
Of course, operator== could just use some kind of default epsilon value, and functors could be defined for comparisons with variable epsilons.
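A sketch of the functor approach with a variable epsilon (Vec3 is a hypothetical stand-in for glm::vec3; NearlyEqual and find_near are illustrative names). Because the tolerance is a member of the functor, it can differ per call. Tolerance-based equality is not transitive, so it is better suited to searches like find_if than to algorithms that assume a true equivalence relation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for glm::vec3

// Component-wise comparison with a tolerance chosen by the caller.
struct NearlyEqual {
    float epsilon;
    bool operator()(const Vec3& a, const Vec3& b) const {
        return std::fabs(a.x - b.x) < epsilon
            && std::fabs(a.y - b.y) < epsilon
            && std::fabs(a.z - b.z) < epsilon;
    }
};

// First element within 'epsilon' of 'target' in every component, or end().
inline std::vector<Vec3>::const_iterator
find_near(const std::vector<Vec3>& v, const Vec3& target, float epsilon) {
    return std::find_if(v.begin(), v.end(),
                        [&](const Vec3& e) { return NearlyEqual{epsilon}(e, target); });
}
```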
There's no clear cut answer on which to prefer. Both techniques are valid. Just pick the one that best fits your needs.

Chaining iterators for C++

Python's itertools implements a chain iterator, which essentially concatenates a number of different iterators to provide everything from a single iterator.
Is there something similar in C++? A quick look at the Boost libraries didn't reveal anything similar, which is quite surprising to me. Is it difficult to implement this functionality?
Came across this question while investigating for a similar problem.
Even though the question is old, now, with C++11 and Boost 1.54, it is pretty easy to do using the Boost.Range library. It features a join function, which can join two ranges into a single one. You might incur a performance penalty here: the lowest common range concept (e.g. Single Pass Range or Forward Range) is used as the new range's category, and during iteration the iterator may have to check whether it needs to jump over to the next range. But your code can be written as simply as:
#include <boost/range/join.hpp>
#include <iostream>
#include <vector>
#include <deque>
int main()
{
std::deque<int> deq = {0,1,2,3,4};
std::vector<int> vec = {5,6,7,8,9};
for(auto i : boost::join(deq,vec))
std::cout << "i is: " << i << std::endl;
return 0;
}
In C++, an iterator usually doesn't make sense outside the context of the begin and end of a range. The iterator itself doesn't know where the start and the end are. So in order to do something like this, you instead need to chain together ranges of iterators; a range is a (start, end) pair of iterators.
Take a look at the boost::range documentation. It may provide tools for constructing a chain of ranges. The one difference is that the ranges will have to be of the same type and return the same type of iterator. It may be possible to make this more generic, chaining together different types of ranges with something like any_iterator, but maybe not.
I've written one before (actually, just to chain two pairs of iterators together). It's not that hard, especially if you use boost's iterator_facade.
Making an input iterator (which is effectively what Python's chain does) is an easy first step. Finding the correct category for an iterator chaining a combination of different iterator categories is left as an exercise for the reader ;-).
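A sketch of such a hand-written input iterator chaining two ranges, without Boost; Chain2Iterator and chain are hypothetical names, and both ranges are assumed to share one iterator type. Equality compares the first-range position with the other iterator's first-range position and likewise for the second range, so it never compares iterators from different containers (the problem raised in another answer below).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Input iterator that walks [first1, last1) and then continues in range 2.
template <typename It>
class Chain2Iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = typename std::iterator_traits<It>::value_type;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = typename std::iterator_traits<It>::pointer;
    using reference = typename std::iterator_traits<It>::reference;

    Chain2Iterator(It cur1, It end1, It cur2) : cur1_(cur1), end1_(end1), cur2_(cur2) {}

    // Reads from range 1 until it is exhausted, then from range 2.
    reference operator*() const { return cur1_ != end1_ ? *cur1_ : *cur2_; }

    Chain2Iterator& operator++() {
        if (cur1_ != end1_) ++cur1_; else ++cur2_;
        return *this;
    }
    Chain2Iterator operator++(int) { Chain2Iterator tmp = *this; ++*this; return tmp; }

    // Each position is compared only with a position in the same container.
    friend bool operator==(const Chain2Iterator& a, const Chain2Iterator& b) {
        return a.cur1_ == b.cur1_ && a.cur2_ == b.cur2_;
    }
    friend bool operator!=(const Chain2Iterator& a, const Chain2Iterator& b) { return !(a == b); }

private:
    It cur1_, end1_, cur2_;
};

// Returns the [begin, end) pair of chained iterators over both ranges.
template <typename It>
std::pair<Chain2Iterator<It>, Chain2Iterator<It>> chain(It first1, It last1, It first2, It last2) {
    return {Chain2Iterator<It>(first1, last1, first2),
            Chain2Iterator<It>(last1, last1, last2)};
}
```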
Check out the Views Template Library (VTL). It may not provide a 'chained iterator' directly, but I think it has all the tools/templates needed to implement your own 'chained iterator'.
From the VTL Page:
A view is a container adaptor, that provides a container interface to
parts of the data or
a rearrangement of the data or
transformed data or
a suitable combination of the data sets
of the underlying container(s). Since views themselves provide the container interface, they can be easily combined and stacked. Because of template trickery, views can adapt their interface to the underlying container(s). More sophisticated template trickery makes this powerful feature easy to use.
Compared with smart iterators, views are just smart iterator factories.
What you are essentially looking for is a facade iterator that abstracts away the traversing through several sequences.
Since you are coming from a Python background, I'll assume that you care more about flexibility than speed. By flexibility I mean the ability to chain-iterate through different sequence types together (vector, array, linked list, set, etc.), and by speed I mean only allocating memory on the stack.
If this is the case then you may want to look at the any_iterator from adobe labs:
http://stlab.adobe.com/classadobe_1_1any__iterator.html
This iterator will give you the ability to iterate through any sequence type at runtime. To chain, you would have a vector (or array) of 3-tuples of any_iterators, that is, three any_iterators for each range you chain together (you need three to iterate forward or backward; if you only want to iterate forward, two will suffice).
Let's say that you wanted to chain-iterate through a sequence of integers:
(Untested pseudo-C++ code)
typedef adobe::any_iterator AnyIntIter;
struct AnyRange {
AnyIntIter begin;
AnyIntIter curr;
AnyIntIter end;
};
You could define a range such as:
int int_array[] = {1, 2, 3, 4};
AnyRange sequence_0 = {int_array, int_array, int_array + ARRAYSIZE(int_array)};
Your RangeIterator class would then hold a std::vector<AnyRange>:
class RangeIterator {
public:
    RangeIterator() : curr_range_index(0) {}
    template <typename Container>
    void AddAnyRange(Container& c) {
        AnyRange any_range = { c.begin(), c.begin(), c.end() };
        ranges.push_back(any_range);
    }
    // Here's what operator++() looks like, everything else omitted.
    // It advances to the next element, skipping exhausted or empty ranges,
    // and returns that element.
    int operator++() {
        while (curr_range_index < ranges.size()) {
            AnyRange& curr_range = ranges[curr_range_index];
            if (curr_range.curr != curr_range.end && ++curr_range.curr != curr_range.end) {
                return *(curr_range.curr);
            }
            // Move on; a fresh range starts at its first element, so return it as is.
            ++curr_range_index;
            if (curr_range_index < ranges.size() &&
                ranges[curr_range_index].curr != ranges[curr_range_index].end) {
                return *(ranges[curr_range_index].curr);
            }
        }
        assert(!"iterated too far");
        return 0;
    }
private:
    std::vector<AnyRange> ranges;
    std::size_t curr_range_index;
};
I do want to note, however, that this solution is very slow. The better, more C++-like approach is just to store pointers to all the objects that you want to operate on and iterate through those. Alternatively, you can apply a functor or a visitor to your ranges.
Not in the standard library. Boost might have something.
But really, such a thing should be trivial to implement. Just make yourself an iterator with a vector of iterators as a member. Some very simple code for operator++, and you're there.
No functionality exists in boost that implements this, to the best of my knowledge - I did a pretty extensive search.
I thought I'd implement this easily last week, but I ran into a snag: the STL that comes with Visual Studio 2008, when range checking is on, doesn't allow comparing iterators from different containers (i.e., you can't compare somevec1.end() with somevec2.end() ). All of a sudden it became much harder to implement this and I haven't quite decided yet on how to do it.
I wrote other iterators in the past using iterator_facade and iterator_adaptor from Boost, which are better than writing 'raw' iterators, but I still find writing custom iterators in C++ rather messy.
If someone can post some pseudocode on how this could be done /without/ comparing iterators from different containers, I'd be much obliged.

Aggregating contributions from multiple donors

As I try to modernize my C++ skills, I keep encountering this situation where "the STL way" isn't obvious to me.
I have an object that wants to gather contributions from multiple sources into a container (typically a std::vector). Each source is an object, and each of those objects provides a method get_contributions() that returns any number of contributions (from 0 to many). The gatherer will call get_contributions() on each contributor and aggregate the results into a single collection.
The question is, what's the best signature for get_contributions()?
Option 1: std::vector<contribution> get_contributions() const
This is the most straightforward, but it leads to lots of copying as the gatherer copies each set of results into the master collection. And yes, performance matters here. For example, if the contributors were geometric models and getting contributions amounted to tessellating them into triangles for rendering, then speed would count and the number of contributions could be enormous.
Option 2: template <typename container> void get_contributions(container &target) const
This allows each contributor to add its contributions directly to the master container by calling target.push_back(foo). The drawback here is that we're exposing the container to other types of inspection and manipulation. I'd prefer to keep the interface as narrow as possible.
Option 3: template <typename out_it> void get_contributions(out_it &it) const
In this solution, the aggregator would pass a std::back_insert_iterator for the master collection, and the individual contributors would do *it++ = foo; for each contribution. This is the best I've come up with so far, but I'm left with the feeling that there must be a more elegant way. The back_insert_iterator feels like a kludge.
Is Option 3 the best, or is there a better approach? Does this gathering pattern have a name?
There's a fourth option, which would require you to define your own iterator ranges. Check out Alexandrescu's presentation "Iterators Must Go".
Option 3 is the most idiomatic way. Note that you don't have to use back_insert_iterator. If you know how many elements are going to be added, you can resize the vector and then provide a regular vector iterator instead. It won't call push_back then (potentially saving you some copying).
back_insert_iterator's main advantage is that it expands the vector as needed.
It's not a kludge though. It's designed for this exact purpose.
One minor adjustment would be to pass the iterator by value, and then return it when the function returns.
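A sketch of Option 3 with that adjustment; Triangle, Model and gather are hypothetical names for illustration. The contributor is a template over any output iterator, takes it by value and returns the advanced iterator, so the gatherer keeps ownership of the container and contributors see nothing but the iterator.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

struct Triangle { int id; };  // hypothetical contribution type

// Hypothetical contributor producing 'count' contributions.
class Model {
public:
    Model(int first_id, int count) : first_id_(first_id), count_(count) {}

    // Writes through any output iterator (taken by value) and returns it advanced.
    template <typename OutIt>
    OutIt get_contributions(OutIt out) const {
        for (int i = 0; i < count_; ++i) *out++ = Triangle{first_id_ + i};
        return out;
    }

private:
    int first_id_, count_;
};

// The gatherer owns the container and hands out only a back_insert_iterator.
std::vector<Triangle> gather(const std::vector<Model>& models) {
    std::vector<Triangle> all;
    auto out = std::back_inserter(all);
    for (const Model& m : models) out = m.get_contributions(out);
    return all;
}
```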
I would say there are two idiomatic STL ways: your Option 3 (taking an output iterator, which you'd pass by value, by the way) and taking a functor which you would call with each of the contributions.
Each of these is only appropriate if it is suitable to implement get_contributions as a template, of course.
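A sketch of the functor alternative described above, again with hypothetical Triangle, Model and gather names: the contributor calls a caller-supplied sink once per contribution, so the gatherer decides whether to push_back, count, or forward each item.

```cpp
#include <cassert>
#include <vector>

struct Triangle { int id; };  // hypothetical contribution type

class Model {
public:
    Model(int first_id, int count) : first_id_(first_id), count_(count) {}

    // Calls 'sink' once per contribution; the container is never exposed.
    template <typename Sink>
    void get_contributions(Sink&& sink) const {
        for (int i = 0; i < count_; ++i) sink(Triangle{first_id_ + i});
    }

private:
    int first_id_, count_;
};

std::vector<Triangle> gather(const std::vector<Model>& models) {
    std::vector<Triangle> all;
    for (const Model& m : models)
        m.get_contributions([&all](const Triangle& t) { all.push_back(t); });
    return all;
}
```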