How to use `boost::range` iterators with standard iterators - c++

I have functions that take in std::vector iterators, as in
typedef std::vector<Point> Points;
Points ConvexHull(Points::const_iterator first, Points::const_iterator last);
I usually pass the std iterators to them, but occasionally I need to work with boost iterators, such as boost::join's range iterator. How should I change the parametrizations of my functions, ideally without templates, so that they accept both iterators? Moreover, how do I indicate in each type which iterator concepts I need?
I tried looking at the boost::range documentation but it's overwhelmingly confusing for me and I don't know where to start.
For example, I couldn't find the difference between boost::range_details::any_forward_iterator_interface and boost::range_details::any_forward_iterator_wrapper, and whether I should use either of those to specify that I need a forward iterator.
Edit:
If I use boost::any_range, how can I pass non-const lvalue references?
For example:
template<typename T>
using Range = boost::any_range<T, boost::random_access_traversal_tag,
                               T, std::ptrdiff_t>;

void f(Range<Point> &points); // defined elsewhere
// -------------
std::vector<Point> vec;
f(vec); // error: cannot bind a non-const lvalue reference to an unrelated type

Boost.Range has any_range for exactly this purpose; it covers both of your needs: type-erasing the iterator and stating which traversal category you require.
https://www.boost.org/doc/libs/1_60_0/libs/range/doc/html/range/reference/ranges/any_range.html
From your example it would look like this:
#include <boost/range/any_range.hpp>
typedef boost::any_range<Point,
                         boost::bidirectional_traversal_tag,
                         Point,
                         std::ptrdiff_t
                         > PointRange;

You should strongly consider using a template. Doing so lets the compiler keep useful information about what operations are actually occurring, which greatly helps it generate optimised output. The std:: convention is to name the type parameter after the concept required. E.g.
template< class BidirIt, class UnaryPredicate > // anything bidirectional (which includes random access)
BidirIt std::partition( BidirIt first, BidirIt last, UnaryPredicate p );
If you really don't want a template, you still shouldn't name anything in a detail namespace. Something like
#include <boost/range/any_range.hpp>
using PointRange = boost::any_range<Point, boost::random_access_traversal_tag>; // or another traversal tag.
using PointIterator = PointRange::iterator;
You will likely need to pass PointRange & about as rarely as you need to pass int *&: almost always, passing by value is the correct behaviour. A PointRange is cheap to copy, since it holds nothing more than a begin and end iterator taken from the range it was constructed from. That also addresses the edit: take the any_range by value, and a std::vector<Point> (or the result of boost::join) will convert to it at the call site.
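To make that concrete, here is a minimal sketch (the Point struct, the sum_x function, and the choice of const Point& as the reference type are illustrative assumptions, not part of the original answer):

#include <cstddef>
#include <iostream>
#include <vector>
#include <boost/range/any_range.hpp>
#include <boost/range/join.hpp>

struct Point { double x, y; };

// Type-erased view over any range of Point that is at least bidirectional.
using PointRange = boost::any_range<Point,
                                    boost::bidirectional_traversal_tag,
                                    const Point&,
                                    std::ptrdiff_t>;

// Taking the range by value is fine: PointRange just wraps a begin/end pair,
// and the conversion from the concrete range happens at the call site.
double sum_x(PointRange points)
{
    double s = 0;
    for (const Point &p : points)
        s += p.x;
    return s;
}

int main()
{
    std::vector<Point> a{{1, 0}, {2, 0}};
    std::vector<Point> b{{3, 0}};

    std::cout << sum_x(a) << '\n';                 // a std::vector converts implicitly
    std::cout << sum_x(boost::join(a, b)) << '\n'; // so does a joined range
}

If the function needs to mutate the elements, keep the same pattern but use Point& as the reference type; the range object itself still does not need to be passed by non-const reference.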

Related

Const correctness in generic functions using iterators

I want to write a generic function that takes in a sequence while guaranteeing not to alter said sequence.
template<typename ConstInputIter, typename OutputIter>
OutputIter f(ConstInputIter begin, ConstInputIter end, OutputIter out)
{
    ConstInputIter iter = begin;
    while (iter != end)
    {
        *out++ = some_operation(*iter++);
    }
    return out;
}
Yet the above example would still accept any iterator type as ConstInputIter, not just const ones. So far, the "const" in the name is purely nominal.
How do I declare the sequence given will not be altered by this function?
Even in C++20, there is no generic way to coerce an iterator over a non-const T into an iterator over a T const. Particular iterators may have a mechanism to do that, and you can use std::cbegin/cend for ranges to get const iterators. But given only an iterator, you are at the mercy of what the user provides.
Applying a C++20 constraint (requiring iter_value_t to be const) is the wrong thing, as your function should be able to operate on a non-const range.
Since C++17, you can use std::as_const every time you dereference your iterator:
// ...
*out++ = some_operation(std::as_const(*iter));
// ...
This makes sure you never access the elements of the underlying container through a non-const l-value reference.
Note: if you don't have access to C++17, it's pretty trivial to implement your own version of std::as_const. Just make sure you declare it outside of namespace std.
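For completeness, a minimal pre-C++17 sketch of such a helper, modelled on the shape of the standard one (the namespace name util is just a placeholder):

#include <type_traits>

namespace util {

// Returns a const lvalue reference to its argument, like C++17's std::as_const.
template <typename T>
constexpr typename std::add_const<T>::type &as_const(T &t) noexcept
{
    return t;
}

// Deleting the rvalue overload prevents taking a const reference to a temporary.
template <typename T>
void as_const(const T &&) = delete;

} // namespace util

The loop body above would then read *out++ = some_operation(util::as_const(*iter));.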

Why can't I use istream_view and std::accumulate to sum up my input?

This program:
#include <ranges>
#include <numeric>
#include <iostream>
int main() {
auto rng = std::ranges::istream_view<int>(std::cin);
std::cout << std::accumulate(std::ranges::begin(rng), std::ranges::end(rng), 0);
}
is supposed to sum up all integers appearing as text on the standard input stream. But - it doesn't compile. I know std::ranges::begin() and std::ranges::end() exist, so what's going on? The compiler tells me it can't find a suitable candidate; why?
From inception up through C++17, everything in <algorithm> is based on iterator pairs: you have one iterator referring to the beginning of a range and one iterator referring to the end of the range, always having the same type.
In C++20, this was generalized. A range is now denoted by an iterator and a sentinel for that iterator - where the sentinel itself need not actually be an iterator of any kind, it just needs to be a type that can compare equal to its corresponding iterator (this is the sentinel_for concept).
C++17 ranges tend to be† valid C++20 ranges, but not necessarily in the opposite direction. One reason is the ability to have a distinct sentinel type, but there are others, which also play into this question (see below).
To go along with the new model, C++20 added a large amount of algorithms into the std::ranges namespace that take an iterator and a sentinel, rather than two iterators. So for instance, while we've always had:
template<class InputIterator, class T>
constexpr InputIterator find(InputIterator first, InputIterator last,
const T& value);
we now also have:
namespace ranges {
  template<input_iterator I, sentinel_for<I> S, class T, class Proj = identity>
    requires indirect_binary_predicate<ranges::equal_to, projected<I, Proj>, const T*>
  constexpr I find(I first, S last, const T& value, Proj proj = {});

  template<input_range R, class T, class Proj = identity>
    requires indirect_binary_predicate<ranges::equal_to,
                                       projected<iterator_t<R>, Proj>, const T*>
  constexpr borrowed_iterator_t<R>
    find(R&& r, const T& value, Proj proj = {});
}
The first overload here takes an iterator/sentinel pair and the second takes a range instead.
While many algorithms gained corresponding overloads in std::ranges, the ones in <numeric> were left out: there is a std::accumulate but there is no std::ranges::accumulate. As such, the only version available at the moment is the one that takes an iterator pair. If a range-based version existed, you could just write:
auto rng = std::ranges::istream_view<int>(std::cin);
std::cout << std::ranges::accumulate(rng, 0);
Unfortunately, std::ranges::istream_view is one of the new C++20 ranges whose sentinel type differs from its iterator type, so you cannot pass rng.begin() and rng.end() into std::accumulate either.
This leaves you with two options generally (three, if you include waiting for C++23, which will hopefully have a std::ranges::fold):
Write your own range-based and iterator/sentinel-based algorithms, which for fold is very easy to do (see the sketch below).
Or
There is a utility to wrap a C++20 range into a C++17-compatible one: views::common. So you could do this:
auto rng = std::ranges::istream_view<int>(std::cin) | std::views::common;
std::cout << std::accumulate(rng.begin(), rng.end(), 0);
Except not in this specific case.
istream_view's iterators aren't copyable, and in C++17 all iterators must be. So there isn't really a way to provide C++17-compatible iterators based on istream_view. You need proper C++20-range support. The future std::ranges::fold will support move-only views and move-only iterators, but std::accumulate never can.
In this case, that just leaves option 1.
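Here is a minimal sketch of what option 1 could look like (the name fold, its exact signature, and the default std::plus<> operation are my own choices; C++23 later standardised a similar interface as std::ranges::fold_left):

#include <functional>
#include <iostream>
#include <iterator>
#include <ranges>
#include <utility>

// Iterator/sentinel overload: works with C++20 ranges whose end() is a sentinel.
template <std::input_iterator I, std::sentinel_for<I> S,
          typename T, typename Op = std::plus<>>
T fold(I first, S last, T init, Op op = {})
{
    for (; first != last; ++first)
        init = op(std::move(init), *first);
    return init;
}

// Range overload: forwards to the iterator/sentinel overload.
template <std::ranges::input_range R, typename T, typename Op = std::plus<>>
T fold(R &&r, T init, Op op = {})
{
    return fold(std::ranges::begin(r), std::ranges::end(r),
                std::move(init), std::move(op));
}

int main()
{
    // Sums whitespace-separated integers from standard input.
    std::cout << fold(std::ranges::istream_view<int>(std::cin), 0) << '\n';
}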
†A C++20 iterator needs to be default-constructible, which was not a requirement of C++17 iterators. So a C++17 range with non-default-constructible iterators would not be a valid C++20 range.
The problem is that the end of a C++20 range is not, in the general case, an iterator, but rather a sentinel. A sentinel can have a different type than an iterator and admit fewer operations: generally speaking, you mostly need to compare against it to know you've reached the end of the range, and you may not be able to work with it like a full iterator. For more about this distinction, read:
What's the difference between a sentinel and an end iterator?
Now, standard-library algorithms (including the ones in <numeric>) take pairs of iterators of the same type. In your example:
template< class InputIt, class T >
constexpr T accumulate( InputIt first, InputIt last, T init );
see? InputIt must be the type of both the beginning and end of the range. This can (probably) not even be "fixed" for the istream_view range, because the end of the standard input really isn't an iterator per se. (Although maybe you could bludgeon it into being an iterator and throw exceptions when doing irrelevant things with it.)
So, we would need either a new variant of std::accumulate or an std::ranges::accumulate, which we currently don't have. Or, of course, you could write that yourself, which should not be too difficult.
Edit: One last option, suggested by @RemyLebeau, is to use a std::istream_iterator instead:
std::cout << std::accumulate(
std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
0);

How to overload std::remove for std::list?

I once learnt that the general way to erase elements from a container is via the erase-remove-idiom. But I was surprised to find out that at least g++'s STL implementation does not overload std::remove() for std::list, since in this case a lot of object assignments could be saved by doing the reordering via pointer manipulation.
Is there a reason that the C++ standard does not mandate such an optimisation? But my main question is how I can overload std::remove() (it does not have to be portable beyond g++), so I could provide an implementation that uses list::splice()/list::merge() instead. I tried a couple of signatures but got an ambiguity error at best, for example:
template <typename T>
typename std::list<T>::iterator
remove(typename std::list<T>::iterator first,
typename std::list<T>::iterator last, const T &v);
P.S.: I am sorry that I was not clear enough. Please ignore that the functions come from the std namespace and what they do specifically. I just wish to learn more about the template/typetraits/overload rules in C++.
It's not mandated because it's not just an optimization, it has different semantics from what you'd expect for a Sequence container:
std::list<int> l;
l.push_back(1);
l.push_back(2);
std::list<int>::iterator one = l.begin();
std::list<int>::iterator two = l.end(); --two;

if (something) {
    l.erase(remove(l.begin(), l.end(), 1), l.end());
    // one is still valid and *one == 2, two has been invalidated
} else {
    l.remove(1);
    // two is still valid and *two == 2, one has been invalidated
}
Regarding the actual question: I see what you mean, but I'm stuck for the moment on how to write a pair of function templates so that one matches arbitrary iterators and the other matches list iterators, without ambiguity.
Do be aware that there isn't actually any guarantee in the standard that list<T>::iterator is a different type from some_other_container<T>::iterator. So although in practice you'd expect each container to have its own iterator, in principle the approach is flawed quite aside from the fact that you suggested putting the overload in std. You can't use iterators alone to make "structural" changes to their corresponding containers.
You can do this without ambiguity:
template <typename Container>
void erase_all(Container &, const typename Container::value_type &);
template <typename T>
void erase_all(std::list<T> &, const T &);
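To make the dispatch concrete, here is a minimal sketch of how those two declarations might be implemented (the bodies are my own illustration, not part of the original answer):

#include <algorithm>
#include <list>

// Generic version: classic erase-remove idiom, suitable for vector, deque, etc.
template <typename Container>
void erase_all(Container &c, const typename Container::value_type &v)
{
    c.erase(std::remove(c.begin(), c.end(), v), c.end());
}

// std::list version: relinks nodes instead of moving elements around.
template <typename T>
void erase_all(std::list<T> &l, const T &v)
{
    l.remove(v);
}

For a std::list<T> argument both templates are viable, but the second is more specialized, so overload resolution picks it without ambiguity.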
list::remove or list::erase alone will do what you've seen the erase/remove idiom do for vectors: remove is for values or predicates, erase is for single iterators or ranges.
The advice you received is good, but not universal. It is good for std::vector, for instance, but completely unnecessary for std::list, since std::list::erase() and std::list::remove() already do the right thing. They do all the pointer magic you request, something std::vector::erase() cannot do because its internal storage is different. This is the reason why std::remove() is not specialised for std::list: there is no need to use it in this case.

Is it possible to test whether two iterators point to the same object?

Say I'm making a function to copy a value:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o) { *o = *i; }
and I would like to avoid the assignment if i and o point to the same object, since then the assignment is pointless.
Obviously, I can't say if (i != o) { ... }, both because i and o might be of different types and because they might point into different containers (and would thus be incomparable). Less obviously, I can't use overloaded function templates either, because the iterators might belong to different containers even though they have the same type.
My initial solution to this was:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o)
{
    if (&*o != static_cast<void const *>(&*i))
        *o = *i;
}
but I'm not sure if this works. What if *o or *i actually returns an object instead of a reference?
Is there a way to do this generally?
I don't think that this is really necessary: if assignment is expensive, the type should define an assignment operator that performs the (relatively cheap) self assignment check to prevent doing unnecessary work. But, it's an interesting question, with many pitfalls, so I'll take a stab at answering it.
If we are to assemble a general solution that works for input and output iterators, there are several pitfalls that we must watch out for:
An input iterator is a single-pass iterator: you can only perform indirection via the iterator once per element, so, we can't perform indirection via the iterator once to get the address of the pointed-to value and a second time to perform the copy.
An input iterator may be a proxy iterator. A proxy iterator is an iterator whose operator* returns an object, not a reference. With a proxy iterator, the expression &*it is ill-formed, because *it is an rvalue (it's possible to overload the unary-&, but doing so is usually considered evil and horrible, and most types do not do this).
An output iterator can only be used for output; you cannot perform indirection via it and use the result as an rvalue. You can write to the "pointed to element" but you can't read from it.
So, if we're going to make your "optimization," we'll need to make it only for the case where both iterators are forward iterators (this includes bidirectional iterators and random access iterators: they're forward iterators too).
Because we're nice, we also need to be mindful that, even though it violates the concept requirements, many proxy iterators misrepresent their category, because it is very useful to have a proxy iterator that supports random access over a sequence of proxied objects. (I'm not even sure how one could implement an efficient iterator for std::vector<bool> without doing this.)
We'll use the following Standard Library headers:
#include <iterator>
#include <type_traits>
#include <utility>
We define a metafunction, is_forward_iterator, that tests whether a type is a "real" forward iterator (i.e., is not a proxy iterator):
template <typename T>
struct is_forward_iterator :
    std::integral_constant<
        bool,
        std::is_base_of<
            std::forward_iterator_tag,
            typename std::iterator_traits<T>::iterator_category
        >::value &&
        std::is_lvalue_reference<
            decltype(*std::declval<T>())
        >::value>
{ };
For brevity, we also define a metafunction, can_compare, that tests whether two types are both forward iterators:
template <typename T, typename U>
struct can_compare :
    std::integral_constant<
        bool,
        is_forward_iterator<T>::value &&
        is_forward_iterator<U>::value
    >
{ };
Then, we'll write two overloads of the copy function and use SFINAE to select the right overload based on the iterator types: if both iterators are forward iterators, we'll include the check, otherwise we'll exclude the check and always perform the assignment:
template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
    -> typename std::enable_if<can_compare<InputIt, OutputIt>::value>::type
{
    if (static_cast<void const volatile*>(std::addressof(*in)) !=
        static_cast<void const volatile*>(std::addressof(*out)))
        *out = *in;
}

template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
    -> typename std::enable_if<!can_compare<InputIt, OutputIt>::value>::type
{
    *out = *in;
}
As easy as pie!
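For instance, assuming the definitions above are in scope, overload selection plays out roughly like this (a usage sketch of my own, not part of the original answer):

#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    // Real forward iterators: the address check applies; both arguments refer
    // to the same element here, so the assignment is skipped.
    copy(v.begin(), v.begin());

    std::vector<bool> b{true, false};
    // vector<bool>'s iterators are proxy iterators (operator* returns an object),
    // so the second overload is chosen and the assignment happens unconditionally.
    copy(b.begin(), b.begin() + 1);
}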
I think this may be a case where you may have to document some assumptions about the types you expect in the function and be content with not being completely generic.
Like operator*, operator& could be overloaded to do all sorts of things. If you're guarding against operator*, then you should consider operator& and operator!=, etc.
I would say that a good prerequisite to enforce (either through comments in the code or a concept/static_assert) is that operator* returns a reference to the object pointed to by the iterator and that it doesn't (or shouldn't) perform a copy. In that case, your code as it stands seems fine.
Your code, as is, is definitely not okay, or at least not okay for all iterator categories.
Input iterators and output iterators are not required to be dereferenceable after the first time (they're expected to be single-pass) and input iterators are allowed to dereference to anything "convertible to T" (§24.2.3/2).
So, if you want to handle all kinds of iterators, I don't think you can enforce this "optimization", i.e. you can't generically check if two iterators point to the same object. If you're willing to forego input and output iterators, what you have should be fine. Otherwise, I'd stick with doing the copy in any case (I really don't think you have another option on this).
Write a helper template function equals that automatically returns false if the iterators are different types. Either that or do a specialization or overload of your copy function itself.
If they're the same type then you can use your trick of comparing the pointers of the objects they resolve to, no casting required:
if (&*i != &*o)
*o = *i;
If *i or *o doesn't return a reference, no problem - the copy will occur even if it didn't have to, but no harm will be done.
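A minimal sketch of such a helper (the name same_element and the use of std::addressof are my own; it inherits the proxy-iterator caveats discussed above):

#include <memory>

// Different iterator types: following the suggestion above, treat them as
// never referring to the same object, so the copy simply proceeds.
template <typename ItA, typename ItB>
bool same_element(const ItA &, const ItB &)
{
    return false;
}

// Same iterator type: compare the addresses of the referenced objects.
// std::addressof sidesteps any overloaded operator&.
template <typename It>
bool same_element(const It &a, const It &b)
{
    return std::addressof(*a) == std::addressof(*b);
}

The copy function would then read: if (!same_element(i, o)) *o = *i;.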

What's wrong with passing C++ iterator by reference?

I've written a few functions with a prototype like this:
template <typename input_iterator>
int parse_integer(input_iterator &begin, input_iterator end);
The idea is that the caller would provide a range of characters, and the function would interpret the characters as an integer value and return it, leaving begin at one past the last-used character. For example:
std::string sample_text("123 foo bar");
std::string::const_iterator p(sample_text.begin());
std::string::const_iterator end(sample_text.end());
int i = parse_integer(p, end);
This would leave i set to 123 and p "pointing" at the space before foo.
I've since been told (without explanation) that it's bad form to pass an iterator by reference. Is it bad form? If so, why?
There is nothing really wrong with it, but it will certainly limit the use of the template. You won't be able to just pass an iterator returned by something else or generated like v.begin(), since those will be temporaries. You will always have to make a local copy first, which is boilerplate that is not really nice to have.
One way is to overload it:
template<typename input_iterator>
int parse_integer(input_iterator begin, input_iterator end,
                  input_iterator &newbegin);

template<typename input_iterator>
int parse_integer(input_iterator begin, input_iterator end) {
    return parse_integer(begin, end, begin);
}
Another option is to have an output iterator where the number will be written into:
template<typename input_iterator, typename output_iterator>
input_iterator parse_integer(input_iterator begin, input_iterator end,
output_iterator out);
The return value is then free to carry the new input iterator. You could use an inserter iterator to put the parsed numbers into a vector, or a pointer to put them directly into an integer or an array thereof if you already know the number of values.
int i;
b = parse_integer(b, end, &i);
std::vector<int> numbers;
b = parse_integer(b, end, std::back_inserter(numbers));
In general:
If you pass a non-const reference, the caller doesn't know if the iterator is being modified.
You could pass a const reference, but usually iterators are small enough that it gives no advantage over passing by value.
In your case:
I don't think there's anything wrong with what you do, except that it's not too standard-esque regarding iterator usage.
When they say "don't pass by reference", maybe that's because it's more normal/idiomatic to pass iterators as value parameters, which is what you did for the second parameter, rather than by (const) reference.
In this example, however, you need to return two values: the parsed int and the new/modified iterator; and given that a function can't have two return values, passing one of them back through a non-const reference parameter is IMO normal.
An alternative would be to code it something like this:
// The return value is a pair: the parsed int and the new iterator position.
template <typename input_iterator>
std::pair<int, input_iterator> parse(input_iterator start, input_iterator end)
{
    // ...
}
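A caller could then unpack both results at once (assuming C++17 structured bindings; with older standards you would use .first and .second):

std::string sample_text("123 foo bar");
auto [value, next] = parse(sample_text.begin(), sample_text.end());
// value == 123, next refers to the space before "foo"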
In my opinion, if you want to do this the argument should be a pointer to the iterator you'll be changing. I'm not a big fan of non-const reference arguments because they hide the fact that the passed parameter might change. I know there's a lot of C++ users who disagree with my opinion on this - and that's fine.
However, in this case it's so common for iterators to be treated as value arguments that I think it's a particularly bad idea to pass iterators by non-const reference and modify the passed iterator. It just goes against the idiomatic way iterators are usually used.
Since there is a great way to do what you want that doesn't have this problem, I think you should use it:
template <typename input_iterator>
int parse_integer(input_iterator* begin, input_iterator end);
Now a caller would have to do:
int i = parse_integer(&p, end);
And it'll be obvious that the iterator can be changed.
By the way, I also like litb's suggestion of returning the new iterator and putting the parsed values into a location specified by an output iterator.
In this context, I think that passing an iterator by reference is perfectly sensible, as long as it's well-documented.
It's worth noting that your approach (passing an iterator by reference to keep track of where you are when tokenizing a stream) is exactly the approach that is taken by boost::tokenizer. In particular, see the definition of the TokenizerFunction Concept. Overall, I find boost::tokenizer to be pretty well designed and well thought out.
I think the Standard Library algorithms pass iterators by value exclusively (someone will now post an obvious exception to this) - this may be the origin of the idea. Of course, nothing says that your own code has to look like the Standard Library!
Your function declaration's second parameter is missing the reference; is that intentional?
Anyway, back to your question: No, I haven't ever read anything that says you should not pass iterators by reference. The problem with references is that they allow you to change the referenced object. In this case, if you are to change the iterator, you are potentially screwing up the entire sequence beyond that point thereby rendering further processing impossible.
Just one suggestion: type your parameters carefully.