std::begin and R-values - c++

Recently I was trying to fix a pretty difficult const-correctness compiler error. It initially manifested as a multi-paragraph template vomit error deep within Boost.Python.
But that's irrelevant: it all boiled down to the following fact: the C++11 std::begin and std::end iterator functions are not overloaded to take R-values.
The definition(s) of std::begin are:
template< class C >
auto begin( C& c ) -> decltype(c.begin());
template< class C >
auto begin( const C& c ) -> decltype(c.begin());
So since there is no R-value/Universal Reference overload, if you pass it an R-value you get a const iterator.
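The overload selection described above can be checked at compile time. This is only a small sketch using std::vector<int> as an arbitrary example container; the alias name vec is made up for illustration.

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

using vec = std::vector<int>;  // arbitrary example container

// An lvalue binds to begin(C&) and yields a mutable iterator.
static_assert(std::is_same<decltype(std::begin(std::declval<vec&>())),
                           vec::iterator>::value,
              "lvalue argument gives iterator");

// An rvalue cannot bind to C&, so begin(const C&) is the only viable
// overload and the result is a const_iterator.
static_assert(std::is_same<decltype(std::begin(std::declval<vec>())),
                           vec::const_iterator>::value,
              "rvalue argument gives const_iterator");
```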
So why do I care? Well, if you ever have some kind of "range" container type, i.e. like a "view", "proxy" or a "slice" or some container type that presents a sub iterator range of another container, it is often very convenient to use R-value semantics and get non-const iterators from temporary slice/range objects. But with std::begin, you're out of luck because std::begin will always return a const-iterator for R-values. This is an old problem which C++03 programmers were often frustrated with back in the day before C++11 gave us R-values - i.e. the problem of temporaries always binding as const.
So, why isn't std::begin defined as:
template <class C>
auto begin(C&& c) -> decltype(c.begin());
This way, if c is constant we get a C::const_iterator and a C::iterator otherwise.
At first, I thought the reason was for safety. If you passed a temporary to std::begin, like so:
auto it = std::begin(std::string("temporary string")); // never do this
...you'd get an invalid iterator. But then I realized this problem still exists with the current implementation. The above code would simply return an invalid const-iterator, which would probably segfault when dereferenced.
So, why is std::begin not defined to take an R-value (or more accurately, a Universal Reference)? Why have two overloads (one for const and one for non-const)?

The above code would simply return an invalid const-iterator
Not quite. The iterator remains valid until the end of the full-expression in which the temporary it refers to was created. So something like
std::copy_n( std::begin(std::string("Hallo")), 2,
std::ostreambuf_iterator<char>(std::cout) );
is still valid code. Of course, in your example, it is invalidated at the end of the statement.
What point would there be in modifying a temporary or xvalue? That is probably one of the questions the designers of the range accessors had in mind when proposing the declarations. They didn't consider "proxy" ranges for which the iterators returned by .begin() and .end() remain valid past the range object's lifetime, perhaps for the very reason that, in template code, they cannot be distinguished from normal ranges, and we certainly don't want to modify temporary non-proxy ranges, since that is pointless and might lead to confusion.
However, you don't need to use std::begin in the first place but could rather declare them with a using-declaration:
using std::begin;
using std::end;
and use ADL. This way you can declare namespace-scope begin and end overloads for the types that Boost.Python (or similar) uses and circumvent the restrictions of std::begin. E.g.
iterator begin(boost_slice&& s) { return s.begin(); }
iterator end (boost_slice&& s) { return s.end() ; }
// […]
begin(std::move(some_slice)) // rvalue argument: calls the namespace-scope overload, returns non-const iterator
Why have two overloads (one for const and one for non-const)?
Because we still want rvalue objects to be supported (and they cannot be taken by a function parameter of the form T&).
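To make the ADL suggestion concrete, here is a minimal sketch. The type demo::int_slice and the helper zero_fill are hypothetical and stand in for whatever slice type your library provides; the sketch assumes the slice only borrows its elements, so iterators obtained from a temporary slice stay valid.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <utility>

namespace demo {
// Hypothetical proxy range: borrows [first, last) and owns nothing.
struct int_slice {
    int* first;
    int* last;
    int* begin() { return first; }
    int* end() { return last; }
    const int* begin() const { return first; }
    const int* end() const { return last; }
};

// Rvalue overloads, found through ADL, that keep the iterators mutable.
inline int* begin(int_slice&& s) { return s.begin(); }
inline int* end(int_slice&& s) { return s.end(); }
}  // namespace demo

// std::begin on a temporary slice falls back to begin(const C&).
static_assert(std::is_same<decltype(std::begin(std::declval<demo::int_slice>())),
                           const int*>::value,
              "std::begin gives a const iterator for rvalues");

// Zeroes [p, p + n) by iterating over temporary slices.
inline void zero_fill(int* p, int n) {
    assert(n >= 0);
    using std::begin;
    using std::end;
    // The non-template rvalue overloads in demo win over std::begin/std::end.
    static_assert(std::is_same<decltype(begin(demo::int_slice{p, p + n})),
                               int*>::value,
                  "ADL overload gives a mutable iterator");
    int* last = end(demo::int_slice{p, p + n});
    for (int* it = begin(demo::int_slice{p, p + n}); it != last; ++it) {
        *it = 0;
    }
}
```

Overload resolution prefers demo::begin(int_slice&&) here because it is a non-template function and binds the rvalue directly, whereas std::begin could only bind it through const C&.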

Related

Const correctness in generic functions using iterators

I want to write a generic function that takes in a sequence, while guaranteeing to not alter said sequence.
template<typename ConstInputIter, typename OutputIter>
OutputIter f(ConstInputIter begin, ConstInputIter end, OutputIter out)
{
    // Walk [begin, end), writing one transformed element per step.
    for (ConstInputIter iter = begin; iter != end; ++iter)
    {
        *out++ = some_operation(*iter);
    }
    return out;
}
Yet the above example would still accept any iterator type as ConstInputIter, not just const ones. So far, the constness is only nominal: it exists in the template parameter's name and nowhere else.
How do I declare the sequence given will not be altered by this function?
Even in C++20, there is no generic way to coerce an iterator over a non-const T into an iterator over a T const. Particular iterators may have a mechanism to do that, and you can use std::cbegin/cend for ranges to get const iterators. But given only an iterator, you are at the mercy of what the user provides.
Applying a C++20 constraint (requiring iter_value_t to be const) is the wrong thing, as your function should be able to operate on a non-const range.
Since C++17, you can use std::as_const every time you dereference your iterator:
// ...
*out++ = some_operation(std::as_const(*iter));
// ...
This makes sure you never access the elements of the underlying container through a non-const l-value reference.
Note: if you don't have access to C++17, it's pretty trivial to implement your own version of std::as_const. Just make sure you declare it outside of namespace std.
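For reference, a pre-C++17 replacement could look roughly like the sketch below. The namespace util, the stand-in some_operation and the helper doubled are made up for illustration; only the two as_const signatures mirror what the standard specifies for std::as_const.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace util {  // hypothetical namespace; anything except std will do

// Stand-in for std::as_const: view an lvalue through a const reference.
template <class T>
constexpr typename std::add_const<T>::type& as_const(T& t) noexcept {
    return t;
}

// Reject rvalues so no reference to a dying temporary is handed out.
template <class T>
void as_const(const T&&) = delete;

}  // namespace util

// Hypothetical stand-in for the question's some_operation.
inline int some_operation(const int& x) { return 2 * x; }

template <class InputIt, class OutputIt>
OutputIt f(InputIt first, InputIt last, OutputIt out) {
    for (; first != last; ++first) {
        // The element is only ever seen through a const reference here.
        *out++ = some_operation(util::as_const(*first));
    }
    return out;
}

// Usage: doubles every element of `in`, which f cannot modify.
inline std::vector<int> doubled(std::vector<int>& in) {
    std::vector<int> out(in.size());
    auto end = f(in.begin(), in.end(), out.begin());
    assert(end == out.end());
    return out;
}
```

Note that the deleted overload makes this refuse iterators whose operator* returns a temporary, which is the same behaviour you would get from std::as_const.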

Is std::span a valid iterator::reference type?

I have trouble finding the semantics of the reference type trait of an iterator. Let's say I want to implement a chunk iterator, that, given a position into a range, will give me chunks of that range:
template<class T, std::size_t N>
class chunk_iterator {
public:
    using reference = std::span<T,N>;
    chunk_iterator(T* ptr): ptr(ptr) {}
    chunk_iterator& operator++() { ptr += N; return *this; }
    // The (pointer, count) constructor of a fixed-extent span is explicit.
    reference operator*() const { return reference{ptr,N}; }
private:
    T* ptr;
};
The problem that I see here is that std::span is a view-like thing, but it does not behave like a reference (say a std::array<T,N>& in this case). In particular, if I assign to a span, the assignment is shallow, it will not copy the value.
Is std::span a valid iterator::reference type? Are view and reference semantics explained in detail somewhere?
What should I do to solve my problem? Implement a span_ref with proper reference semantics? Is it already implemented in some library? Is a non-native reference type even allowed?
(note: solving the problem by storing a std::array<T,N> and returning a std::array<T,N>& in operator* is doable, but ugly, and if N is not known at compile time, storing instead a std::vector<T> with dynamic memory allocation is just plain wrong)
When talking about standard-compliant iterators, it depends on several things.
For conforming Iterators, it almost doesn't matter what the reference type is because the standard does not require any usage semantics for the reference type. But that also means nobody except you knows how to use your iterator.
For conforming Input Iterators, the reference type must meet the semantics specified. Notice that for LegacyInputIterator, the expression *it must be a reference that is usable as a reference with all the normal semantics, otherwise code that uses your iterator will not behave as expected. This means reading from a reference is akin to reading from a built-in reference. In particular, the following should do "normal" things:
auto value = *itr; // this should read a value
In this situation, a view type like span wouldn't work because span is more like a pointer than a reference: in the above snippet value would be a span, not whatever the span refers to.
For conforming Output Iterators, the reference type has no requirements. In fact, standard LegacyOutputIterators like std::back_insert_iterator have void as a reference type.
For conforming Forward Iterators and above, the standard actually requires the reference be a built-in reference. This is to support uses like below:
auto& ref = *itr;
auto ptr = &ref; // this must create a pointer pointing to the original object
auto ref2 = *ptr; // this must create a second, equivalent reference
auto other = std::move( ref ); // this must do a "move", which may be the same as a copy
ref = other; // this must assign "other"'s value back into the referred-to object
If the above didn't work correctly, many of the standard algorithms wouldn't be possible to write generically.
Speaking to span specifically, it acts more like a pointer than a reference logically. It can be re-assigned to point to something else. Taking its address creates a pointer to the span, not a pointer to the container being spanned over. Calling std::move on a span copies the span, and doesn't move the contents of the spanned range. A built-in reference T& will only refer to one thing ever once it's been created.
Creating a non-conforming reference that actually works with standard algorithms would involve a family of types that overload operator*, operator->, operator&, operator=, and std::move, and that model pointers, lvalue references, and rvalue references.
The meaning of an iterator's reference type cannot be understood without comprehending its relationship to the iterator's value_type. An iterator is a construct that represents a position within a sequence of value_types. A reference is a mediator within this paradigm; it is a thing that acts like a value_type (const) &. Until you figure out what your value_type is going to be, you can't decide what your reference will need to look like.
What "acts like" means depends on what kind of iterator we're talking about.
For C++11, the InputIterator category requires that reference be a type which is implicitly convertible to a value_type. For the OutputIterator category, reference is required to be a type which is assignable from a value_type.
For all of the more restricted iterator categories (ForwardIterator and above), reference is required to be exactly one of value_type & (if you can write to the sequence) or value_type const & (if you can only read from the sequence).
Iterators where reference is not a value_type (const) & are often called proxy iterators, as the reference type typically acts as a "proxy" for the actual data stored in the sequence (assuming the iterator isn't just inventing values to begin with). Proxy iterators are often used for cases where the iterator doesn't iterate over a range of actual value_types, but simply pretends to. This could be the bitwise iterators of vector<bool> or an iterator that iterates over the sequence of integers on some half-open range [0, N).
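The vector<bool> case is easy to poke at. The following is just a sketch of what a proxy reference looks like from the outside; the aliases and the helper flip_first are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

using int_it = std::vector<int>::iterator;
using bool_it = std::vector<bool>::iterator;

// Dereferencing an ordinary iterator yields a language reference.
static_assert(std::is_lvalue_reference<decltype(*std::declval<int_it>())>::value,
              "vector<int>::iterator yields int&");

// Dereferencing vector<bool>::iterator yields a proxy object instead.
static_assert(!std::is_lvalue_reference<decltype(*std::declval<bool_it>())>::value,
              "vector<bool>::iterator yields a proxy");

// The proxy still converts to, and is assignable from, the value_type.
static_assert(std::is_convertible<std::vector<bool>::reference, bool>::value,
              "proxy converts to bool");
static_assert(std::is_assignable<std::vector<bool>::reference, bool>::value,
              "proxy is assignable from bool");

// Flips the first element through the proxy and returns the stored value.
inline bool flip_first(std::vector<bool>& v) {
    assert(!v.empty());
    auto proxy = *v.begin();  // a proxy object, not a bool
    proxy = !proxy;           // writes through to the stored bit
    return v[0];
}
```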
But proxy iterator references have to act like language references to one degree or another. InputIterator references have to be implicitly convertible to the value_type. span<T, N> is not implicitly convertible to array<T, N> or any other container type that would be appropriate for a value_type. OutputIterator references have to be assignable from value_type. And while span<T, N> may be assignable from an array<T, N>, the assignment operation doesn't have the same meaning. To assign to an OutputIterator's reference ought to change the values stored within the sequence. And this doesn't.
In any case, you first need to invent a value_type that does what you need it to do. Then you need to build a proper reference type that acts like a reference. Lastly... well, you can't make your iterator a ForwardIterator or higher, because C++11 doesn't support proxy iterators of the most useful iterator categories. C++20's new formulation of iterators allows proxy iterators for anything that isn't a contiguous_iterator.

What is the purpose of C++20 std::common_reference?

C++20 introduces std::common_reference. What is its purpose? Can someone give an example of using it?
common_reference came out of my efforts to come up with a conceptualization of STL's iterators that accommodates proxy iterators.
In the STL, iterators have two associated types of particular interest: reference and value_type. The former is the return type of the iterator's operator*, and the value_type is the (non-const, non-reference) type of the elements of the sequence.
Generic algorithms often have a need to do things like this:
value_type tmp = *it;
... so we know that there must be some relationship between these two types. For non-proxy iterators the relationship is simple: reference is always value_type, optionally const and reference qualified. Early attempts at defining the InputIterator concept required that the expression *it was convertible to const value_type &, and for most interesting iterators that is sufficient.
I wanted iterators in C++20 to be more powerful than this. For example, consider the needs of a zip_iterator that iterates two sequences in lock-step. When you dereference a zip_iterator, you get a temporary pair of the two iterators' reference types. So, zip'ing a vector<int> and a vector<double> would have these associated types:
zip iterator's reference : pair<int &, double &>
zip iterator's value_type: pair<int, double>
As you can see, these two types are not related to each other simply by adding top-level cv- and ref qualification. And yet letting the two types be arbitrarily different feels wrong. Clearly there is some relationship here. But what is the relationship, and what can generic algorithms that operate on iterators safely assume about the two types?
The answer in C++20 is that for any valid iterator type, proxy or not, the types reference && and value_type & share a common reference. In other words, for some iterator it there is some type CR which makes the following well-formed:
void foo(CR) // CR is the common reference for iterator I
{}
void algo( I it, iter_value_t<I> val )
{
foo(val); // OK, lvalue to value_type convertible to CR
foo(*it); // OK, reference convertible to CR
}
CR is the common reference. All algorithms can rely on the fact that this type exists, and can use std::common_reference to compute it.
So, that is the role that common_reference plays in the STL in C++20. Generally, unless you are writing generic algorithms or proxy iterators, you can safely ignore it. It's there under the covers ensuring that your iterators are meeting their contractual obligations.
EDIT: The OP also asked for an example. This is a little contrived, but imagine it's C++20 and you are given a random-access range r of type R about which you know nothing, and you want to sort the range.
Further imagine that for some reason, you want to use a monomorphic comparison function, like std::less<T>. (Maybe you've type-erased the range, and you need to also type-erase the comparison function and pass it through a virtual? Again, a stretch.) What should T be in std::less<T>? For that you would use common_reference, or the helper iter_common_reference_t which is implemented in terms of it.
using CR = std::iter_common_reference_t<std::ranges::iterator_t<R>>;
std::ranges::sort(r, std::less<CR>{});
That is guaranteed to work, even if range r has proxy iterators.

Why does std::cbegin() not call .cbegin() on the container?

The following code fails the static assertion:
#include <gsl/span>
#include <iterator>
#include <type_traits>
int main()
{
int theArr[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
gsl::span<int> theSpan{ theArr, std::size(theArr) };
using std::cbegin;
auto it1 = cbegin(theSpan);
auto it2 = theSpan.cbegin();
static_assert(std::is_same_v<decltype(it1), decltype(it2)>);
}
This fails because std::cbegin() calls the .begin() method on a const ref of the container. For standard-defined containers, this returns a const_iterator, which is the same type that .cbegin() returns. However, gsl::span is a bit unique because it models a sort of "borrow type". A const gsl::span behaves like a const pointer; the span itself is const, but what it points to is not const. Hence, the .begin() method on a const gsl::span still returns a non-const iterator, whereas explicitly calling .cbegin() returns a const iterator.
I'm curious as to why std::cbegin() was not defined as invoking .cbegin() on the container (which all standard containers seem to implement) to account for cases such as this.
This is somewhat related to: Why does std::cbegin return the same type as std::begin
this fails because std::cbegin() calls the .begin()
To be more precise, std::cbegin calls std::begin, which in the generic overload calls c.begin.
For what it's worth, it should be possible to fix gsl::span to return const iterator upon std::cbegin if the designers of gsl specify that there is a specialisation for the generic overload of std::cbegin for gsl::span that uses c.cbegin instead of std::begin, if that is the desired behaviour. I don't know their reasoning for not specifying such specialisation.
As for the reasoning for why std::cbegin uses std::begin, I do not know for a fact either, but it does have the advantage of being able to support containers that have a c.begin member, but not a c.cbegin member, which can be seen as a less strict requirement, as it can be satisfied by custom containers written prior to C++11, when there was no convention of providing a c.cbegin member function.
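The call chain can be reproduced without GSL. In the sketch below, borrow_view is a hypothetical type with the same shallow constness as the span in the question, and my_cbegin is a hand-written approximation of how the generic std::cbegin is specified, not the library's actual source. It needs C++14 for std::cbegin.

```cpp
#include <iterator>
#include <type_traits>
#include <utility>

// Hypothetical borrow type: the view may be const, its elements are not.
struct borrow_view {
    int* first;
    int* last;
    int* begin() const { return first; }
    int* end() const { return last; }
    const int* cbegin() const { return first; }
    const int* cend() const { return last; }
};

// Approximation of the generic std::cbegin: take the argument by const
// reference and forward to std::begin, never touching the cbegin member.
template <class C>
auto my_cbegin(const C& c) -> decltype(std::begin(c)) {
    return std::begin(c);
}

// Both the approximation and the real thing end up calling begin() const.
static_assert(std::is_same<decltype(my_cbegin(std::declval<borrow_view&>())),
                           int*>::value,
              "my_cbegin returns the mutable iterator");
static_assert(std::is_same<decltype(std::cbegin(std::declval<borrow_view&>())),
                           int*>::value,
              "std::cbegin returns the mutable iterator");

// Only the member function gives the const iterator.
static_assert(std::is_same<decltype(std::declval<borrow_view&>().cbegin()),
                           const int*>::value,
              "member cbegin returns the const iterator");
```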
First, note that, per [tab:container.req]:
Expression: a.cbegin()
Return type: const_iterator
Operational semantics: const_cast<X const&>(a).begin();
Complexity: constant
Therefore, gsl::span is not a container at all. cbegin and cend are designed to work with containers. There are some exceptions (arrays, initializer_list) that require special care, but apparently the standard library cannot mention something like gsl::span.
Second, it is LWG 2128 that introduced global cbegin and cend. Let's see what the relevant part says:
Implement std::cbegin/cend() by calling std::begin/end(). This has numerous advantages:
It automatically works with arrays, which is the whole point of these non-member functions.
It works with C++98/03-era user containers, written before cbegin/cend() members were invented.
It works with initializer_list, which is extremely minimal and lacks cbegin/cend() members.
[container.requirements.general] guarantees that this is equivalent to calling cbegin/cend() members.
Essentially, calling std::begin/end() saves the work of providing special care for arrays and initializer_list.

Is it possible to test whether two iterators point to the same object?

Say I'm making a function to copy a value:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o) { *o = *i; }
and I would like to avoid the assignment if i and o point to the same object, since then the assignment is pointless.
Obviously, I can't say if (i != o) { ... }, both because i and o might be of different types and because they might point into different containers (and would thus be incomparable). Less obviously, I can't use overloaded function templates either, because the iterators might belong to different containers even though they have the same type.
My initial solution to this was:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o)
{
if (&*o != static_cast<void const *>(&*i))
*o = *i;
}
but I'm not sure if this works. What if *o or *i actually returns an object instead of a reference?
Is there a way to do this generally?
I don't think that this is really necessary: if assignment is expensive, the type should define an assignment operator that performs the (relatively cheap) self assignment check to prevent doing unnecessary work. But, it's an interesting question, with many pitfalls, so I'll take a stab at answering it.
If we are to assemble a general solution that works for input and output iterators, there are several pitfalls that we must watch out for:
An input iterator is a single-pass iterator: you can only perform indirection via the iterator once per element, so, we can't perform indirection via the iterator once to get the address of the pointed-to value and a second time to perform the copy.
An input iterator may be a proxy iterator. A proxy iterator is an iterator whose operator* returns an object, not a reference. With a proxy iterator, the expression &*it is ill-formed, because *it is an rvalue (it's possible to overload the unary-&, but doing so is usually considered evil and horrible, and most types do not do this).
An output iterator can only be used for output; you cannot perform indirection via it and use the result as an rvalue. You can write to the "pointed to element" but you can't read from it.
So, if we're going to make your "optimization," we'll need to make it only for the case where both iterators are forward iterators (this includes bidirectional iterators and random access iterators: they're forward iterators too).
Because we're nice, we also need to be mindful of the fact that, despite the fact that it violates the concept requirements, many proxy iterators misrepresent their category because it is very useful to have a proxy iterator that supports random access over a sequence of proxied objects. (I'm not even sure how one could implement an efficient iterator for std::vector<bool> without doing this.)
We'll use the following Standard Library headers:
#include <iterator>
#include <type_traits>
#include <utility>
We define a metafunction, is_forward_iterator, that tests whether a type is a "real" forward iterator (i.e., is not a proxy iterator):
template <typename T>
struct is_forward_iterator :
std::integral_constant<
bool,
std::is_base_of<
std::forward_iterator_tag,
typename std::iterator_traits<T>::iterator_category
>::value &&
std::is_lvalue_reference<
decltype(*std::declval<T>())
>::value>
{ };
For brevity, we also define a metafunction, can_compare, that tests whether two types are both forward iterators:
template <typename T, typename U>
struct can_compare :
std::integral_constant<
bool,
is_forward_iterator<T>::value &&
is_forward_iterator<U>::value
>
{ };
Then, we'll write two overloads of the copy function and use SFINAE to select the right overload based on the iterator types: if both iterators are forward iterators, we'll include the check, otherwise we'll exclude the check and always perform the assignment:
template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
-> typename std::enable_if<can_compare<InputIt, OutputIt>::value>::type
{
if (static_cast<void const volatile*>(std::addressof(*in)) !=
static_cast<void const volatile*>(std::addressof(*out)))
*out = *in;
}
template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
-> typename std::enable_if<!can_compare<InputIt, OutputIt>::value>::type
{
*out = *in;
}
As easy as pie!
I think this may be a case where you may have to document some assumptions about the types you expect in the function and be content with not being completely generic.
Like operator*, operator& could be overloaded to do all sorts of things. If you're guarding against operator*, then you should consider operator& and operator!=, etc.
I would say that a good prerequisite to enforce (either through comments in the code or a concept/static_assert) is that operator* returns a reference to the object pointed to by the iterator and that it doesn't (or shouldn't) perform a copy. In that case, your code as it stands seems fine.
Your code, as is, is definitely not okay, or at least not okay for all iterator categories.
Input iterators and output iterators are not required to be dereferenceable after the first time (they're expected to be single-pass) and input iterators are allowed to dereference to anything "convertible to T" (§24.2.3/2).
So, if you want to handle all kinds of iterators, I don't think you can enforce this "optimization", i.e. you can't generically check if two iterators point to the same object. If you're willing to forego input and output iterators, what you have should be fine. Otherwise, I'd stick with doing the copy in any case (I really don't think you have another option on this).
Write a helper template function equals that automatically returns false if the iterators are different types. Either that or do a specialization or overload of your copy function itself.
If they're the same type then you can use your trick of comparing the pointers of the objects they resolve to, no casting required:
if (&*i != &*o)
*o = *i;
If *i or *o doesn't return a reference, no problem - the copy will occur even if it didn't have to, but no harm will be done.
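A rough sketch of such a helper follows. The names same_object, copy_one and same_object_demo are made up, and the sketch assumes that operator* returns an lvalue whenever both iterators have the same type; it also conservatively reports "different" for distinct iterator types such as int* and const int*, even when they happen to alias.

```cpp
#include <cassert>
#include <memory>

// Different iterator types: conservatively assume they never alias.
template <class It1, class It2>
bool same_object(It1, It2) {
    return false;
}

// Same iterator type: compare the addresses of the referred-to objects.
// Assumes *a and *b are lvalues (i.e. not a proxy iterator).
template <class It>
bool same_object(It a, It b) {
    return std::addressof(*a) == std::addressof(*b);
}

// Copies one element unless source and destination are the same object.
template <class InIt, class OutIt>
void copy_one(InIt i, OutIt o) {
    if (!same_object(i, o)) {
        *o = *i;
    }
}

// Usage example on a plain array.
inline void same_object_demo() {
    int a[2] = {1, 2};
    assert(same_object(a, a));       // same element
    assert(!same_object(a, a + 1));  // neighbouring elements
    copy_one(a, a + 1);              // performs the assignment
    assert(a[1] == 1);
}
```

Partial ordering picks the single-type overload whenever both arguments have the same type, so no explicit specialization is needed.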