Contiguous iterator detection


C++17 introduced the concept of ContiguousIterator http://en.cppreference.com/w/cpp/iterator.
However, there don't seem to be any plans to add a contiguous_iterator_tag (in the same way we now have random_access_iterator_tag) reported by std::iterator_traits<It>::iterator_category.
Why is contiguous_iterator_tag missing?
Is there a conventional protocol to determine if an iterator is Contiguous?
Or a compile time test?
In the past I mentioned that for containers, if there is a .data() member that converts to a pointer to ::value_type and a .size() member convertible to a pointer difference, then one should assume that the container is contiguous. But I can't find an analogous feature for iterators.
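The container heuristic above can be sketched as a C++17 detection trait. The trait name looks_contiguous is hypothetical. The trait only checks the shape of .data() and .size(), so it is a heuristic and not a proof of contiguity.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true if C has .data() convertible to const value_type*
// and .size() convertible to ptrdiff_t. This is a heuristic, not a guarantee.
template <typename C, typename = void>
struct looks_contiguous : std::false_type {};

template <typename C>
struct looks_contiguous<C, std::void_t<decltype(std::declval<C&>().data()),
                                       decltype(std::declval<C&>().size()),
                                       typename C::value_type>>
    : std::bool_constant<
          std::is_convertible_v<decltype(std::declval<C&>().data()),
                                const typename C::value_type*> &&
          std::is_convertible_v<decltype(std::declval<C&>().size()), std::ptrdiff_t>> {};

static_assert(looks_contiguous<std::vector<int>>::value);
static_assert(looks_contiguous<std::array<int, 3>>::value);
static_assert(looks_contiguous<std::string>::value);
static_assert(!looks_contiguous<std::list<int>>::value);   // no .data()
static_assert(!looks_contiguous<std::deque<int>>::value);  // no .data()
static_assert(!looks_contiguous<std::vector<bool>>::value); // no .data()
```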
One solution could be to also have a data function for contiguous iterators.
Of course, the Contiguous concept holds if &(it[n]) == (&(*it)) + n for all n, but this can't be checked at compile time.
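You can still spot-check that property at run time over one concrete range. Passing the check says nothing about the iterator type in general. The helper name below is hypothetical. It compares each element's address with the previous element's address plus one, so the pointer arithmetic never leaves a single object.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

// Runtime spot-check of &it[n] == &*it + n on one concrete range.
// Requires *it to be an lvalue (a proxy reference such as vector<bool>'s won't compile).
template <typename RandomIt>
bool addresses_are_contiguous(RandomIt first, RandomIt last) {
    using Diff = typename std::iterator_traits<RandomIt>::difference_type;
    const Diff n = last - first;
    for (Diff i = 1; i < n; ++i) {
        // each element must sit exactly one slot after the previous one
        if (std::addressof(first[i]) != std::addressof(first[i - 1]) + 1)
            return false;
    }
    return true;
}
```

A reversed vector is a simple counterexample: its elements are adjacent in memory, but they are visited in decreasing address order.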
EDIT: I found this video, which puts this in the broader context of C++ concepts: CppCon 2016: "Building and Extending the Iterator Hierarchy in a Modern, Multicore World" by Patrick Niedzielski. The solution uses concepts (Lite), but in the end the idea is that contiguous iterators should implement a pointer_from function (same as my data(...) function).
The conclusion is that concepts will help formalize the theory, but they are not magic: someone, somewhere will still have to define new, specially named functions over iterators that are contiguous.
The talk generalizes to segmented iterators (with corresponding functions segment and local); unfortunately, it doesn't say anything about strided pointers.
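The pointer_from / data(...) idea can be sketched in C++17 as an opt-in free function plus a detection trait. Argument-dependent lookup finds the free function. All the names here (pointer_from, has_pointer_from, mylib::int_iter) are hypothetical illustrations. They are not the API from the talk or from any library.

```cpp
#include <type_traits>
#include <utility>

// Raw pointers are trivially contiguous.
template <typename T>
T* pointer_from(T* p) { return p; }

namespace mylib {
// Hypothetical minimal iterator that opts in by providing pointer_from (found via ADL).
struct int_iter {
    int* p;
    int& operator*() const { return *p; }
    int_iter& operator++() { ++p; return *this; }
};
inline int* pointer_from(int_iter it) { return it.p; }
}  // namespace mylib

// Detection: does an unqualified call pointer_from(it) resolve for It?
template <typename It, typename = void>
struct has_pointer_from : std::false_type {};

template <typename It>
struct has_pointer_from<It, std::void_t<decltype(pointer_from(std::declval<It>()))>>
    : std::true_type {};
```

Iterators that don't opt in are simply treated as "not known to be contiguous". This matches the talk's conclusion that the knowledge has to be declared explicitly somewhere.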
EDIT 2020:
The standard now has
struct contiguous_iterator_tag: public random_access_iterator_tag { };
https://en.cppreference.com/w/cpp/iterator/iterator_tags

Original answer
The rationale is given in N4284, which is the adopted version of the contiguous iterators proposal:
This paper introduces the term "contiguous iterator" as a refinement of random-access iterator, without introducing a corresponding contiguous_iterator_tag, which was found to break code during the Issaquah discussions of Nevin Liber's paper N3884 "Contiguous Iterators: A Refinement of Random Access Iterators".
Some code was broken because it assumed that std::random_access_iterator_tag couldn't be refined, and had explicit checks against it. Basically, it broke bad code that didn't rely on polymorphism to check for the categories of iterators, but it broke code nonetheless, so std::contiguous_iterator_tag was removed from the proposal.
Also, there was an additional problem with std::reverse_iterator-like classes: a reversed contiguous iterator cannot be a contiguous iterator, only a regular random-access iterator. This problem could have been solved for std::reverse_iterator, but many user-defined iterator wrappers that augment an iterator while copying its iterator category would have either lied or stopped working correctly (for example, Boost iterator adaptors).
C++20 update
Since my original answer above, std::contiguous_iterator_tag was brought back in the Ranges TS, then adopted in C++20. To avoid the issues mentioned above, the behaviour of std::iterator_traits<T>::iterator_category was not changed. Instead, user specializations of std::iterator_traits can now define an additional iterator_concept member type alias, which is allowed to alias std::contiguous_iterator_tag or the previous iterator tags. The standard components have been updated accordingly to mark pointers and the appropriate iterators as contiguous iterators.
The standard defines an exposition-only ITER_CONCEPT(Iter) which, given an iterator type Iter, will alias std::iterator_traits<Iter>::iterator_concept if it exists and std::iterator_traits<Iter>::iterator_category otherwise. There is no equivalent standard user-facing type trait, but ITER_CONCEPT is used by the new iterator concepts. It is a strong hint that you should use these iterator concepts instead of old-style tag dispatch to implement new functions whose behaviour depends on the iterator category. That said, concepts are usable as boolean traits, so you can simply check that an iterator is a contiguous iterator as follows:
static_assert(std::contiguous_iterator<Iter>);
std::contiguous_iterator is thus the C++20 concept that you should use to detect that a given iterator is a contiguous iterator (it also has a ranges counterpart: std::ranges::contiguous_range). It is worth noting that std::contiguous_iterator has a few additional constraints besides requiring that ITER_CONCEPT matches std::contiguous_iterator_tag. Most notably, it requires std::to_address(it) to be a valid expression returning a raw pointer type. std::to_address is a small utility function meant to avoid a few pitfalls that can occur when trying to retrieve the address a contiguous iterator points to; you can read more about the issues it solves in Helpful pointers for ContiguousIterator.
About implementing ITER_CONCEPT: it is worth noting that implementing ITER_CONCEPT exactly as described in the standard is only possible if you own std::iterator_traits, because it requires detecting whether the primary template of std::iterator_traits is used.

Related

Do the Range TS and C++20 concepts for iterators require the ability to use `operator->`?

I've searched through various Range TS proposals, including P0896, the one incorporating the ranges into C++20. It seems from my reading that the only requirement the Iterator concept makes in terms of dereferenceability is that *t be valid syntax that yields an object of some type.
Since InputIterator is defined in terms of being an Iterator and being Readable, neither of which requires operator-> support, it appears that the Range TS and C++20 do not require that iterators provide -> support.
Is this the case?

Yes, we've dropped the operator-> requirement from InputIterator, and consequently the iterator concepts that refine it. (The requirement remains part of the "old" input iterator requirements, which are unchanged.) There are a number of reasons:
There's no way to implement -> for many iterator types such that the semantics of i->m are equivalent to (*i).m as the "old" requirements expect. move_iterator is a good example: (*i).m is an rvalue, whereas i->m is an lvalue. (Yes, it's yet another Standard iterator that doesn't satisfy the iterator requirements.)
There's no way to usefully constrain -> with concepts. Sure, we could require that there is an operator->, but we couldn't constrain it to have reasonable syntax.
Most importantly, -> is useless to the standard algorithms: they have no idea if the elements denoted by an iterator have members, let alone how to name such members.
This doesn't mean that standard iterators won't provide operator-> (Although see LWG 2790), only that iterators aren't required to implement such an operator to be usable with the standard library.

What could be a "least bad implementation" for an iterator over a proxied container?

Context
I was trying to implement an nD-array-like container: something that would wrap an underlying sequence container and allow processing it as a container of containers (of...). arr[i][j][k] should be a (possibly const) reference to _arr[(((i * dim2) + j) * dim3) + k].
So far so good: arr[i] just has to be a wrapper class over the subarray...
But when I tried to implement iterators, I suddenly realized that there were dragons everywhere:
my container is not a standard compliant container because operator [] returns a proxy or wrapper instead of a true reference (When Is a Container Not a Container?)
this causes the iterator to be either a stashing iterator, which is known to be bad (Reference invalidation after applying reverse_iterator on a custom made iterator and its accepted answer)
... or a proxy iterator which is not necessarily better (To Be or Not to Be (an Iterator))
The real problem is that as soon as you have a proxied container, no iterator can respect the following requirement for a forward iterator:
Forward iterators [forward.iterators]
...6 If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object.
Examples come from the standard library itself:
vector<bool> is known not to respect all the requirements for containers because it returns proxies instead of references:
Class vector<bool> [vector.bool] ...
3 There is no requirement that the data be stored as a contiguous allocation of bool values. A space-optimized representation of bits is recommended instead.
4 reference is a class that simulates the behavior of references of a single bit in vector<bool>.
filesystem path iterator is known to be a stashing iterator:
path iterators [fs.path.itr] ...
2 A path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (27.2.6) except that, for dereferenceable iterators a and b of type path::iterator with a == b, there is no requirement that *a and *b are bound to the same object.
and from cppreference:
Notes: std::reverse_iterator does not work with iterators that return a reference to a member object (so-called "stashing iterators"). An example of stashing iterator is std::filesystem::path::iterator.
Question
I have found plenty of references about why proxied containers are not true containers and why it would be nice if the standard allowed proxied containers and iterators. But I still haven't understood what the best achievable approach is and what the real limitations are.
So my question is: why are proxy iterators really better than stashing ones, and which algorithms are allowed for each of them? If possible, I would really love to find a reference implementation for such an iterator.
For reference, a current implementation of my code has been submitted on Code Review. It contains a stashing iterator (which broke immediately when I tried to use std::reverse_iterator).

OK, we have two similar but distinct concepts. So let's lay them out.
But first, I need to make a distinction between the named requirements of pre-C++20 and the actual in-language concepts created for the Ranges TS and included in C++20. They're both called "concepts", but they're defined differently. So when I talk about concept-with-a-lowercase-c, I mean the pre-C++20 requirements. When I talk about Concept-with-a-capital-C, I mean the C++20 stuff.
Proxy Iterators
Proxy iterators are iterators whose reference is not a value_type&, but some other type that behaves like a reference to value_type. In this case, *it returns a prvalue of this reference type.
The InputIterator concept imposes no requirement on reference, other than that it is convertible to value_type. However, the ForwardIterator concept makes the explicit statement that "reference is a reference to T".
Therefore, a proxy iterator cannot fit the ForwardIterator concept. But it can still be an InputIterator. So you can safely pass a proxy iterator to any function that only requires InputIterators.
So the problem with vector<bool>'s iterators is not that they're proxy iterators. It's that they promise to fulfill the RandomAccessIterator concept (through the use of the appropriate tag), when they're really only InputIterators and OutputIterators.
The Ranges proposal (mostly) adopted into C++20 makes changes to the iterator Concepts which allow all iterators to be proxy iterators. So under Ranges, vector<bool>::iterator really fulfills the RandomAccessIterator Concept. Therefore, if you have code written against the Ranges concepts, then you can use proxy iterators of all kinds.
This is very useful for dealing with things like counting ranges. You can have reference and value_type be the same type, so you're just dealing with integers either way.
And of course, if you have control over the code consuming the iterator, you can make it do whatever you want, so long as you don't violate the concept your iterator is written against.
Stashing Iterators
Stashing iterators are iterators where reference is (directly or indirectly) a reference to an object stored in the iterator. Therefore, if you make a copy of an iterator, the copy will return a reference to a different object than the original, even though they refer to the same element. And when you increment the iterator, previous references are no longer valid.
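Here is a minimal, hypothetical stashing iterator over the squares 0, 1, 4, 9, ... that makes the problem concrete. Two equal iterators hand out references to two different objects. A reference obtained earlier silently changes value after an increment.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical stashing iterator: operator* returns a reference into the iterator itself.
class StashingSquares {
public:
    using iterator_category = std::forward_iterator_tag; // the promise it cannot keep
    using value_type        = int;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const int*;
    using reference         = const int&;

    explicit StashingSquares(int i = 0) : i_(i), cur_(i * i) {}

    reference operator*() const { return cur_; }  // points into *this, not into a sequence
    StashingSquares& operator++() { ++i_; cur_ = i_ * i_; return *this; }
    StashingSquares operator++(int) { StashingSquares t = *this; ++*this; return t; }

    friend bool operator==(const StashingSquares& a, const StashingSquares& b) { return a.i_ == b.i_; }
    friend bool operator!=(const StashingSquares& a, const StashingSquares& b) { return !(a == b); }

private:
    int i_;
    int cur_;  // the "stash"
};
```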
Stashing iterators are usually implemented because computing the value you want to return is expensive. Maybe it would involve a memory allocation (such as path::iterator) or maybe it would involve a possibly-complex operation that should only be done once (such as regex_iterator). So you only want to do it when necessary.
One of the foundations of ForwardIterator as a concept (or Concept) is that a range of these iterators represents a range over values which exist independently of their iterators. This permits multipass operation, and it also makes other useful things possible: you can store references to items in the range, and then iterate elsewhere.
If you need an iterator to be a ForwardIterator or higher, you should never make it a stashing iterator. Of course, the C++ standard library is not always consistent with itself. But it usually calls out its inconsistencies.
path::iterator is a stashing iterator. The standard says that it is a BidirectionalIterator; however, it also gives this type an exception to the reference/pointer preservation rule. This means that you cannot pass path::iterator to any code that might rely on that preservation rule.
Now, this doesn't mean you can't pass it to anything. Any algorithm which requires only InputIterator will be able to take such an iterator, since such code cannot rely on that rule. And of course, any code which you write or which specifically states in its documentation that it doesn't rely on that rule can be used. But there's no guarantee that you can use reverse_iterator on it, even though it says that it is a BidirectionalIterator.
regex_iterators are even worse in this regard. They are said to be ForwardIterators based on their tag, but the standard never says that they actually are ForwardIterators (unlike path::iterator). And specifying their reference as an actual reference to a member object makes it impossible for them to be true ForwardIterators.
Note that I made no distinction between the pre-C++20 concept and the Ranges Concept. That's because the std::forward_iterator Concept still forbids stashing iterators. This is by design.
Usage
Now obviously, you can do whatever you want in your code. But code you don't control will be under the domain of its owners. They will be writing against the old concepts, the new Concepts, or some other c/Concept or requirement that they specify. So your iterators need to be able to be compatible with their needs.
The algorithms that Ranges introduces use the new Concepts, so you can always rely on them to work with proxy iterators. However, as I understand it, the Range Concepts are not back-ported into the older algorithms.
Personally, I would suggest avoiding stashing iterator implementations entirely. By providing complete support for proxy iterators, most stashing iterators can be rewritten to return values rather than references to objects.
For example, if there were a path_view type, path::iterator could have returned that instead of a full-fledged path. That way, if you want to do the expensive copy operation, you can. Similarly, the regex_iterators could have returned copies of the match object. The new Concepts make it possible to work that way by supporting proxy iterators.
Now, stashing iterators handle caching in a useful way; iterators can cache their results so that repeated *it usage only does the expensive operation once. But remember the problem with stashing iterators: returning a reference to their contents. You don't need to do that just to get caching. You can cache the results in an optional<T> (which you invalidate when the iterator is in/decremented). So you can still return a value. It may involve an additional copy, but reference shouldn't be a complex type.
Of course, all of this means that auto &val = *it; isn't legal code anymore. However, auto &&val = *it; will always work. This is actually a big part of the Range TS version of iterators.

C++ std::vector<>::iterator is not a pointer, why?

Just a little introduction, with simple words.
In C++, iterators are "things" on which you can write at least the dereference operator *it and the increment operator ++it. More advanced bidirectional iterators add the decrement --it, and last but not least, random access iterators need the index operator it[n] and possibly addition and subtraction.
Such "things" in C++ are objects of types with the according operator overloads, or plain and simple pointers.
std::vector<> is a container class that wraps a contiguous array, so a pointer as iterator makes sense. On the net, and in some literature, you can find vector.begin() used as a pointer.
The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.
Knowing this, my question is why modern STL implementations, let's say MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators?

You're completely correct that vector::iterator could be implemented by a simple pointer (see here). In fact, the concept of an iterator is based on that of a pointer to an array element. For other containers, such as map, list, or deque, however, a pointer won't work at all. So why is this not done? Here are three reasons why a class implementation is preferable over a raw pointer.
Implementing an iterator as a separate type allows additional functionality (beyond what is required by the standard), for example (added in an edit following Quentin's comment) the possibility to add assertions when dereferencing an iterator in debug mode.
overload resolution If the iterator were a pointer T*, it could be passed as valid argument to a function taking T*, while this would not be possible with an iterator type. Thus making std::vector<>::iterator a pointer in fact changes the behaviour of existing code. Consider, for example,
template<typename It>
void foo(It begin, It end);
void foo(const double*a, const double*b, size_t n=0);
std::vector<double> vec;
foo(vec.begin(), vec.end()); // which foo is called?
argument-dependent lookup (ADL; pointed out by juanchopanza) If you make an unqualified call, ADL ensures that functions in namespace std will be searched only if the arguments are types defined in namespace std. So,
std::vector<double> vec;
sort(vec.begin(), vec.end()); // calls std::sort
sort(vec.data(), vec.data()+vec.size()); // fails to compile
std::sort would not be found if vector<>::iterator were a mere pointer.

The implementation of the iterator is implementation-defined, so long as it fulfills the requirements of the standard. It could be a pointer for vector; that would work. There are several reasons for not using a pointer:
consistency with other containers.
debug and error checking support
overload resolution, class based iterators allow for overloads to work differentiating them from plain pointers
If all the iterators were pointers, then ++it on a map would not increment it to the next element, since the memory is not required to be contiguous. Beyond the contiguous memory of std::vector, most standard containers require "smarter" pointers, hence iterators.
The physical requirements of the iterator dovetail very well with the logical requirement that moving between elements is a well-defined "idiom" of iterating over them, not just moving to the next memory location.
This was one of the original design requirements and goals of the STL; the orthogonal relationship between the containers, the algorithms and connecting the two through the iterators.
Now that they are classes, you can add a whole host of error checking and sanity checks to debug code (and then remove it for more optimised release code).
Given the benefits class-based iterators bring, why not just use pointers for std::vector iterators? Consistency. Early implementations of std::vector did indeed use plain pointers, and you can use them for vector. But once you have to use classes for the other iterators, given the benefits they bring, applying the same approach to vector becomes a good idea.

The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.
It might be, but it isn't. If your implementation is not utter shite, a struct wrapping a pointer will achieve the same speed.
With that in mind, it's simple to see that simple benefits like better diagnostic messages (naming the iterator instead of T*), better overload resolution, ADL, and debug checking make the struct a clear winner over the pointer. The raw pointer has no advantages.

The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.
This is the misunderstanding at the heart of the question. A well-formed class implementation will have no overhead and identical performance, because in the case of std::vector the compiler can optimize away the abstraction and treat the iterator class as just a pointer.
That said,
MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators
because they consider that adding a layer of abstraction (class iterator) to define the concept of iteration over a std::vector is more beneficial than using an ordinary pointer for this purpose.
Abstractions have a different set of costs vs benefits, typically added design complexity (not necessarily related to performance or overhead) in exchange for flexibility, future proofing, hiding implementation details. The above compilers decided this added complexity is an appropriate cost to pay for the benefits of having an abstraction.

Because STL was designed with the idea that you can write something that iterates over an iterator, no matter whether that iterator's just equivalent to a pointer to an element of memory-contiguous arrays (like std::array or std::vector) or something like a linked list, a set of keys, something that gets generated on the fly on access etc.
Also, don't be fooled: in the vector case, dereferencing might (without debug options) just boil down to an inlinable pointer dereference, so there wouldn't even be overhead after compilation!

I think the reason is plain and simple: originally std::vector was not required to be implemented over contiguous blocks of memory.
So the interface could not just present a pointer.
source: https://stackoverflow.com/a/849190/225186
This was fixed later and std::vector was required to be in contiguous memory, but it was probably too late to make std::vector<T>::iterator a pointer.
Maybe some code already depended on iterator to be a class/struct.
Interestingly, I found implementations of std::vector<T>::iterator where it = {}; is valid and produces a "null" iterator (just like a null pointer):
std::vector<double>::iterator it = {};
assert( &*it == nullptr ); // dereferencing a value-initialized iterator is undefined behaviour; this only reflects what those implementations happened to do
Also, std::array<T, N>::iterator and std::initializer_list<T>::iterator are plain pointers T* in the implementations I saw.
A plain pointer as std::vector<T>::iterator would be perfectly fine in my opinion, in theory.
In practice, being a built-in type has observable effects on metaprogramming (e.g. std::vector<T>::iterator::difference_type wouldn't be valid; yes, one should have used iterator_traits anyway).
Not being a raw pointer has the (very) marginal advantage of disallowing nullability (it == nullptr), or default constructibility if you are into that. (This argument doesn't matter from a generic programming point of view.)
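Generic code that goes through std::iterator_traits works the same for raw pointers and class-type iterators, whereas It::difference_type would not compile for int*. Here is a small sketch with a hypothetical function name. In real code std::distance already does this.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>

// Hypothetical helper: counts the steps from first to last using only
// std::iterator_traits, so it accepts pointers and class iterators alike.
template <typename It>
typename std::iterator_traits<It>::difference_type count_steps(It first, It last) {
    typename std::iterator_traits<It>::difference_type n = 0;
    for (; first != last; ++first) ++n;  // same result as std::distance
    return n;
}

static_assert(std::is_same_v<std::iterator_traits<int*>::difference_type, std::ptrdiff_t>);
```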
At the same time the dedicated class iterators had a steep cost in other metaprogramming aspects, because if ::iterator were a pointer one wouldn't need to have ad hoc methods to detect contiguous memory (see contiguous_iterator_tag in https://en.cppreference.com/w/cpp/iterator/iterator_tags) and generic code over vectors could be directly forwarded to legacy C-functions.
For this reason alone I would argue that iterator-not-being-a-pointer was a costly mistake. It just made it hard to interact with C-code (as you need another layer of functions and type detection to safely forward stuff to C).
Having said this, I think we could still make things better by allowing automatic conversions from iterators to pointers and perhaps explicit (?) conversions from pointer to vector::iterators.

I got around this pesky obstacle by dereferencing and immediately referencing the iterator again. It looks ridiculous, but it satisfies MSVC...
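#include <vector>
struct Thing {
    int value = 0; // placeholder for the real members
};
void handleThing(Thing* thing) {
    // do stuff with *thing
}
int main() {
    std::vector<Thing> vec(3); // put some elements into vec
    for (auto it = vec.begin(); it != vec.end(); ++it) {
        // handleThing(it);  // doesn't compile: it is not a Thing*; would have been elegant
        handleThing(&*it);   // dereference, then take the address: this DOES work
    }
}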

Filtering specific iterators for template functions

I am developing a set of functions that takes advantage of containers that have packed and sequential memory storage (for memory copies). They have function signatures in the style of most STD functions, input/output iterators point to elements and denote ranges. For instance, a function could look like this:
template< typename InputIterator, typename OutputIterator >
OutputIterator fooBar( InputIterator& first, InputIterator& last,
OutputIterator& result );
I wish to verify that the iterators passed are legal, that is, packed and sequential. For the STD containers, this is limited to std::vector and std::array. Unfortunately, I can't rely on the iterator 'category' trait, because the random access trait does not imply sequential storage. An example of this is Microsoft's concurrent_vector class, documented here: parallel containers.
In addition, I can't accept all iterators from the vector and array classes either. For instance, I need to reject reverse iterators, and std::vector<bool> iterators are unsuitable because of the proxy class that it uses.
I've attempted to create my own traits class to distinguish and filter the iterators with the constraints that I describe above, but I'm running into template syntax problems. I am looking for feedback from others on how they would approach this problem.
Thanks

I don't think you can do this. Iterators are an abstraction whose whole purpose is to make the iteration process independent of the underlying architecture. There is no information in the standard iterators that denote the underlying memory structure or even anything remotely similar.
For your std-algorithm-like functions, it's generally advised to pass iterators by value, since they should be cheap, small objects. Note especially that your function could never be called as fooBar(c.begin(), c.end(), some_out_it);, since it takes the input iterators by reference-to-non-const.
As a last point, you can filter out reverse iterators by testing whether the iterator type is a specialization of std::reverse_iterator<Iter>, since at least the Container::(const_)reverse_iterator types of standard containers are required to be one.
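Here is a pre-C++20 sketch of such a filter. It uses hypothetical trait names. It rejects reverse iterators as suggested above, and it rejects proxy iterators such as vector<bool>'s by requiring that reference be a true lvalue reference to value_type. As expected, it still cannot distinguish std::deque (or concurrent_vector) from std::vector. In C++20, std::contiguous_iterator<It> makes the check much more precise.

```cpp
#include <deque>
#include <iterator>
#include <type_traits>
#include <vector>

// Detects specializations of std::reverse_iterator.
template <typename It>
struct is_reverse_iterator : std::false_type {};
template <typename It>
struct is_reverse_iterator<std::reverse_iterator<It>> : std::true_type {};

// Heuristic: random access, not reversed, and reference is a real (possibly const)
// lvalue reference to value_type. This rejects proxy iterators.
template <typename It,
          typename Traits = std::iterator_traits<It>,
          typename Ref = typename Traits::reference>
constexpr bool plausibly_contiguous_v =
    std::is_base_of_v<std::random_access_iterator_tag, typename Traits::iterator_category> &&
    !is_reverse_iterator<It>::value &&
    std::is_lvalue_reference_v<Ref> &&
    std::is_same_v<std::remove_cv_t<std::remove_reference_t<Ref>>, typename Traits::value_type>;
```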

Is it a good idea to create an STL iterator which is noncopyable?

Most of the time, STL iterators are CopyConstructible, because several STL algorithms, such as std::sort, require this to improve performance.
However, I've been working on a pet project to wrap the FindXFile API (previously asked about), but the problem is it's impossible to implement a copyable iterator around this API. A find handle cannot be duplicated by any means -- DuplicateHandle specifically forbids passing these types of handles to it. And if you just maintain a reference count to the find handle, then a single increment by any copy results in an increment of all copies -- clearly that is not what a copy constructed iterator is supposed to do.
Since I can't satisfy the traditional copy constructible requirement for iterators here, is it even worth trying to create an "STL style" iterator? On one hand, creating some other enumeration method is going to not fall into normal STL conventions, but on the other, following STL conventions are going to confuse users of this iterator if they try to CopyConstruct it later.
Which is the lesser of two evils?

An input iterator which is not a forward iterator is copyable, but you can only "use" one of the copies: incrementing any of them invalidates the others (dereferencing one of them does not invalidate the others). This allows it to be passed to algorithms, but the algorithm must complete with a single pass. You can tell which algorithms are OK by checking their requirements - for example copy requires only an InputIterator, whereas adjacent_find requires a ForwardIterator (first one I found).
It sounds to me as though this describes your situation. Just copy the handle (or something which refcounts the handle), without duplicating it.
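Here is a sketch of that approach. FakeFindHandle is a hypothetical stand-in for a handle that can be advanced but never duplicated. Copies of the iterator share it through a std::shared_ptr, so incrementing any copy advances them all. That is permitted for an input iterator, and it is enough for single-pass algorithms and range constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical non-duplicable handle: yields 0, 1, ..., limit-1, then reports exhaustion.
struct FakeFindHandle {
    int next = 0;
    int limit;
    explicit FakeFindHandle(int lim) : limit(lim) {}
    bool advance(int& out) {
        if (next >= limit) return false;
        out = next++;
        return true;
    }
};

// Single-pass input iterator; copies share (do not duplicate) the handle.
class FindIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = int;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const int*;
    using reference         = const int&;

    FindIterator() = default;  // the end iterator
    explicit FindIterator(std::shared_ptr<FakeFindHandle> h) : h_(std::move(h)) { ++*this; }

    reference operator*() const { return cur_; }
    pointer operator->() const { return &cur_; }
    FindIterator& operator++() {
        if (!h_->advance(cur_)) h_.reset();  // exhausted: become the end iterator
        return *this;
    }
    FindIterator operator++(int) { FindIterator t = *this; ++*this; return t; }

    friend bool operator==(const FindIterator& a, const FindIterator& b) { return a.h_ == b.h_; }
    friend bool operator!=(const FindIterator& a, const FindIterator& b) { return !(a == b); }

private:
    std::shared_ptr<FakeFindHandle> h_;
    int cur_ = 0;
};
```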
The user has to understand that it's only an InputIterator, but in practice this isn't a big deal. istream_iterator is the same, and for the same reason.
With the benefit of C++11 hindsight, it would almost have made sense to require InputIterators to be movable but not to require them to be copyable, since duplicates have limited use anyway. But that's "limited use", not "no use", and anyway it's too late now to remove functionality from InputIterator, considering how much code relies on the existing definition.
