How does iterator category in C++ work? - c++

I tried to understand iterator implementation, and while playing around with the source, I saw this statement:
typedef output_iterator_tag iterator_category;
I don't understand how this typedef works within the class. What effect does it have? Can anyone walk me through this?

You need to read up on generic programming because you're not likely to get this answer.
"Output Iterator" is a concept that certain iterators match. Each iterator that is a realization of this concept has certain functionality associated with it. It's sort of like inheritance, but it isn't.
C++ doesn't have anything that directly represents concepts (they were a proposed addition to C++0x but failed to make it). That being the case, we need various template constructs to allow us to associate a "tag" with an iterator type. By associating the output_iterator_tag type with an iterator we're claiming that our iterator type realizes the OutputIterator concept.
This becomes very important when you're trying to write algorithms that are both generic and as optimized as possible. For example, performing a sort with an iterator that can be incremented or decremented by an arbitrary value (other than 1, in other words) is more efficient than with one that doesn't have this capability. Furthermore, getting a new iterator that's X distance from another can require different operations depending on the capabilities of the iterator. To write such an algorithm you use "tag dispatching". To explain this more fully, here's an implementation (untested) of std::advance that works both with iterators that have a += operator and ones that only have a ++ operator and is as fast as possible with both versions.
// Fast path: a random access iterator can jump in a single step.
template < typename RandomAccessIterator >
RandomAccessIterator advance( RandomAccessIterator it
                            , int amount
                            , std::random_access_iterator_tag)
{ return it + amount; }
// Fallback: step one element at a time (amount must be non-negative).
template < typename ForwardIterator >
ForwardIterator advance(ForwardIterator it, int amount, std::forward_iterator_tag)
{
    for (;amount; --amount) ++it;
    return it;
}
// Entry point: look the tag up through iterator_traits and dispatch on it.
template < typename Iterator >
Iterator advance(Iterator it, int amount)
{
    typedef typename std::iterator_traits<Iterator>::iterator_category tag;
    return advance(it, amount, tag());
}
That's from memory so it's probably riddled with bugs (probably have a bunch of types wrong even)...but that's the idea. The iterator tags are types that are empty and also inherit from each other in exactly the same way as the concepts refine each other. For instance, a random access iterator IS a forward iterator. Thus random_access_iterator_tag is a derivative of forward_iterator_tag. Because of function overload resolution rules passing a random_access_iterator_tag to the function resolves to that version of the function rather than the forward_iterator_tag one.
Again, go read up on generic programming. It's essential to utilizing the full power of C++.
Oh, and finally... The typedef is there in the iterator's class definition because it's a nice, convenient place to put it. A default iterator_traits can then look for it there. You'll want to use iterator_traits rather than that definition though because raw pointers are iterators too and they can't have internal typedefs.

output_iterator_tag is an empty class. Its purpose is to allow algorithms to identify generic classifications of iterators which follow certain rules and provide particular operators. This allows for algorithms to provide a more specialized implementation of a given algorithm under certain conditions.
For example, in VS2010's headers, std::distance has two implementations, selected by the 'iterator_category' typedef'd in the iterators passed in.
std::distance only requires an input iterator in order to calculate the distance between two iterators, but it might take linear 'O(n)' time to calculate the answer.
If, however, the compiler figures out that a random access iterator is being used, it can take advantage of the subtraction operator to calculate the distance in constant time 'O(1)'.
I would recommend watching Stephan T. Lavavej's video where he goes a bit into type traits and their uses in the Standard Template Library.

Related

Why does `std::input_iterator<In>` require a `value_type`?

I am trying to create a data structure for arrays of dynamic-sized arrays. Multiple choices are possible, the simplest one being std::vector<std::vector<T>>. However it is often not efficient and we would like to compress the data of all the inner vectors into one big vector, and have a vector of offsets to tell where each element begins.
Example:
// encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
std::vector<double> v = {4.,5.,1., 7.,8.,9.,2};
std::vector<int> offsets = {0 , 3 , 7};
Let's encapsulate it ! Consider the following data structure:
(note: the code is neither complete, general or precise, at this point this is just to give an idea of what is going on):
class vblock_vector {
private:
std::vector<double> v;
std::vector<int> offsets;
public:
using iterator = vblock_iterator;
auto begin() -> iterator {
return {v.data(),offsets.data()};
}
auto end() -> iterator {
return {v.data(),offsets.data()+offsets.size()-1}; // offsets holds one more entry than there are blocks
}
};
A basic implementation of the iterator type is the following:
struct vblock_iterator {
private:
double* ptr;
int* offsets_ptr;
public:
using reference = span_ref<double>; // see notes (0) and (1)
// using value_type = ???; // See below
auto operator++() {
++offsets_ptr;
return *this;
}
auto operator*() const {
return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
}
auto operator<=>(const vblock_iterator&) const = default;
// ... other iterator interface stuff that is trivial
};
This iterator works with e.g. std::copy. (4)
Now let's say that I want to replace my old std::copy calls with std::ranges::copy. For that, vblock_iterator needs to satisfy the std::input_iterator concept. In order to do that, vblock_iterator needs to have an associated value_type (required by the intermediate std::indirectly_readable concept).
An obvious choice would be using value_type = std::vector<double>(2), but I surely don't want to give std::ranges::copy the freedom to use this type at its discretion in its implementation: it would be inefficient.
My question is the following: why does std::input_iterator<In> require In to have a value_type? At least for copying it is not needed (the fact that I can use std::copy and that it does the right thing proves it). Of course, one can say: "define value_type to be anything, it won't be used by std::ranges::copy implementations anyway", but then why require it?
I am currently under the impression that value_type is mandatory for e.g. std::swappable, but not for std::input_iterator (nor even std::random_access_iterator dare I say). But the standard committee decided otherwise: what is the reason behind this choice? (3)
Notes:
(0) span_ref is just like a std::span with reference semantics (its operator= is "assign-through" and not "rebind to new array").
(1) In reality, the reference type needs to be a tad more complex to account for offsets, but it is not the subject here. Suffice to say, that it is possible to have an efficient reference type for this structure.
(2) And I think this is the only reasonable choice. At least a container is needed (vector, deque...). E.g. a std::span won't do because if we bother to save the value pointed to by the iterator, it is because we will modify the original memory, and std::span won't help us with that.
(3) In the presentation of the std::indirectly_readable concept (then called Readable), Eric Niebler goes into some detail of why we need value_type to be related in some form to reference to work well with proxy references, but I still don't see why we would even need value_type for algorithms that don't need to swap elements (or store them somewhere). Yes, there is mathematically a value_type for vblock_iterator, but why require it if it is not meant to be used? (Similarly, there is also a mathematical operator+= for forward ranges, but since it is inefficient, it is simply not required.)
(4) And other algorithms: std::move, std::find, std::find_if, std::any_of, std::partition_point, std::lower_bound, std::unique... So I think that there is something more fundamental going on than: "we are just lucky with std::copy".
std::copy requires a LegacyInputIterator for its iterator types. It does not check this requirement. If you fail to provide a LegacyInputIterator, your program is ill-formed, no diagnostic required.
A LegacyInputIterator requires that std::iterator_traits<X>::value_type exists because it subsumes LegacyIterator.
So your program was ill-formed once you passed it to std::copy. The behavior of your ill-formed program is not determined by the C++ standard in any way; the compiler can legally provide you a program that emails your browser history to your great aunt Eustice and be standard compliant. Or it could do something that happens to align with what you think the program "should" do. Or it could fail to compile.
The std::ranges algorithms have slightly different requirements. These requirements are far more likely to be checked by concepts than the old style algorithms are, telling the user with a compile time error.
You are running into such a case.
To be even more clear, you cannot rely on the implementation of std code to enforce the standard.
These types are required partly to make it easier to talk about the types in question and what operations on them mean, semantically.
Beyond simple requirements like the existence of std::iterator_traits<X>::value_type, there are semantic requirements on what *it does, what x = *it++ does, etc. Most of those requirements cannot be checked by the compiler (due to Rice's theorem, they cannot be checked in theory); but the algorithms in the std namespace rely on those semantic meanings being correct for any iterator passed in.
Because the compiler can assume the semantic meanings are correct, the algorithms can be cleaner, simpler and faster than if they had to check them. And it means that multiple different compiler vendors can write different std algorithm implementations, improving the algorithm over each other, and there is an objective standard to argue against.
For a LegacyInputIterator and types value_type and reference from std::iterator_traits<X>, we must have:
value_type v = *it;
is a valid expression, *it must return reference, and
*it++
must return a type convertible to value_type.
Not every algorithm need use every property of every iterator it requires that iterator to have. The goal here is to have semantically meaningful categories that do not demand too much in the way of overhead.
Requiring that an iterator over stuff actually have a type it is an iterator over is not a large overhead. And it makes talking about the iterator far easier.
You could refactor it and remove that concept, or cut the concept up into smaller pieces so that the value_type is only required in the narrow cases where it is required, but that would make the concepts harder to write about and harder to understand.

Why does ITER_CONCEPT befog iterator_traits::iterator_concept/category?

According to the [iterator.concepts.general]:
Otherwise, if iterator_traits<I> names a specialization generated from the primary template, then
ITER_CONCEPT(I) denotes random_access_iterator_tag.
ITER_CONCEPT(I) may be std::random_access_iterator_tag,
even if I models input_iterator.
What is the purpose of this?
Why do the C++20 iterator concepts (such as std::forward_iterator) still check whether ITER_CONCEPT is derived from a certain tag?
Example:
template<class I>
concept forward_iterator =
input_iterator<I> &&
derived_from<ITER_CONCEPT(I), forward_iterator_tag> && //is this necessary?
incrementable<I> &&
sentinel_for<I, I>;
To be an X iterator, the iterator concept requires (among other things) two things. It is possible for a type to accidentally fulfill an iterator's syntactic requirements without intending to. As such, to be an X iterator, the prospective iterator type must declare that it intends to be (at least) an X iterator. This is done through the tag type, as retrieved through ITER_CONCEPT.
But iterators have an "inheritance" graph of sorts. All random access iterators are also bidirectional. So each iterator concept requires that the prospective iterator type fulfill the concept requirements of its "base" iterator concept.
These are separate requirements. Types can lie, after all, and the whole point of concepts is to keep types from lying. Just because a type claims to be a forward iterator does not mean it actually is one.
As for the purpose of this particular rule in ITER_CONCEPT, it is to bypass the defaulting mechanism defined for the legacy iterator_category. The idea is that if you haven't specified your iterator's tag type explicitly, then ITER_CONCEPT will assume the (second) most permissive one and then use C++20's concept checks to see what your iterator actually supports.
It doesn't mean that this iterator is definitely a random access iterator; it's saying "try (almost) anything, and if the concept doesn't fit, then that isn't it".

Why don't almost all the type aliases in std::iterator_traits have defaults?

When creating a new iterator pre-C++20 without the help of libraries like boost.iterator, it's necessary to specify the type aliases difference_type, value_type, pointer, reference and iterator_category.
According to cppreference, with C++20, it's only necessary to specify difference_type and value_type, which I think is great!
But why are there defaults for exactly these 3 aliases?
There are 2 things I don't understand about this (and one thing that seems to me like an oversight):
1. Why are there no default values for value_type and difference_type? Wouldn't it make sense to use something like std::remove_reference_t<reference> as a default for value_type? As a default for difference_type for random access iterators, it could arguably make sense to use the result type of the - operator taking two iterators.
2. C++20 adds the contiguous_iterator_tag. Just like with input_iterator_tag versus forward_iterator_tag, I don't see how it should be possible for the compiler to correctly distinguish between a contiguous iterator and a random access iterator, which I guess is why it apparently never selects contiguous_iterator_tag. Is this intended? It also seems somewhat dangerous to misclassify an input iterator as a forward iterator, so why not require programmers to specify this alias themselves?
3. On a somewhat unrelated note, I'm not sure if it's a good idea to silently generate a value for iterator_category even if the programmer has explicitly stated another category, and generating a value for iterator_category that's completely different from the concept seems strange as well. Consider this unrealistic example:
#include <iostream>
#include <iterator>
// With the == operator, this is an input iterator, but nothing else.
struct WeirdIterator {
// Not an output iterator because you can't assign to a const reference
const int& operator*() const { static const int value = 42; return value; } // static, so the reference does not dangle
WeirdIterator& operator++() { return *this; } // unimportant
WeirdIterator operator++(int) { return *this; } // unimportant
// bool operator==(const WeirdIterator&) const = default;
using iterator_category = std::random_access_iterator_tag;
using value_type = int;
using difference_type = int;
};
void iteratorConcept(std::input_iterator auto) {
std::cout << "input iterator concept" << std::endl;
}
void iteratorConcept(std::random_access_iterator auto) {
std::cout << "random access iterator concept" << std::endl;
}
void iteratorTag(std::output_iterator_tag) {
std::cout << "output iterator tag" << std::endl;
}
void iteratorTag(std::input_iterator_tag) {
std::cout << "input iterator tag" << std::endl;
}
void iteratorTag(std::random_access_iterator_tag) {
std::cout << "random access iterator tag" << std::endl;
}
int main() {
WeirdIterator iter;
iteratorConcept(iter);
iteratorTag(std::iterator_traits<WeirdIterator>::iterator_category{});
return 0;
}
This prints "input iterator concept" and "output iterator tag" because it's missing the comparison operator (which isn't required for the concept).
If I add the commented line, this now prints "input iterator concept" and "random access iterator tag", even though it clearly isn't a random access iterator. To be fair, writing the wrong iterator_category (i.e. random_access_iterator_tag) like this is a pretty stupid example, but I still think it would make sense to check if the concept is satisfied, especially in the case of the "fall-back" output_iterator_tag: Forgetting to write the == operator shouldn't turn an input iterator into an unusable output iterator. Would it be possible and make sense to check that the corresponding concepts are satisfied?
Edit
A few points in my question seem to be unclear, or maybe I've made some incorrect but unstated assumptions. I'll try to be more explicit about them and rephrase my current understanding (after reading the answer by Nicol Bolas):
Regarding Point 3: As I understand it, it's possible that a type T may have some std::iterator_traits<T>::iterator_category alias even if it doesn't model the corresponding C++20 concept or the C++17 named requirement. This is intended. So, let's forget about this, because it's probably a better fit for a separate question.
I think that the std::iterator_traits aliases defined if I don't explicitly write them down (e.g. reference when I only write value_type) can be incorrect for some iterators and are meant as sensible default values. Is this correct? If this is incorrect, my question is pretty much answered.
If T::reference isn't defined for an input iterator T, then std::iterator_traits<T>::reference is defined as decltype(*std::declval<T&>()). Is this correct?
If reference can be defined based on operator*, wouldn't it make sense to then also define value_type based on *? Assuming that 5. is correct, the only input iterator I can think of where this would go wrong is the iterator from std::vector<bool>, and there were several proposals to deprecate it because of this difference. So most input iterators would work with this definition, and those that didn't could simply specify value_type. Am I missing something?
Regarding Point 2: It's not in general decidable into what category an iterator falls.
Using e.g. an input iterator as if it were a more general forward iterator would be a bug. It can happen that the iterator_traits::iterator_category of a valid iterator where the programmer did not specify the iterator_category is incorrect. This doesn't affect the concept or named requirement (they take semantics into account), but in practical terms, it's possible that STL functions don't work correctly with this iterator, without generating a (run- or compile-time) error. Therefore, I think it would be a good idea to require the programmer to explicitly state the category. Is there a problem in this reasoning or did I miss something?
I hope I don't come across as overly pedantic or as insisting on my personal opinion, but I genuinely don't know if and where there's an error in the points above, and I'm guessing that this isn't just confusing to me.
It's important to understand something at this point, as certain different things are being conflated here.
In C++20, there are two classifications of iterators: the old C++17 named requirements, and the new C++20 concept-based iterators. Most of the old requirements map to the latter, but the concept requirements allow for more things to be considered iterators than what the C++17 requirements allowed.
std::iterator_traits however is used for both of them, since they do use many of the same moving parts. The point of this is that it should be possible to write an iterator that fulfills both the C++17 named requirement and the similar C++20 concept. That is, you can write a type that satisfies Cpp17RandomAccessIterator and std::random_access_iterator without too much trouble.
I bring this up because many of the things under discussion will matter a lot more to one set of requirements than the other.
Why are there no default values for value_type and difference_type? Wouldn't it make sense to use something like std::remove_reference_t<reference> as a default for value_type?
Obviously, that would require you to specify reference. So you'd still have to specify two things. value_type is the one that the creator of the iterator is thinking in terms of anyway. And if they're thinking of it, it's probably because reference needs to be something other than a value_type&, so they'll need to specify both anyway.
C++20 adds the contiguous_iterator_tag. Just like with input_iterator_tag versus forward_iterator_tag, I don't see how it should be possible for the compiler to correctly distinguish between a contiguous iterator and a random access iterator, which I guess is why it apparently never selects contiguous_iterator_tag. Is this intended?
In C++17, there was no such thing as a "contiguous iterator". Not in the same sense as a RandomAccessIterator. There's a whole section in the standard that explains the requirements of a RandomAccessIterator, while "contiguous iterator" gets a one paragraph statement with no additional information about it and very few actual uses.
And of course, "contiguous iterator" gets no iterator tag. This was done deliberately to avoid adding another iterator tag and possibly making a lot of code that could work non-functional because a contiguous iterator instead advertised itself as random access.
C++20 changes things. It adds a std::contiguous_iterator_tag, but it does so because std::contiguous_iterator now has syntactical differences from std::random_access_iterator. Namely, a contiguous iterator must permit conversion into a pointer to its value_type via std::to_address. This allows you to turn an iterator pair into a pointer pair without having to dereference a potentially non-dereferenceable iterator (such as a past-the-end iterator).
Note also that automatic assignment of iterator categories is based on satisfying the C++17 named requirements, not of the C++20 concepts. Since there is no "contiguous iterator" named requirement (and even if there was, it wouldn't be syntactically determinable), there can be no auto assignment of it.
The reason automatic assignment only works for the C++17 requirements is because the C++20 concepts are defined in terms of std::iterator_traits. So it cannot use the concepts without creating a circular definition.
On a somewhat unrelated note, I'm not sure if it's a good idea to silently generate a value for iterator_category even if the programmer has explicitly stated another category
That's not what the standard does. It only provides one if you don't specify one (outside of one odd quirk mentioned below).
This prints "input iterator concept" and "output iterator tag" because it's missing the comparison operator (which isn't required for the concept).
This is an odd quirk of the new definition of iterator_category, but the quirk does ultimately correctly represent the incoherence of your type.
The primary template of iterator_traits has 3 possible versions, depending on how you defined your iterator type. If your iterator provides all of the member type aliases (pointer is optional), then it just uses them. If it only provides some of them, then it does a concept check against an exposition-only version of Cpp17InputIterator. If your type fits that, then it uses your type's iterator_category (and if you don't provide one, then it computes one).
However, if your iterator isn't an input iterator, then it checks against the basic Cpp17Iterator. If that fits, then iterator_traits::iterator_category is fixed to be output_iterator_tag. That is certainly a strange choice.
If I add the commented line, this now prints "input iterator concept" and "random access iterator tag", even though it clearly isn't a random access iterator.
But you said it was a random access iterator. The system isn't supposed to override what you said; that was just a quirk of what happens if your type doesn't match input-iterator but still happens to be some kind of iterator.
In any case, if you lie, you lied. Garbage in, garbage out.
I still think it would make sense to check if the concept is satisfied, especially in the case of the "fall-back" output_iterator_tag: Forgetting to write the == operator shouldn't turn an input iterator into an unusable output iterator.
But... that's what it is. Equality testing isn't optional for input iterators. If you can't test it for equality, then it is not an input iterator. Indeed, if the system did as you suggested, that's exactly the tag you would get: an output iterator.
So what's your problem? If you accidentally failed to make your type an input iterator, do you want the system to correctly categorize it as what it is in accord with its behavior or do you want it to forward your mistaken category onward?

Canonical way to define forward output iterator

How does one define forward-output-iterators in C++11 in a canonical way?
According to the standard, a forward_iterator is only an input_iterator. So the corresponding forward_iterator_tag only extends input_iterator_tag. If we are using std::iterator to define our iterators, what tag do we use for a forward-output-iterator?
Is it canonical to define a private tag that extends both forward_iterator_tag and output_iterator_tag or is there a better solution?
The canonical thing to do is to inherit from std::iterator<std::forward_iterator_tag, T> only. Iterators have only one category.
The standard has no algorithms (or other uses) for an output iterator that is also a forward iterator. All uses of output iterators in the standard require only single-pass.
Instead, the standard has the idea of mutable vs. immutable iterators of categories forward/bidi/randomaccess. All the algorithms that need to write through iterators, and that require better than single-pass, also read through the same iterators they write through. This is std::remove, std::sort and other mutating algorithms.
The difference between mutable and immutable iterators is not detected by iterator tag, it's determined by whether the assignment expressions are well-formed. So for example if you pass an iterator to std::sort that's immutable, then the algorithm won't compile anyway, so there's generally no need for an input iterator to also be tagged with output_iterator_tag. All algorithms that require an OutputIterator will Just Work with a mutable ForwardIterator, again there is no need for it to be tagged with output_iterator_tag.
If you have different needs from those of the standard algorithms then I can't immediately think of a reason that your proposal won't work for your iterators. But it won't detect mutable standard iterators. For example std::deque<int>::iterator and int* have iterator category random_access_iterator_tag, not your private tag and not anything to do with output_iterator_tag. So you would probably be better off defining your own traits class rather than hoping to adapt the existing iterator_traits::iterator_category to provide the information you want.

AnyIterator and boost iterator facade

Is it possible to implement an any iterator with boost iterator facade?
I don't want to define implementation details in my baseclass
class Base
{
public:
typedef std::vector<int>::iterator iterator;//implementation detail
...
virtual iterator begin()=0;
virtual iterator end()=0;
};
Or do I have to write one completely from scratch?
The code you've posted has fixed the type of iterators returned from Base and all its implementations to std::vector<int>::iterator, which is probably not what you want. Jeremiah's suggestion is one way to go, with one drawback: you lose compatibility with STL... I know of three implementations of a polymorphic iterator wrapper:
becker's any_iterator (which implements boost::iterator_facade)
the opaque_iterator library (google for it), or
Adobe's very interesting poly library which contains a hierarchy of STL conforming any_iterators.
The problem is harder than it might seem... I made an attempt myself mainly because I needed covariance in any_iterator's type argument (any_iterator<Derived> should be automatically convertible to any_iterator<Base>), which is difficult to implement cleanly with STL-like iterators. A C#-like Enumerator<T> is easier to implement(*) (and imho generally a cleaner concept than STL-like pairs of iterators) but again, you "lose" the STL.
(*) = without 'yield' of course :-)
I think this may be what you're looking for:
any_iterator: Type Erasure for C++ Iterators
Here's a snippet from that page:
Overview
The class template any_iterator is the analog to boost::function for
iterators. It allows you to have a single variable and assign to it
iterators of different types, as long as these iterators have a
suitable commonality.