I'm trying to figure out exactly what requirements are made on forward_iterators' reference types. In the obvious cases you'll have value_type = T; and reference = T&;. Reading the cppreference page on forward iterator requirements, I saw
Expression | Return type | Equivalent expression
*i++ | reference | value_type& temp = *i; ++i; return temp;
std::vector<bool> shows that the "equivalent expression" isn't always valid since it returns a proxy object:
std::vector<bool> v(10);
auto i = v.begin();
std::vector<bool>::iterator::value_type& temp = *i; // error
// can't bind bool& to std::_Bit_reference
The equivalent expression isn't mentioned in the standard that I saw. The proxy object allows assignment though, which might be the key to conformance.
Beyond trying to nail down the requirements in general, my specific question is this: if value_type and reference are the same type, and that type is neither a reference nor assignable, would the iterator still work with the standard library?
Would some Container<int> with an iterator tagged as forward_iterator_tag and reference == int be valid?
The requirements are enumerated in [forward.iterators]:
A class or pointer type X satisfies the requirements of a forward iterator if
X satisfies the requirements of an input iterator (24.2.3),
X satisfies the DefaultConstructible requirements (17.6.3.1),
if X is a mutable iterator, reference is a reference to T; if X is a const iterator, reference is a reference to const T,
[...]
So if your container has reference == int, then it does not meet the requirements of forward iterator. Which I suppose technically makes vector<bool>::iterator just an input iterator, even though it's tagged as a random access iterator.
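As a rough way to see this on a real implementation, the check below compares an iterator's reference type against value_type& and const value_type&. The name has_true_reference is a hypothetical helper made up for this sketch, not something from the standard, and it only tests the type-level part of the wording quoted above.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical helper: true when It::reference is exactly value_type& or
// const value_type&, which is what the quoted forward iterator wording asks for.
template <class It>
constexpr bool has_true_reference =
    std::is_same_v<typename std::iterator_traits<It>::reference,
                   typename std::iterator_traits<It>::value_type&> ||
    std::is_same_v<typename std::iterator_traits<It>::reference,
                   const typename std::iterator_traits<It>::value_type&>;

// Ordinary containers hand out real references.
static_assert(has_true_reference<std::vector<int>::iterator>);
static_assert(has_true_reference<std::vector<int>::const_iterator>);

// std::vector<bool> hands out a proxy class instead of bool&...
static_assert(!has_true_reference<std::vector<bool>::iterator>);

// ...even though its iterator is tagged as random access.
constexpr bool vector_bool_tagged_random_access = std::is_same_v<
    std::iterator_traits<std::vector<bool>::iterator>::iterator_category,
    std::random_access_iterator_tag>;
static_assert(vector_bool_tagged_random_access);
```

A Container<int> whose iterator has reference == int would fail the same check, for the same reason vector<bool> does.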
Suppose we have an input iterator it that conforms to Cpp17InputIterator. Is the value referred to by *it guaranteed to stay the same after we increment it? For example,
const auto &old_ref = *it;
auto old_val = old_ref; // copy the value before advancing
++it; // old_ref might be affected by this
assert(old_ref == old_val); // Is this guaranteed for Cpp17InputIterator?
This table says that the old copies of it are not required to be dereferenceable. But does this imply that old references obtained from it may not be dereferenceable too? Can *it return a reference to the iterator's internal state?
It should never be assumed that a reference remains valid if an iterator is invalidated. This may be the case with some iterator implementations, but doing so is a violation of the iterator concept, and will not work generically with all iterators.
It's entirely legal for an iterator to be implemented internally as a std::optional<T> that returns a reference to T and reconstructs the T on each iteration. This is especially true of input iterators, which don't require multipass support (such as for a generator range).
For example, an iterator doing the following is completely legal:
template <typename T>
auto some_special_iterator<T>::operator*() -> T&
{
return *m_value; // returns a reference to the currently stored T
}
template <typename T>
auto some_special_iterator<T>::operator++() -> some_special_iterator&
{
m_value.reset(); // Destroys the object which someone may be holding a reference to
m_value.emplace( ... ); // Invalidates any existing references by constructing a new object
return (*this);
}
Using a pointer or reference to an object that has been destroyed is undefined behavior, even if that pointer or reference points to the same storage of a new object. The only legal pointers or references to a newly constructed object are the ones returned by new (such as placement new) or std::launder.
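For a complete version of the same idea, here is a minimal sketch of a single-pass iterator that rebuilds its element inside a std::optional on every increment. The names countdown_iterator and sum_countdown are hypothetical and exist only for this illustration. The consuming loop copies the value out before advancing, so it never relies on a reference surviving operator++.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>

// Hypothetical generator-style input iterator: counts start, start-1, ..., 1.
class countdown_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    countdown_iterator() = default;  // an empty optional marks past-the-end
    explicit countdown_iterator(int start) {
        if (start > 0) m_value.emplace(start);
    }

    // Returns a reference into the iterator's own internal state.
    reference operator*() const { return *m_value; }

    countdown_iterator& operator++() {
        const int next = *m_value - 1;  // copy out before destroying the element
        m_value.reset();                // the old element is gone from here on
        if (next > 0) m_value.emplace(next);
        return *this;
    }
    countdown_iterator operator++(int) {
        countdown_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const countdown_iterator& a, const countdown_iterator& b) {
        return a.m_value == b.m_value;
    }
    friend bool operator!=(const countdown_iterator& a, const countdown_iterator& b) {
        return !(a == b);
    }

private:
    std::optional<int> m_value;
};

// Generic-safe consumption: take a copy of *it, then advance.
int sum_countdown(int start) {
    int total = 0;
    for (countdown_iterator it(start), last; it != last; ++it) {
        const int current = *it;  // a copy, not a reference kept across ++it
        total += current;
    }
    return total;
}
```

Calling sum_countdown(4) adds 4 + 3 + 2 + 1; the point is only that the loop body works with a copy.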
Since this is tagged language-lawyer: there isn't much in the way of direct quotes from the standard that make this illegal, since the InputIterator concept gives no guarantee that a held reference remains valid.
So to prove this is undefined behavior, we need to work backwards. First off:
From defns.undefined
Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data
So we need to check that the iterator example above adheres correctly to InputIterator's concept definition, of which the important part is the operator++ behavior:
From the table in input.iterators:
Requires: r is dereferenceable.
Postconditions: r is dereferenceable or r is past-the-end;
any copies of the previous value of r are no longer required either to be dereferenceable or to be in the domain of ==.
(emphasis mine)
In the example above, the precondition that the iterator r is dereferenceable before the increment is upheld, and so is the postcondition.
What's interesting is the part that I emphasized: "any copies of the previous value of r are no longer required either to be dereferenceable or to be in the domain of =="
This means that any existing copies of the iterator may no longer be dereferenceable, and may no longer be comparable with other iterators into the same range. This is the part that formally means that all copies of the iterator may have been invalidated (note: "may", since an iterator need not be invalidated, but it should be assumed to have been).
The C++ standard does not explicitly state that any held references are still valid, because this is not defined behavior; however, if the iterator itself is no longer considered "dereferenceable" after an invocation of operator++, then it should also be assumed that its reference is no longer valid. Since the wording does not guarantee that a reference held past this point remains valid, it must be assumed to be undefined behavior due to the passage above from defns.undefined.
The example illustrated above is a conforming iterator implementation where such an expectation would cause actual undefined behavior, which fits this interpretation.
On a different note, be careful using const auto& with input iterators.
operator*() on input iterators only needs to return an It::reference type which is convertible to T; it does not actually need to be a reference at all. Be aware that this is strengthened in forward iterators and beyond that "reference must be a reference to T", but this isn't true of input iterators.
Using const auto& here may actually cause you to unintentionally const-lifetime-extend a temporary proxy object rather than holding a real reference.
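To make that concrete, here is a small sketch using std::vector<bool>, whose iterators return a proxy object. The function name proxy_tracks_later_writes is made up for this example. The const auto& binds to a lifetime-extended temporary proxy, so it keeps observing later writes to the container, whereas a plain bool copy is a snapshot.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

bool proxy_tracks_later_writes() {
    std::vector<bool> bits(1, false);
    auto it = bits.begin();

    const auto& r = *it;  // binds to a temporary proxy whose lifetime is extended
    static_assert(!std::is_same_v<decltype(r), const bool&>,
                  "r is a reference to a proxy object, not a const bool&");

    const bool snapshot = *it;  // converts the proxy to a plain bool right now

    bits[0] = true;  // write through the container afterwards

    // The proxy sees the new value; the copied bool still holds the old one.
    return static_cast<bool>(r) && !snapshot;
}
```

With an iterator that returns a prvalue int instead, the same const auto& would hold a frozen copy, so the behavior you get depends entirely on what reference happens to be.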
I have trouble finding the semantics of the reference type trait of an iterator. Let's say I want to implement a chunk iterator, that, given a position into a range, will give me chunks of that range:
template<class T, int N>
class chunk_iterator {
public:
using reference = std::span<T,N>;
chunk_iterator(T* ptr): ptr(ptr) {}
chunk_iterator operator++() { ptr += N; return *this; }
reference operator*() const { return {ptr,N}; }
private:
T* ptr;
};
The problem that I see here is that std::span is a view-like thing, but it does not behave like a reference (say a std::array<T,N>& in this case). In particular, if I assign to a span, the assignment is shallow: it will not copy the values.
Is std::span a valid iterator::reference type? Are view and reference semantics explained in detail somewhere?
What should I do to solve my problem? Implement a span_ref with proper reference semantics? Is it already implemented in some library? Is a non-native reference type even allowed?
(note: solving the problem by storing a std::array<T,N> and returning a std::array<T,N>& in operator* is doable, but ugly, and if N is not known at compile time, storing instead a std::vector<T> with dynamic memory allocation is just plain wrong)
When talking about standard-compliant iterators, it depends on several things.
For conforming Iterators, it almost doesn't matter what the reference type is because the standard does not require any usage semantics for the reference type. But that also means nobody except you knows how to use your iterator.
For conforming Input Iterators, the reference type must meet the semantics specified. Notice that for LegacyInputIterator, the expression *it must be a reference that is usable as a reference with all the normal semantics, otherwise code that uses your iterator will not behave as expected. This means reading from a reference is akin to reading from a built-in reference. In particular, the following should do "normal" things:
auto value = *itr; // this should read a value
In this situation, a view type like span wouldn't work because span is more like a pointer than a reference: in the above snippet value would be a span, not whatever the span refers to.
For conforming Output Iterators, the reference type has no requirements. In fact, standard LegacyOutputIterators like std::back_insert_iterator have void as a reference type.
For conforming Forward Iterators and above, the standard actually requires the reference be a built-in reference. This is to support uses like below:
auto& ref = *itr;
auto ptr = &ref; // this must create a pointer pointing to the original object
auto ref2 = *ptr; // this must create a second, equivalent reference
auto other = std::move( ref ); // this must do a "move", which may be the same as a copy
ref = other; // this must assign "other"'s value back into the referred-to object
If the above didn't work correctly, many of the standard algorithms wouldn't be possible to write generically.
Speaking to span specifically, it acts more like a pointer than a reference logically. It can be re-assigned to point to something else. Taking its address creates a pointer to the span, not a pointer to the container being spanned over. Calling std::move on a span copies the span, and doesn't move the contents of the spanned range. A built-in reference T& will only refer to one thing ever once it's been created.
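The difference in assignment semantics can be sketched without any C++20 dependency. The mini_span struct below is a hypothetical stand-in for a pointer-plus-size view; it is not std::span, but assigning one std::span to another rebinds the view in the same way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a view type: just a pointer and a length.
struct mini_span {
    int* data;
    std::size_t size;
};

bool assignment_is_shallow() {
    std::array<int, 2> a{1, 2};
    std::array<int, 2> b{3, 4};

    mini_span s{a.data(), a.size()};
    s = mini_span{b.data(), b.size()};  // rebinds the view; no element is copied
    const bool a_untouched = (a[0] == 1 && a[1] == 2);
    const bool s_now_views_b = (s.data == b.data());

    int& r = a[0];
    r = b[0];  // a built-in reference writes through to the referred-to object
    const bool a_written = (a[0] == 3);

    return a_untouched && s_now_views_b && a_written;
}
```

Generic algorithms rely on the second behavior, which is why a view type cannot simply be dropped in as reference.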
Creating a non-conforming reference that actually works with standard algorithms would involve a family of types overloading operator*, operator->, and operator&, operator=, and std::move, and modeling pointers, lvalue references, and rvalue references.
The meaning of an iterator's reference type cannot be understood without comprehending its relationship to the iterator's value_type. An iterator is a construct that represents a position within a sequence of value_types. A reference is a mediator within this paradigm; it is a thing that acts like a value_type (const) &. Until you figure out what your value_type is going to be, you can't decide what your reference will need to look like.
What "acts like" means depends on what kind of iterator we're talking about.
For C++11, the InputIterator category requires that reference be a type which is implicitly convertible to a value_type. For the OutputIterator category, reference is required to be a type which is assignable from a value_type.
For all of the more restricted iterator categories (ForwardIterator and above), reference is required to be exactly one of value_type & (if you can write to the sequence) or value_type const & (if you can only read from the sequence).
Iterators where reference is not a value_type (const) & are often called proxy iterators, as the reference type typically acts as a "proxy" for the actual data stored in the sequence (assuming the iterator isn't just inventing values to begin with). Proxy iterators are often used for cases where the iterator doesn't iterate over a range of actual value_types, but simply pretends to. This could be the bitwise iterators of vector<bool> or an iterator that iterates over the sequence of integers on some half-open range [0, N).
But proxy iterator references have to act like language references to one degree or another. InputIterator references have to be implicitly convertible to the value_type. span<T, N> is not implicitly convertible to array<T, N> or any other container type that would be appropriate for a value_type. OutputIterator references have to be assignable from value_type. And while span<T, N> may be assignable from an array<T, N>, the assignment operation doesn't have the same meaning. To assign to an OutputIterator's reference ought to change the values stored within the sequence. And this doesn't.
In any case, you first need to invent a value_type that does what you need it to do. Then you need to build a proper reference type that acts like a reference. Lastly... well, you can't make your iterator a ForwardIterator or higher, because C++11 doesn't support proxy iterators of the most useful iterator categories. C++20's new formulation of iterators allows proxy iterators for anything that isn't a contiguous_iterator.
Recently I came across this code in my codebase (Simplified for here, of course)
auto toDelete = std::make_shared<std::string>("FooBar");
std::vector<decltype(toDelete)> myVec{toDelete};
auto iter = std::find_if(std::begin(myVec), std::end(myVec),
[](const decltype(toDelete) _next)
{
return *_next == "FooBar";
});
if (iter != std::end(myVec))
{
std::shared_ptr<std::string> deletedString = iter[0];
std::cout << *deletedString;
myVec.erase(iter);
}
Now, I noticed that here we are accessing an iterator by indexing!
std::shared_ptr<std::string> deletedString = iter[0];
I've never seen anyone access an iterator by indexing before, so all I can guess at is that the iterator gets treated like a pointer, and then we access the first element pointed to at the pointer. So is that code actually equivalent to:
std::shared_ptr<std::string> deletedString = *iter;
Or is it Undefined Behavior?
From the cppreference documentation for RandomAccessIterator:
Expression: i[n]
Operational semantics: *(i+n)
Since a std::vector's iterators meet the requirements of RandomAccessIterator, indexing them is equivalent to addition and dereferencing, like an ordinary pointer. iter[0] is equivalent to *(iter+0), or *iter.
This is Standard conforming behavior
24.2.7 Random access iterators [random.access.iterators]
1 A class or pointer type X satisfies the requirements of a random access iterator
if, in addition to satisfying the requirements for bidirectional
iterators, the following expressions are valid as shown in Table 118.
a[n] convertible to reference: *(a + n)
Note that it is not required that the particular iterator is implemented as a pointer. Any iterator class with an overloaded operator[], operator* and operator+ with the above semantics will work. For std::vector, the iterator category is random access iterator, and it is required to work.
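A quick sanity check of the equivalence, written as a small sketch (the function name index_matches_dereference is made up for this example):

```cpp
#include <cassert>
#include <vector>

bool index_matches_dereference() {
    std::vector<int> v{10, 20, 30};
    auto it = v.begin() + 1;  // points at 20

    return it[0] == *it              // it[0] is *(it + 0)
        && it[1] == *(it + 1)        // forward offset
        && it[-1] == *(it - 1)       // negative offsets are allowed too
        && &it[1] == &*(it + 1);     // both name the very same element
}
```

As with pointers, it[n] is only valid when it + n stays inside the range and refers to an element.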
Is decltype(*it) the value type of the iterator, or an lvalue reference to that, or something else?
I think it is an lvalue reference, because *it is an lvalue, but I'm not sure.
Note: In my case, it is a BidirectionalIterator, but feel free to answer the general case.
*it is not necessarily an lvalue. Only forward iterators have that requirement.
Iterators (§24.2.2) are required to have *it be a valid expression that returns iterator_traits<Iterator>::reference (and other irrelevant things). Nothing else is said about this and reference does not have to be a reference type†.
Input iterators (§24.2.3) are required to have *it be a valid expression that returns something convertible to the value type.
Forward iterators, however, have the following requirement (§24.2.5 paragraph 1):
— if X is a mutable iterator, reference is a reference to T; if X is a const iterator, reference is a reference to const T,
(here T is the iterator's value type)
This requires *it to be a reference, which means it has to be a glvalue (i.e. cannot be a prvalue but can be an xvalue like it is the case with move iterators).
The higher iterator categories do not add any relevant requirements.
† reference is defined to be the type of *it which makes it a bit of a circular definition, but poses no restrictions.
*it is most assuredly not guaranteed to be an lvalue. Input iterators may return an rvalue.
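The different cases can be checked directly with decltype. This is a minimal sketch; deref_t and the constant names are hypothetical helpers introduced here, and the iterators were picked only as representatives of each case.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical alias: the type of the expression *it for an lvalue iterator.
template <class It>
using deref_t = decltype(*std::declval<It&>());

// Mutable and const vector iterators: lvalue references.
constexpr bool vector_deref_is_lvalue_ref =
    std::is_same_v<deref_t<std::vector<int>::iterator>, int&>;
constexpr bool const_vector_deref_is_const_lvalue_ref =
    std::is_same_v<deref_t<std::vector<int>::const_iterator>, const int&>;

// Move iterators: an rvalue reference, so *it is an xvalue.
constexpr bool move_iterator_deref_is_rvalue_ref =
    std::is_same_v<deref_t<std::move_iterator<int*>>, int&&>;

// An input iterator that returns by value: *it is a prvalue.
constexpr bool istreambuf_deref_is_prvalue =
    std::is_same_v<deref_t<std::istreambuf_iterator<char>>, char>;

static_assert(vector_deref_is_lvalue_ref);
static_assert(const_vector_deref_is_const_lvalue_ref);
static_assert(move_iterator_deref_is_rvalue_ref);
static_assert(istreambuf_deref_is_prvalue);
```

So for a BidirectionalIterator under the wording quoted above you get a reference type, but in fully generic code decltype(*it) should be treated as "whatever reference is".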
The following code compiles just fine, overwriting the values in v2 with those from v1:
std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {6, 7, 8, 9, 10};
std::copy(v1.begin(), v1.end(), v2.begin());
The third argument of std::copy is an OutputIterator. However, the Container requirements specify that a.begin(), where a is a Container object, should have a return type of iterator which is defined as:
any iterator category that meets the forward iterator requirements.
Forward iterator requirements do not include the requirements of output iterators, so is the example above undefined? I'm using the iterator as an output iterator even though there's no obvious guarantee that it will be one.
I'm fairly certain the above code is valid, however, so my guess is that you can infer from the details about containers that the forward iterator returned by begin() will in fact also support the output iterator requirements. In that case, when does begin() not return an output iterator? Only when the container is const or are there other situations?
Forward iterators can conform to the specification of an output iterator if they're mutable, depending on the type of the sequence. It's not explicitly spelled out (unlike the fact that they conform to the input iterator requirements), but if we take a look at the output iterator requirements table, we can check whether a given forward iterator conforms to them:
*r = o
(§24.2.5/1): if X is a mutable iterator, reference is a reference to T
A mutable reference is assignable (unless you have a non-assignable type, obviously).
++r, r++, *r++ = o
(§24.2.5 Table 109)
The first line in Table 109 is the same requirement as for output iterators, except that forward iterators don't have the remark. The second line is more restrictive than for output iterators, since it specifies that a reference must be returned.
Bottom line, if you have a mutable forward iterator into a sequence of copy-assignable types, you have a valid output iterator.
(Technically, a constant iterator into a sequence of types that have an operator=(...) const and mutable members would also qualify, but let's hope nobody does something like that.)
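A rough way to test the "*r = o" part for a given iterator type is to ask whether its reference is assignable from its value_type. The trait name writable_through and the function copy_overwrites_forward_list are hypothetical names for this sketch; std::forward_list is used because its iterators are forward iterators and nothing more.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical trait: can a value_type be assigned through *it?
template <class It>
constexpr bool writable_through = std::is_assignable_v<
    typename std::iterator_traits<It>::reference,
    typename std::iterator_traits<It>::value_type>;

// A mutable forward iterator over a copy-assignable type is writable...
static_assert(writable_through<std::forward_list<int>::iterator>);
// ...while the constant iterator over the same sequence is not.
static_assert(!writable_through<std::forward_list<int>::const_iterator>);

// Uses a plain forward iterator as the output iterator of std::copy.
bool copy_overwrites_forward_list() {
    const std::vector<int> source{1, 2, 3};
    std::forward_list<int> target{7, 8, 9};  // must already hold enough elements
    std::copy(source.begin(), source.end(), target.begin());
    return std::equal(source.begin(), source.end(), target.begin());
}
```

This also answers the "when is begin() not an output iterator" part in practice: a const container gives you the const_iterator, for which the trait is false.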
Forward iterator requirements do not include the requirements of output iterators
This sounds backwards. OutputIterators need to satisfy fewer criteria than ForwardIterators.
(Forward iterators additionally offer the multi-pass guarantee: incrementing two equal copies of an iterator yields equal iterators, so the same range can be traversed more than once.)
Therefore, it is ok provided that the output iterator stays valid until the algorithm completes. IOW:
auto outit = std::begin(v2);
std::advance(outit, v1.size()); // or: std::distance(std::begin(v1), std::end(v1))
// outit should still be valid here
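The multi-pass property that separates forward iterators from plain output or input iterators can be sketched as follows (the function name multipass_holds is made up for this example):

```cpp
#include <cassert>
#include <forward_list>

bool multipass_holds() {
    std::forward_list<int> xs{1, 2, 3};

    auto a = xs.begin();
    auto b = a;  // an independent copy of the same position

    ++a;
    ++b;  // advancing each copy separately...

    return a == b               // ...lands on the same position
        && &*a == &*b           // ...and on the very same element
        && *xs.begin() == 1;    // the start of the range is still readable
}
```

A pure output iterator such as std::back_insert_iterator makes no such promise, which is why the forward iterator requirements are the stronger set.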
Edit To the comment:
§ 24.2.1
Iterators that further satisfy the requirements of output iterators are called mutable iterators. Nonmutable iterators are referred to as constant iterators.
Now, let me find the bit that ties this together saying vector::begin() returns mutable Random Access iterator.
For info
§ 24.2.5 Forward iterators [forward.iterators]
1 A class or pointer type X satisfies the requirements of a forward iterator if
X satisfies the requirements of an input iterator (24.2.3),
X satisfies the DefaultConstructible requirements (17.6.3.1),
if X is a mutable iterator, reference is a reference to T; if X is a const iterator, reference is a reference to const T,
the expressions in Table 109 are valid and have the indicated semantics, and
objects of type X offer the multi-pass guarantee, described below.