What is multi-pass guarantee per C++ ISO standard? - c++

Reading Working Draft N3337-1, Standard for Programming Language C++, 24.2.5 Forward iterators, page 806.
From draft:
Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if:
— a == b implies ++a == ++b and
— X is a pointer type or the expression (void)++X(a), *a is equivalent to the expression *a.
[ Note: The requirement that a == b implies ++a == ++b (which is not true for input and output iterators) and the removal of the restrictions on the number of the assignments through a mutable iterator (which applies to output iterators) allows the use of multi-pass one-directional algorithms with forward iterators.
—end note ]
Could someone reinterpret this in easier terms? I understand that forward iterators are multi-pass, but I don't understand how this is accomplished by the requirements in the C++ standard.

The term says it all, I'd think: you can pass through the sequence multiple times and remember positions within the sequence. As long as the sequence doesn't change, starting at a specific position (iterator) you'll traverse the same objects in the same order as often as you want. However, you can only go forward; there is no way to move backwards. The canonical example of a sequence like this is a singly-linked list.
The quoted clause basically says that if you have two iterators comparing equal and you increment each of them, you get to the same position and they compare equal again:
if (it1 == it2) {
    ++it1;
    ++it2;
    assert(it1 == it2); // has to hold for multi-pass sequences
}
The somewhat weird expression ++X(a), *a is intended to advance an iterator that is independent of a (X(a) is a temporary copy of a). The requirement that (void)++X(a), *a be equivalent to *a means that iterating over the sequence with an independent iterator doesn't change what a refers to. This is unlike input iterators, where ++InIt(a), *a is not necessarily equivalent to *a: incrementing the copy can change the position in the underlying sequence, possibly invalidating a and/or changing the value it refers to.
By contrast, a single-pass sequence (input and output iterators in standard terms) can only be traversed once: trying to traverse the sequence multiple times will not necessarily work. The canonical examples of sequences like this are input from the keyboard and output to the console: once read, you can't get the same characters back again, and once sent, you can't undo the characters.
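To make the contrast concrete, here is a minimal sketch that checks both bullets of the guarantee on a std::forward_list and then shows an std::istream_iterator losing a value when a copy is advanced. The function names are made up for illustration, and the stream contents are arbitrary.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <sstream>
#include <utility>

// Hypothetical helper: exercises both bullets of the multi-pass guarantee.
bool forward_list_is_multi_pass() {
    std::forward_list<int> seq{1, 2, 3};
    auto a = seq.begin();
    auto b = seq.begin();
    assert(a == b);
    auto saved = a;                       // remember a position in the sequence
    ++a;
    ++b;
    bool still_equal = (a == b);          // bullet 1: a == b implies ++a == ++b
    auto copy = saved;
    ++copy;                               // advance an independent copy
    bool undisturbed = (*saved == 1);     // bullet 2: *saved is unaffected
    return still_equal && undisturbed && *copy == 2;
}

// Hypothetical helper: two input iterators share one stream, so advancing
// one of them consumes data that the other can never see.
std::pair<int, int> single_pass_demo() {
    std::istringstream in("10 20 30");
    std::istream_iterator<int> it(in);
    (void)*it;                            // make sure the first value (10) has been read
    std::istream_iterator<int> other = it;
    ++other;                              // reads 20 from the shared stream
    ++it;                                 // reads 30; 20 was never visible through it
    return {*other, *it};
}
```

Calling single_pass_demo() yields the pair (20, 30): the two iterators started at the same position, yet after one increment each they refer to different values, which is exactly what the multi-pass guarantee rules out for forward iterators.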

Related

std::unordered_set::equal_range iterator question

std::unordered_set::equal_range returns a pair of iterators describing the range of values in the set where the keys for the values compare as equal. Given:
auto iteratorFromEqualRange = someUnorderedSet.equal_range(key).first;
auto iteratorFromFind = someUnorderedSet.find(key);
is it guaranteed by the Standard that:
++iteratorFromEqualRange == ++iteratorFromFind;
as they are both defined in terms of std::unordered_set::iterator? In other words, can a different implementation of std::unordered_set keep "hidden" information about the context of what we're iterating, or is this a not-very-subtle enforcement of the bucket interface (which limits our implementation options)?
I expect that this is indeed a guarantee, given the requirements of LegacyForwardIterator. I'm just asking for confirmation (or better news that includes some kind of escape hatch).
The iterator of unordered_set is a Forward Iterator (now named LegacyForwardIterator).
The C++14 standard (final draft n4140) states this regarding Forward Iterators:
24.2.5 Forward iterators [forward.iterators]
1 A class or pointer type X satisfies the requirements of a forward iterator if
...
(1.5) — objects of type X offer the multi-pass guarantee, described below.
...
3 Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if:
(3.1) — a == b implies ++a == ++b and
(3.2) — X is a pointer type or the expression (void)++X(a), *a is equivalent to the expression *a.
Combining (1.5) and (3.1) in this case would mean that ++iteratorFromEqualRange == ++iteratorFromFind; is guaranteed by the standard, provided both these iterators can be dereferenced.
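As a quick sanity check of that conclusion, the following sketch compares the two iterators before and after incrementing. The function name and the int element type are assumptions for illustration; a passing check on one implementation only illustrates the guarantee and does not prove it.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: returns true if the iterators obtained from
// equal_range() and find() still compare equal after one increment each.
// Returns false when the key is absent, because end() is not dereferenceable
// and the multi-pass guarantee says nothing about it.
bool next_positions_match(const std::unordered_set<int>& s, int key) {
    auto fromRange = s.equal_range(key).first;
    auto fromFind = s.find(key);
    if (fromFind == s.end()) {
        return false;
    }
    assert(fromRange == fromFind);  // both refer to the single matching element
    ++fromRange;
    ++fromFind;
    return fromRange == fromFind;
}
```

For example, with a set containing 1, 2 and 3, next_positions_match(s, 2) returns true, whichever element happens to follow 2 in the implementation's iteration order.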

Meaning of terms identical, equal, equivalent in the Standard

There are at least three terms with similar meaning in the Standard: identical, equal and equivalent. All of these are used when algorithms are described. Take std::adjacent_find, for example:
Searches the range [first, last) for two consecutive identical elements.
But description of the comparator says:
binary predicate which returns true if the elements should be treated as equal
When it comes to associative containers, the word equivalent is used. For two elements a and b it means (roughly) !(a < b) && !(b < a). While equal means a == b.
What does the term identical mean? Is it defined in the Standard?
There is no definition of "identical" that I could find in the relevant sections of the standard. It looks like a colloquial use of the word, which is further supported by the fact that your quote is from cppreference. The normative definition of adjacent_find in the standard is specified in terms of == (or a predicate) directly:
Returns: The first iterator i such that both i and i + 1 are in the range [first, last) for which the following corresponding conditions
hold: *i == *(i + 1), pred(*i, *(i + 1)) != false. Returns last if no
such iterator is found.
While cppreference is an invaluable resource, its goal is to digest the standard text into easily accessible and understandable material. Sometimes it will use words colloquially to give an intuitive explanation. This is one such case.
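The two forms in the Returns clause can be sketched as follows: the first overload compares neighbours with ==, the second with whatever the predicate decides. The helper names and the parity predicate are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first element i with v[i] == v[i + 1], or v.size() if none.
std::ptrdiff_t first_adjacent_equal(const std::vector<int>& v) {
    return std::adjacent_find(v.begin(), v.end()) - v.begin();
}

// Same search, but "treated as equal" now means "same parity".
// Assumes non-negative values so that % 2 is either 0 or 1.
std::ptrdiff_t first_adjacent_same_parity(const std::vector<int>& v) {
    auto same_parity = [](int a, int b) {
        assert(a >= 0 && b >= 0);
        return a % 2 == b % 2;
    };
    return std::adjacent_find(v.begin(), v.end(), same_parity) - v.begin();
}
```

For the input {1, 2, 4, 4} the first function returns 2 (the pair 4, 4) while the second returns 1 (the pair 2, 4), so neither form needs a notion of "identical" beyond == or the supplied predicate.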

Does the C++ standard require operator != must be provided for a given iterator type?

The C++17 standard 27.2.1.8 says:
An iterator j is called reachable from an iterator i if and only if
there is a finite sequence of applications of the expression ++i that
makes i == j.
That is to say, any conforming iterator type must provide operator ==.
However, I can find nothing that says operator != is a requirement for iterator types.
Does the C++ standard require operator != must be provided for a given iterator type?
See C++17 [input.iterators]/2 Table 95 "Input iterator requirements".
Input iterators require that a != b is valid and behaves the same as !(a == b) whenever the latter is valid. cppreference.com has a summary of these requirements.
Output iterators do not need to support either operation.
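As an illustration, a hand-written input iterator would typically define != in terms of ==, and standard algorithms rely on it being there. The CountingIterator type and sum_range function below are hypothetical, written only to show the shape of such a type.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical minimal input iterator that yields consecutive integers.
struct CountingIterator {
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    int value;

    reference operator*() const { return value; }
    pointer operator->() const { return &value; }
    CountingIterator& operator++() { ++value; return *this; }
    CountingIterator operator++(int) { CountingIterator old = *this; ++value; return old; }

    friend bool operator==(const CountingIterator& a, const CountingIterator& b) {
        return a.value == b.value;
    }
    // Required for input iterators: must behave like !(a == b).
    friend bool operator!=(const CountingIterator& a, const CountingIterator& b) {
        return !(a == b);
    }
};

// std::accumulate loops while first != last, so it needs operator !=.
int sum_range(CountingIterator first, CountingIterator last) {
    assert(first.value <= last.value);  // last must be reachable from first
    return std::accumulate(first, last, 0);
}
```

With this type, sum_range(CountingIterator{1}, CountingIterator{5}) adds 1 + 2 + 3 + 4 and returns 10.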

Alternatives or enhancements to std::find() on a native c++ pointer (such as uint16 *)

I am trying to come up with a single line C++ conditional check to check the presence of a value within a buffer. (like the if 'value' in list: check in python)
std::find seemed to be the right fit here.
const char *buf = "01020304";
int buf_len = 8;
const uint16_t *words = (const uint16_t *) buf;
const uint16_t *ptr = std::find(words, words + 3, (uint16_t)13360); // 13360 (0x3430) is the two bytes "04" on a little-endian machine
std::cout << "\n(ptr-buf):" << (ptr - words); // prints 3 for any value, even 0, because the last 2 bytes (buf+6, buf+7) are not searched
std::find searches the half-open range [first, last) of a buffer. This makes sense for STL types like vector. However, for raw pointers, the only way to make std::find() look at the last element of buf involves passing a pointer just beyond the end of buf (e.g. buf+8).
Is this a safe thing to do? I am guessing not. Is there a better alternative to using std::find to do a single-line exists check on a char *buffer?
When using pointers with STL algorithms, it is valid and correct to use a 'one past the end' pointer as the end iterator for a range. It is not valid to dereference a pointer one past the end of an array but you can test against one and perform certain arithmetic operations with one.
The standard specifies that it is valid to perform comparisons on a 'one past the end' pointer, section 5.9.3, 'Relational Operators':
5.9.3 Comparing pointers to objects is defined as follows:
(3.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
(3.2) — If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
It also specifies addition and subtraction on pointers such that 'one past the end' pointers behave correctly in algorithms like std::distance(), std::advance(), etc.
In addition the standard has this to say regarding STL iterators:
24.1.6 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. ...
Past the end pointers are standard C too. E.g. it's safe to produce them with arithmetic: from C11 standard, 6.5.6 (additive operators) paragraph 8:
... If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
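Putting this together, a one-past-the-end pointer can be used directly as the end of the search range. The sketch below uses hypothetical helper names; for the 16-bit search it copies the bytes into a std::vector with memcpy rather than casting the char pointer, which is an assumption made here to sidestep alignment and aliasing concerns and is not something the answers above require.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if needle occurs in buf[0 .. len). The pointer buf + len is one past
// the end: it is formed and compared against, but never dereferenced.
bool contains_char(const char* buf, std::size_t len, char needle) {
    assert(buf != nullptr);
    const char* last = buf + len;
    return std::find(buf, last, needle) != last;
}

// Builds a 16-bit value from two bytes in the machine's native byte order.
std::uint16_t u16_from_bytes(const char* two_bytes) {
    std::uint16_t v;
    std::memcpy(&v, two_bytes, sizeof v);
    return v;
}

// True if needle occurs among the len / 2 complete 16-bit words of buf.
// A trailing odd byte, if any, is ignored.
bool contains_u16(const char* buf, std::size_t len, std::uint16_t needle) {
    assert(buf != nullptr);
    std::vector<std::uint16_t> words(len / 2);
    if (!words.empty()) {
        std::memcpy(words.data(), buf, words.size() * sizeof(std::uint16_t));
    }
    return std::find(words.begin(), words.end(), needle) != words.end();
}
```

With the buffer from the question, contains_u16("01020304", 8, u16_from_bytes("04")) is true because all four words are searched, while a pattern such as "10", which only occurs at an odd byte offset, is not found.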

What is std::string(itr, itr) supposed to do?

The documentation on the cplusplus website for the std::string constructor taking two input iterators states in part:
Copies the sequence of characters in the range [first,last), in the same order.
first, last:
Input iterators to the initial and final positions in a range. The range used is [first,last), which includes all the characters between first and last, including the character pointed by first but not the character pointed by last.
What does this mean in the degenerate case where first == last? On the one hand first is included and on the other last is excluded? What does the official C++ standard say should happen in this case? Should an exception be thrown?
I don't know what documentation it is you're reading, but the standard says (§21.4.2/15):
[..] constructs a string from the values in the range [begin, end), as indicated in the Sequence Requirements table
And the Sequence requirements table (Table 100) defines X a(i, j) for a valid range [i, j) as:
Constructs a sequence container equal to the range [i, j)
A range is valid when the second iterator is reachable from the first (through incrementing). For two iterators that are equal, the range is empty. See §24.2.1/7:
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
So if first == last, as you say, you will get an empty string. If last is not reachable from first, you have undefined behaviour.
The range is empty, so there's nothing to copy. The result is an empty string.
What does this mean in the degenerate case where first == last?
It means that the input range is empty, so the string will be empty.
What does the standard say should happen in this case?
C++11 24.2.1/7 says:
A range [i,i) is an empty range
One thing I've used it for several times is constructing a std::string from a substring in a large C-style string (const char*). You can pass it two const char* pointers and it will construct a string from the characters starting at the first pointer and ending one before the second pointer.
If first == last, the result is an empty string. If first > last, the behavior is undefined (thanks Mooing Duck).
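The substring use mentioned above might look like the following sketch. The slice helper and its index parameters are invented for illustration; the caller is assumed to pass from <= to, with both within the C string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies the characters text[from .. to) into a new
// std::string using the iterator-pair constructor with two const char*.
std::string slice(const char* text, std::size_t from, std::size_t to) {
    assert(from <= to);  // otherwise the range is invalid and behavior is undefined
    return std::string(text + from, text + to);
}
```

Here slice("hello world", 6, 11) gives "world", and slice("hello", 2, 2) gives an empty string because [first, first) is an empty range. No exception is thrown in the empty case.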