What is std::string(itr, itr) supposed to do? - c++

The cplusplus.com documentation for the std::string constructor taking two input iterators states, in part:
Copies the sequence of characters in the range [first,last), in the same order.
first, last:
Input iterators to the initial and final positions in a range. The range used is [first,last), which includes all the characters between first and last, including the character pointed by first but not the character pointed by last.
What does this mean in the degenerate case where first == last? On the one hand first is included, but on the other hand last is excluded. What does the official C++ standard say should happen in this case? Should an exception be thrown?

I don't know what documentation it is you're reading, but the standard says (§21.4.2/15):
[..] constructs a string from the values in the range [begin, end), as indicated in the Sequence Requirements table
And the Sequence requirements table (Table 100) defines X a(i, j) for a valid range [i, j) as:
Constructs a sequence container equal to the range [i, j)
A range is valid when the second iterator is reachable from the first (through incrementing). For two iterators that are equal, the range is empty. See §24.2.1/7:
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
So if first == last, as you say, you will get an empty string. If last is not reachable from first, you have undefined behaviour.
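A minimal sketch of the empty-range case. The helper copy_range is hypothetical (it is not part of the standard library) and simply forwards to the two-iterator constructor. The assert only documents the validity precondition quoted above; the constructor itself performs no such check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a string from the characters of `source`
// in the index range [from, to) via the two-iterator constructor.
std::string copy_range(const std::string& source, std::size_t from, std::size_t to) {
    // The range is valid only if `last` is reachable from `first`.
    assert(from <= to && to <= source.size());
    auto first = source.begin() + static_cast<std::ptrdiff_t>(from);
    auto last = source.begin() + static_cast<std::ptrdiff_t>(to);
    return std::string(first, last);  // first == last yields an empty string
}
```

With std::string s = "hello", copy_range(s, 2, 2) is an empty string and copy_range(s, 0, 2) is "he". No exception is involved in either case.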

The range is empty, so there's nothing to copy. The result is an empty string.

What does this mean in the degenerate case where first == last?
It means that the input range is empty, so the string will be empty.
What does the standard say should happen in this case?
C++11 24.2.1/7 says:
A range [i,i) is an empty range

One thing I've used it for several times is constructing a std::string from a substring in a large C-style string (const char*). You can pass it two const char* pointers and it will construct a string from the characters starting at the first pointer and ending one before the second pointer.
If first == last, the result is an empty string. If first > last, the behavior is undefined (thanks, Mooing Duck).
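As a rough illustration of the substring use case, here is a sketch with a hypothetical helper key_of that copies everything before the first '=' out of a C-style string. The name and the "key=value" format are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: returns the part of `entry` before the first '='.
// If there is no '=', the whole string is returned.
std::string key_of(const char* entry) {
    assert(entry != nullptr);
    const char* first = entry;
    const char* last = std::strchr(entry, '=');  // points at '=' or is nullptr
    if (last == nullptr) {
        last = entry + std::strlen(entry);       // one past the last character
    }
    return std::string(first, last);             // copies [first, last)
}
```

For "name=value" this gives "name". For "=value" the two pointers are equal, so the result is an empty string.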

Related

Can partial_sort's middle iterator equal the end iterator?

According to http://www.cplusplus.com/reference/algorithm/partial_sort/, the middle argument is a:
Random-access iterator pointing to the element within the range
[first,last) that is used as the upper boundary of the elements that
are fully sorted.
Specifically, middle isn't allowed to be equal to last. However, https://en.cppreference.com/w/cpp/algorithm/partial_sort seems to have entirely different documentation and doesn't mention any range restrictions (but obviously middle shouldn't be outside [first, last]).
Is behavior defined when middle == last?
Can partial_sort's middle iterator equal the end iterator?
Yes, it can. In that case the effect is the same as using std::sort.
The standard specifies the preconditions like this:
Preconditions: [first, middle) and [middle, last) are valid ranges.
The description of std::partial_sort in the standard (N4659, 28.7.1.3/2) reads
Effects: Places the first middle - first sorted elements from the range [first, last) into the range [first, middle). The rest of the elements in the range [middle, last) are placed in an unspecified order.
I see nothing here that would prohibit middle from equaling last.
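A small sketch of the middle == last case, assuming a hypothetical wrapper smallest_first around std::partial_sort. The assert just restates the precondition that both sub-ranges are valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of `values` in which the `count`
// smallest elements come first, in sorted order.
std::vector<int> smallest_first(std::vector<int> values, std::size_t count) {
    // Keeps [first, middle) and [middle, last) valid ranges.
    assert(count <= values.size());
    auto middle = values.begin() + static_cast<std::ptrdiff_t>(count);
    std::partial_sort(values.begin(), middle, values.end());
    return values;  // count == values.size() means middle == last: fully sorted
}
```

Calling smallest_first({5, 2, 4, 1, 3}, 5) returns the fully sorted sequence 1 2 3 4 5, while a count of 2 only guarantees that the first two elements are 1 and 2.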

Meaning of terms identical, equal, equivalent in the Standard

There are at least three terms with similar meaning in the Standard: identical, equal and equivalent. All of these are used when algorithms are described. Take std::adjacent_find, for example:
Searches the range [first, last) for two consecutive identical elements.
But description of the comparator says:
binary predicate which returns true if the elements should be treated as equal
When it comes to associative containers, the word equivalent is used. For two elements a and b it means (roughly) !(a < b) && !(b < a). While equal means a == b.
What does the term identical mean? Is it defined in the Standard?
There is no definition of "identical" that I could find in the relevant sections of the standard. It looks like a colloquial use of the word, which is further supported by the fact that your quote is from cppreference. The normative definition of adjacent_find in the standard is specified directly in terms of == (or a predicate):
Returns: The first iterator i such that both i and i + 1 are in the range [first, last) for which the following corresponding conditions
hold: *i == *(i + 1), pred(*i, *(i + 1)) != false. Returns last if no
such iterator is found.
While cppreference is an invaluable resource, its goal is to digest the standard text into easily accessible and understandable material. Sometimes it will use words colloquially to give an intuitive explanation. This is one such case.
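To make the "== or a predicate" point concrete, here is a sketch with two hypothetical helpers. The helper names and the parity predicate are invented for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first element equal (by ==) to its
// successor, or values.size() if there is no such element.
std::size_t first_equal_pair(const std::vector<int>& values) {
    auto it = std::adjacent_find(values.begin(), values.end());
    // Mirrors the "Returns" clause: either last, or *i == *(i + 1).
    assert(it == values.end() || *it == *(it + 1));
    return static_cast<std::size_t>(it - values.begin());
}

// Same search, but "equal" is whatever the predicate says: here, same parity.
std::size_t first_same_parity_pair(const std::vector<int>& values) {
    auto it = std::adjacent_find(
        values.begin(), values.end(),
        [](int a, int b) { return (a % 2 != 0) == (b % 2 != 0); });
    return static_cast<std::size_t>(it - values.begin());
}
```

For {1, 2, 2, 3} the first helper returns 1. For {1, 2, 4, 5} the second helper also returns 1, even though 2 and 4 are not equal under ==, because the predicate treats them as matching.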

Is string subscript an associated index?

The subscript operator ([]) takes a std::string::size_type value.
The operator returns a reference to the character at the given
position. The value in the subscript is referred to as "a subscript" pp93 ~ 94 C++ Primer 5ed.
and
A vector is a collection of objects, all of which have the same type. Every object in the collection has an associated index, which gives
access to that object. pp96 C++ Primer 5ed.
Question:
Is string subscript an associated index? If not, what is the difference between the subscript of the std::string type and the associated index of the collection/vector?
Think of "index" as "the sequential number of an item," not as "the lookup table in a book."
What they're saying about vectors is that the elements in them can be accessed through sequential numeric indices: v[0], v[1], etc.
The exact same holds for strings and the characters in them.
According to std::vector::operator[], the function:
Returns a reference to the element at specified location pos. No bounds checking is performed.
According to std::basic_string::operator[], the function:
Returns a reference to the character at specified location pos. No bounds checking is performed. If pos > size(), the behavior is undefined.
Thus, they are pretty much the same thing. The term associated index means exactly what it sounds like: it is the index associated with the element, nothing more.
The wording here is rather precise, but there is no real difference for these two simple cases. For both string and vector, X[0] denotes the first element of X. That is to say, 0 is the associated index of the first element of X, and 0 is also the argument to operator[], aka the subscript.
To see an example that is not so simple, consider std::string_view. You can have a string_view of the 100th to 200th character of a string. Now view[5] has subscript 5, but it refers to the 105th character in the underlying string.
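A scaled-down sketch of the string_view case (C++17), using a hypothetical helper window. The offsets are much smaller than in the text above, purely to keep the example readable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: a view of up to `count` characters of `text`,
// starting at index `offset`. The view is only valid while `text` is
// alive and unmodified.
std::string_view window(const std::string& text, std::size_t offset, std::size_t count) {
    assert(offset <= text.size());
    return std::string_view(text).substr(offset, count);
}
```

With std::string text = "abcdefghij" and auto view = window(text, 3, 4), the expression view[1] uses subscript 1 but denotes the same character as text[4].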

Alternatives or enhancements to std::find() on a native c++ pointer (such as uint16 *)

I am trying to come up with a single-line C++ conditional to check for the presence of a value within a buffer (like the if 'value' in list: check in Python).
std::find seemed to be the right fit here.
const char *buf = "01020304";
int buf_len = 8;
const uint16_t *ptr = std::find((const uint16_t *) buf, (const uint16_t *) buf + 3, (uint16_t)13360); // 13360 corresponds to the 2 bytes of "04" (on a little-endian machine)
std::cout << "\n(ptr-buf):" << (ptr - (const uint16_t *) buf); // prints 3 for any value, even if 0 is passed instead of 13360, since the last 2 bytes (buf+6, buf+7) are not searched
std::find seems to search the [first, last) range of a buffer. This makes sense for STL types like vector. However, for raw pointers, the only way to make std::find() look at the last element of buf is to pass a pointer one past the end of buf (e.g. buf+8).
Is this a safe thing to do? I am guessing not. Is there a better alternative to using std::find to do a single-line exists check on a char *buffer?
When using pointers with STL algorithms, it is valid and correct to use a 'one past the end' pointer as the end iterator for a range. It is not valid to dereference a pointer one past the end of an array but you can test against one and perform certain arithmetic operations with one.
The standard specifies that it is valid to perform comparisons on a 'one past the end' pointer, section 5.9.3, 'Relational Operators':
5.9.3 Comparing pointers to objects is defined as follows:
(3.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
(3.2) — If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
It also specifies addition and subtraction on pointers such that 'one past the end' pointers behave correctly in algorithms like std::distance(), std::advance(), etc.
In addition the standard has this to say regarding STL iterators:
24.1.6 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. ...
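A minimal sketch of a one-line "exists" check built on this rule. The helper name contains is hypothetical. It takes a real uint16_t array rather than a reinterpreted char buffer, which sidesteps the aliasing and alignment questions the cast in the question raises.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `value` occurs among the `count` elements
// starting at `data`.
bool contains(const std::uint16_t* data, std::size_t count, std::uint16_t value) {
    assert(data != nullptr || count == 0);
    // One past the last element: valid to form and compare, never dereferenced.
    const std::uint16_t* end = data + count;
    return std::find(data, end, value) != end;
}
```

Given const std::uint16_t buf[] = {10, 20, 30, 40}, the call contains(buf, 4, 40) searches all four elements and finds the last one, whereas contains(buf, 3, 40) stops before it, which is the effect seen in the question.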
Past the end pointers are standard C too. E.g. it's safe to produce them with arithmetic: from C11 standard, 6.5.6 (additive operators) paragraph 8:
... If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.

What is multi-pass guarantee per C++ ISO standard?

Reading Working Draft N3337-1, Standard for Programming Language C++, 24.2.5 Forward iterators, page 806.
From draft:
Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if:
— a == b implies ++a == ++b and
— X is a pointer type or the expression (void)++X(a), *a is equivalent to the expression *a.
[ Note: The requirement that a == b implies ++a == ++b (which is not true for input and output iterators) and the removal of the restrictions on the number of the assignments through a mutable iterator (which applies to output iterators) allows the use of multi-pass one-directional algorithms with forward iterators.
—end note ]
Could someone re-interpret this in easier terms? I understand that forward iterators are multi-pass, but I don't understand how this is accomplished per the C++ standard's requirements.
The term says it all, I'd think: you can pass through the sequence multiple times and remember positions within the sequence. As long as the sequence doesn't change, starting at a specific position (iterator) you'll traverse the same objects in the same order, as often as you want. However, you can only go forward; there is no way to move backwards. The canonical example of a sequence like this is a singly-linked list.
The quoted clause basically says that if you have two iterators that compare equal and you increment each of them, you get to the same position and they compare equal again:
if (it1 == it2) {
    ++it1;
    ++it2;
    assert(it1 == it2); // has to hold for multi-pass sequences
}
The somewhat odd expression ++X(a), *a is intended to advance an iterator that is independent of a. The requirement that ++X(a), *a be equivalent to *a means that iterating over the sequence with an independent iterator doesn't change what a refers to. This is unlike an input iterator, where ++InIt(a), *a is not necessarily equivalent to *a, as the first expression can change the position, possibly invalidating a and/or changing the value it refers to.
By contrast, a single-pass sequence (input and output iterators in standard terms) can only be traversed once: trying to traverse the sequence multiple times will not necessarily work. The canonical examples of sequences like this are input from the keyboard and output to the console: once read, you can't get the same characters back again, and once sent, you can't undo the characters.
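A rough sketch of the difference, using hypothetical helpers that are not from the standard. It compares two traversals started from copies of the same iterator, once over a std::forward_list and once over a std::istream_iterator reading from a string stream. The stream result reflects how std::istream_iterator typically behaves once the stream is exhausted; a single-pass iterator makes no promise about a second pass.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: checks "a == b implies ++a == ++b" for two equal,
// dereferenceable iterators (taken by value, so the caller's are untouched).
template <typename Iterator>
bool equal_after_increment(Iterator a, Iterator b) {
    assert(a == b);
    ++a;
    ++b;
    return a == b;
}

// Hypothetical helper: traverses [first, last) twice, each time from a copy
// of `first`, and reports whether both passes saw the same ints.
template <typename Iterator>
bool two_passes_agree(Iterator first, Iterator last) {
    std::vector<int> pass1(first, last);
    std::vector<int> pass2(first, last);
    return pass1 == pass2;
}

// A singly-linked list can be walked again from a remembered position.
bool forward_list_is_multi_pass() {
    std::forward_list<int> list{1, 2, 3};
    return equal_after_increment(list.begin(), list.begin()) &&
           two_passes_agree(list.begin(), list.end());
}

// A stream is consumed by the first pass, so the second pass sees less.
bool stream_is_multi_pass() {
    std::istringstream input("1 2 3");
    return two_passes_agree(std::istream_iterator<int>(input),
                            std::istream_iterator<int>());
}
```

Here forward_list_is_multi_pass() returns true, while stream_is_multi_pass() returns false because the first traversal has already consumed the stream's contents.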