C++ formal requirement of behaviour of iterators over a container

C++ formal requirement of behaviour of iterators over a container - c++

I'm sure that I'm not alone in expecting that I could add several elements in some order to a vector or list, and then could use an iterator to retrieve those elements in the same order. For example, in:
#include <vector>
#include <cassert>
int main(int argc, char **argv)
{
using namespace std;
vector<int> v;
v.push_back(4);
v.push_back(10);
v.push_back(100);
auto i = v.begin();
assert(*i++ == 4);
assert(*i++ == 10);
assert(*i++ == 100);
return 0;
}
... all assertions should pass and the program should terminate normally (assuming that no std::bad_alloc exception is thrown during construction of the vector or adding the elements to it).
However, I'm having trouble reconciling this with any requirement in the C++ standard (I'm looking at C++11, but would like answers for other standards also if they are markedly different).
The requirement for begin() is just (23.2.1 para 6):
begin() returns an iterator referring to the first element in the container.
What I'm looking for is the requirement, or combination of requirements that in turn logically requires, that if i = v.begin(), then ++i shall refer to the second element in the vector (assuming that such an element exists) - or indeed, even the requirement that successive increments of an iterator will return each of the elements in the vector.
Edit:
A more general question is, what (if any) text in the standard requires that successfully incrementing an iterator obtained by calling begin() on a sequence (ordered or unordered) actually visits every element of the sequence?

There's isn't in the standard something straightforward to state that
if i = v.begin(), then ++i shall refer to the second element in the
vector.
However, for vector's iterators why can imply it from the following wording in the draft standard N4527 24.2.1/p5 In general [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral
values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous
iterators.
Now, std::vector's iterator satisfy this requirement, consequently we can imply that ++i is equivalent to i + 1 and thus to addressof(*i) + 1. Which indeed is the second element in the vector due to its contiguous nature.
Edit:
There was indeed a turbidness on the matter about random access iterators and contiguous storage containers in C++11 and C++14 standards. Thus, the commity decided to refine them by putting an extra group of iterators named contiguous iterators. You can find more info in the relative proposal N3884.

It looks to me like we need to put two separate parts of the standard together to get a solid requirement here. We can start with table 101, which requires that a[n] be equivalent to *(a.begin() + n) for sequence containers (specifically, basic_string, array, deqeue and vector) (and the same requirement for a.at(n), for the same containers).
Then we look at table 111 in [random.access.iterators], where it requires that the expression r += n be equivalent to:
{
difference_type m = n;
if (m >= 0)
while (m--)
++r;
else
while (m++)
--r;
return r;
}
[indentation added]
Between the two, these imply that for any n, *(begin() + n) refers to the nth item in the vector. Just in case you want to cover the last base I see open, let's cover the requirement that push_back actually append to the collection. That's also in table 101: a.push_back(t) "Appends a copy of t" (again for basic_string, string, deque, list, and vector).

[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..]
I don't know how else you'd interpret it.

The specification isn't just in the iterators. It is also in the specification of the containers, and the operations that modify those containers.
The thing is, you are not going to find a single clause that says "incrementing begin() repeatedly will access all elements of a vector in order". You need to look at the specification of every operation on every container (since these define an order of elements in the container) and the specification of iterators (and operations on them) which is essentially that "incrementing moves to the next element in the order that operations on the container defined, until we pass the end". It is the combination of numerous clauses in the standard that give the end effect.
The general concepts, however, are ....
All containers maintain some range of zero or more elements. That range has three key properties: a beginning (corresponding to the first element in an order that is meaningful to the container), and an end (corresponding to the last element), and an order (which determines the sequence in which elements will be retrieved one after the other - i.e. defines the meaning of "next").
An iterator is an object that either references an element in a range, or has a "past the end" value. An iterator that references an element in the range other than the end (last), when incremented, will reference the next element. An iterator that references the end (last) element in the range, when incremented, will be an end (past the end) iterator.
The begin() method returns an iterator that references (or points to) the first in the range (or an end iterator if the range has zero elements). The end() method returns an end iterator - one that corresponds to "one past the the end of the range". That means, if an iterator is initialised using the begin(), incrementing it repeatedly will move sequentially through the range until the end iterator is reached.
Then it is necessary to look at the specification for the various modifiers of the container - the member functions that add or remove elements. For example, push_back() is specified as adding an element to the end of the existing range for that container. It extends the range by adding an element to the end.
It is that combination of specifications - of iterators and of operations that modify containers - that guarantees the order. The net effect is that, if elements are added to a container in some order, then a iterator initialised using begin() will - when incremented repeatedly - reference the elements in the order in which they were placed in the container.
Obviously, some container modifiers are a bit more complicated - for example, std::vector's insert() is given an iterator, and adds elements there, shuffling subsequent elements to make room. However, the key point is that the modifiers place elements into the container in a defined order (or remove, in the case of operations like std::vector::erase()) and iterators will access elements in that defined order.

Related

Is using erase(it) in a loop always safe for all containers and platforms?

i want to remove elements within a container(for now it is unordered_set) by certain condition
for (auto it = windows.begin(); it != windows.end(); ) {
if ((*it)->closed() == 0)
it = numbers.erase(it);
else
++it;
}
i know the erase(it) will return the position immediately following the last of the elements erased. but
Is it mandatory by the standard there won't cause the rearrangement for the iteation when invoking erase? Is it always safe for all containers and all platforms? Say there may be some magic implementation for certain type of container within certain platform.

The C++ standard requires that unordered_set::erase preserve the order of remaining elements, and return an iterator immediately following those being erased. Therefore, the loop you show is well-defined.
[unord.req]/14 ... The erase members shall invalidate only iterators and references to the erased elements, and preserve the relative order of the elements that are not erased.
[unord.req]/11, Table 91 a.erase(q) Erases the element pointed to by q. Returns the iterator immediately following q prior to the erasure.

What is the advantage of using (it != vector.end()) instead of (it < vector.end()) in for loops? [duplicate]

I'm used to writing loops like this:
for (std::size_t index = 0; index < foo.size(); index++)
{
// Do stuff with foo[index].
}
But when I see iterator loops in others' code, they look like this:
for (Foo::Iterator iterator = foo.begin(); iterator != foo.end(); iterator++)
{
// Do stuff with *Iterator.
}
I find the iterator != foo.end() to be offputting. It can also be dangerous if iterator is incremented by more than one.
It seems more "correct" to use iterator < foo.end(), but I never see that in real code. Why not?

All iterators are equality comparable. Only random access iterators are relationally comparable. Input iterators, forward iterators, and bidirectional iterators are not relationally comparable.
Thus, the comparison using != is more generic and flexible than the comparison using <.
There are different categories of iterators because not all ranges of elements have the same access properties. For example,
if you have an iterators into an array (a contiguous sequence of elements), it's trivial to relationally compare them; you just have to compare the indices of the pointed to elements (or the pointers to them, since the iterators likely just contain pointers to the elements);
if you have iterators into a linked list and you want to test whether one iterator is "less than" another iterator, you have to walk the nodes of the linked list from the one iterator until either you reach the other iterator or you reach the end of the list.
The rule is that all operations on an iterator should have constant time complexity (or, at a minimum, sublinear time complexity). You can always perform an equality comparison in constant time since you just have to compare whether the iterators point to the same object. So, all iterators are equality comparable.
Further, you aren't allowed to increment an iterator past the end of the range into which it points. So, if you end up in a scenario where it != foo.end() does not do the same thing as it < foo.end(), you already have undefined behavior because you've iterated past the end of the range.
The same is true for pointers into an array: you aren't allowed to increment a pointer beyond one-past-the-end of the array; a program that does so exhibits undefined behavior. (The same is obviously not true for indices, since indices are just integers.)
Some Standard Library implementations (like the Visual C++ Standard Library implementation) have helpful debug code that will raise an assertion when you do something illegal with an iterator like this.

Short answer: Because Iterator is not a number, it's an object.
Longer answer: There are more collections than linear arrays. Trees and hashes, for example, don't really lend themselves to "this index is before this other index". For a tree, two indices that live on separate branches, for example. Or, any two indices in a hash -- they have no order at all, so any order you impose on them is arbitrary.
You don't have to worry about "missing" End(). It is also not a number, it is an object that represents the end of the collection. It doesn't make sense to have an iterator that goes past it, and indeed it cannot.

Guarantees of reordering of a vector

Say I have this code:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec {10, 15, 20};
auto itr = vec.begin();
vec.erase(itr);
for(const auto& element : vec)
{
std::cout << element << " ";
}
return 0;
}
This gives me 15 20 as expected. Now, cppreference says this about erase():
Invalidates iterators and references at or after the point of the
erase, including the end() iterator
Fair enough, but is that the only guarantee the standard gives about vector::erase()?
Is a vector allowed to reorder it's element after the erased iterator?
For example, are these conditions guaranteed to hold after the erase which would mean all elements after the erase() iterator shifted 1 to the left:
vec[0] == 15
vec[1] == 20
Or are implementations allowed to move values around as they see fit, and thus, create scenarios where vec[0] == 20 etc?
I would like a quote of the relevant part of the standard.

Let's start at the beginning:
23.2.3 Sequence containers
A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement.
The library provides four basic kinds of sequence containers: vector,
forward_list, list, and deque.
Emphasis on "a strictly linear arrangement". This is unambiguous.
This definition is followed by a table called "sequence container requirements", which describes erase() thusly:
a.erase(q) [ ... ]
Effects: Erases the element pointed to by q
Combined, this leaves no wiggle room for interpretation. The elements in a vector are always in "a strict linear arrangement", so when one of them is erase()d, there's only one possible outcome.

Technically, no, the standard doesn't write out a promise that it won't re-order elements when you least expect.
Practically, obviously it's not going to do that. That would be ridiculous.
Legally, you can probably take the "Effects" clause:
Erases the element pointed to by q
as having no other effects unless stated elsewhere (e.g. iterator invalidation, which follows from the erasure effect).

The two statements I found that I think guarantee it would be:
C++11 Standard
23.2.1
11 Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.
If you can't "change the values of" then you can't arbitrarily re-order elements (like swapping the end value with the erased one).
23.2.3
12 The iterator returned from a.erase(q) points to the element immediately following q prior to the element being erased. If no such element exists, a.end() is returned.
This implies that the conceptual erasure of an element is implemented by closing the physical gap from the right. Given the previous rule, conceptually closing the gap, can not be seen as conceptually changing their values. This means the only implementation would be to shift the values in order.
By means of explanation.
The Standard is dealing with the abstract concept not the actual implementation, although its statements do impact the implementation.
Conceptually erasing an element simply removes it and nothing more. So given the sequence:
3 5 7 4 2 9 (6 values)
If we erase the 3rd element what does that conceptually give us?
3 5 4 2 9 (5 values)
This must be true because of the first statement above:
23.2.1
11 Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.
If the implementation reordered the elements, say by swapping the erased element with the end element that rule would be broken because we would end up with this:
3 5 9 4 2
Conceptually the resulting value to the right of the erased element has changed from a 4 to a 9, thus breaking the rule.

container.erase(first,last) where first == last in STL containers

Is there a defined behavior for container.erase(first,last) when first == last in the STL, or is it undefined?
Example:
std::vector<int> v(1,1);
v.erase(v.begin(),v.begin());
std::cout << v.size(); // 1 or 0?
If there is a Standard Library specification document that has this information I would appreciate a reference to it.

The behavior is well defined.
It is a No-op(No-Operation). It does not perform any erase operation on the container as end is same as begin.
The relevant Quote from the Standard are as follows:
C++03 Standard: 24.1 Iterator requirements and
C++11 Standard: 24.2.1 Iterator requirements
Para 6 & 7 for both:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.A range is a pair of iterators that designate the beginning and end of the computation. A range [i, i) is an empty range; in general, a range [i, j) refers to the elements in the data structure starting with the one pointed to by i and up to but not including the one pointed to by j. Range [i, j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.

That would erase nothing at all, just like other algorithms that operate on [, ) ranges.
Even if the container is empty I think that would still work because begin() == end().

Conceptually, there is an ordinary loop from begin to end, with a simple loop condition that checks if the iterator is end already, like this:
void erase (iterator from, iterator to) {
...
while (from != to) erase (from++);
...
}
(however, implementations may vary). As you see, if from==to, then there is no single iteration of the loop body.

It is perfectly defined. It removes all elements from first to last, including first and excluding last. If there are no elements in this range (when first == last), then how much are removed? You guessed it, none.
Though I'm not sure what happens if first comes after last, I suppose this will invoke undefined behaviour.

Why is "!=" used with iterators instead of "<"?

I'm used to writing loops like this:
for (std::size_t index = 0; index < foo.size(); index++)
{
// Do stuff with foo[index].
}
But when I see iterator loops in others' code, they look like this:
for (Foo::Iterator iterator = foo.begin(); iterator != foo.end(); iterator++)
{
// Do stuff with *Iterator.
}
I find the iterator != foo.end() to be offputting. It can also be dangerous if iterator is incremented by more than one.
It seems more "correct" to use iterator < foo.end(), but I never see that in real code. Why not?

All iterators are equality comparable. Only random access iterators are relationally comparable. Input iterators, forward iterators, and bidirectional iterators are not relationally comparable.
Thus, the comparison using != is more generic and flexible than the comparison using <.
There are different categories of iterators because not all ranges of elements have the same access properties. For example,
if you have an iterators into an array (a contiguous sequence of elements), it's trivial to relationally compare them; you just have to compare the indices of the pointed to elements (or the pointers to them, since the iterators likely just contain pointers to the elements);
if you have iterators into a linked list and you want to test whether one iterator is "less than" another iterator, you have to walk the nodes of the linked list from the one iterator until either you reach the other iterator or you reach the end of the list.
The rule is that all operations on an iterator should have constant time complexity (or, at a minimum, sublinear time complexity). You can always perform an equality comparison in constant time since you just have to compare whether the iterators point to the same object. So, all iterators are equality comparable.
Further, you aren't allowed to increment an iterator past the end of the range into which it points. So, if you end up in a scenario where it != foo.end() does not do the same thing as it < foo.end(), you already have undefined behavior because you've iterated past the end of the range.
The same is true for pointers into an array: you aren't allowed to increment a pointer beyond one-past-the-end of the array; a program that does so exhibits undefined behavior. (The same is obviously not true for indices, since indices are just integers.)
Some Standard Library implementations (like the Visual C++ Standard Library implementation) have helpful debug code that will raise an assertion when you do something illegal with an iterator like this.

Short answer: Because Iterator is not a number, it's an object.
Longer answer: There are more collections than linear arrays. Trees and hashes, for example, don't really lend themselves to "this index is before this other index". For a tree, two indices that live on separate branches, for example. Or, any two indices in a hash -- they have no order at all, so any order you impose on them is arbitrary.
You don't have to worry about "missing" End(). It is also not a number, it is an object that represents the end of the collection. It doesn't make sense to have an iterator that goes past it, and indeed it cannot.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ formal requirement of behaviour of iterators over a container - c++

[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..] I don't know how else you'd interpret it.

Related

Is using erase(it) in a loop always safe for all containers and platforms?

What is the advantage of using (it != vector.end()) instead of (it < vector.end()) in for loops? [duplicate]

Guarantees of reordering of a vector

container.erase(first,last) where first == last in STL containers

Why is "!=" used with iterators instead of "<"?

Categories

Resources