What does"one-past-the-last-element" mean in vectors? - c++

I'm learning vectors and am confused on how the array is copying to thevector here
double p[] = {1, 2, 3, 4, 5};
std::vector<double> a(p, p+5);
I also know std::vector<double> a(3,5); means `make room for 3 and initialize them with 5. How does the above code work?
The second point is I read the paragraph from where I copied the above code.
Understanding the second point is crucial when working with vectors or
any other standard containers. The controlled sequence is always
expressed in terms of [first, one-past-last)—not only for ctors, but
also for every function that operates on a range of elements.
I don't know what is the meant by [first, one-past-last) ?
I know mathematically but don't know why/how vector copy the array in this way?
Edited
another related question
The member function end() returns an iterator that "points" to
one-past-the-last-element in the sequence. Note that dereferencing the
iterator returned by end() is illegal and has undefined results.
Can you explain this one-past-the-last-element what is it? and why?

Never dereference end() or rend() from STL containers, as they do not point to valid elements.
This picture below can help you visualize this.
The advantage of an half open range is:
1. Handling of empty ranges occur when begin() == end()
2. Iterating over the elements can be intuitively done by checking until the iterator equals end().

Strongly coupled with containers (e.g. vector, list, map) is the concept of iterators. An iterator is a C++ abstraction of a pointer. I.e. an iterator points to an object inside the container (or to one past the last element), and dereferencing the iterator means accessing that element.
Lets take for instance a vector of 4 elements:
| 0 | 1 | 2 | 3 | |
^ ^ ^
| | |
| | one past the end (outside of the container elements)
| last element
first element
The (algorithms in the) standard template library operate on ranges, rather than on containers. This way you can apply operations, not only to the entire container, but also to ranges (consecutive elements of the container).
A range is specified by [first, last) (inclusive first, exclusive last). That's why you need an iterator to one past the end: to specify a range equal to the entire contents of the container. But as that iterator points outside of it, it is illegal to dereference it.

The constructor of std::vector has several overloads.
For std::vector<double> a(3,5); the fill constructor is used :
explicit vector (size_type n);
vector (size_type n, const value_type& val,
const allocator_type& alloc = allocator_type());
This takes a size parameter as it's first parameter and an optional and third parameter, the second parameter specifies the value you want to give the newly created objects.
double p[] = {1, 2, 3, 4, 5};
std::vector<double> a(p, p+5);
Uses another overload of the constructor, namely the range constructor:
template <class InputIterator>
vector (InputIterator first, InputIterator last,
const allocator_type& alloc = allocator_type());
This takes an iterator to the start of a collection and the end() iterator and traverses and adds to the vector until first == last.
The reason why end() is implemented as one-past-the-last-element is because this allows for implementations to check for equality like:
while(first != last)
{
//savely add value of first to vector
++first;
}

Iterators are an abstraction of pointers.
A half-open interval [a,b) is defined as all elements x>=a and x<b. The advantage of it is that [a,a) is well defined and empty for any a.
Anything that can be incremented and compared equal can define a half open interval. So [ptr1,ptr2) is the element ptr1 then ptr1+1 then ptr1+2 until you reach ptr2, but not including ptr2.
For iterators, it is similar -- except we do not always have random access. So we talk about next instead of +1.
Pointers still count as a kind-of iterator.
A range of iterators or pointers "talks about" the elements pointed to. So when a vector takes a pair of iterators (first, and one-past-the-end), this defines a half-open interval of iterators, which also defines a collection of values they point to.
You can construct the vector from such a half-open range. It copies the elements poimtrd to into the vector.

Related

What does std::find return if element not found

As per documentation, std::find returns
last
if no element is found. What does that mean? Does it return an iterator pointing to the last element in the container? Or does it return an iterator pointing to .end(), i.e. pointing outside the container?
The following code prints 0, which is not an element of the container. So, I guess std::find returns an iterator outside the container. Could you please confirm?
int main()
{
vector<int> vec = {1, 2,3, 1000, 4, 5};
auto itr = std::find(vec.begin(), vec.end(), 456);
cout << *itr;
}
last is the name of second parameter to find. It doesn't know what kind of container you're using, just the iterators that you give it.
In your example, last is vec.end(), which is (by definition) not dereferenceable, since it's one past the last element. So by dereferencing it, you invoke undefined behaviour, which in this case manifests as printing out 0.
Algorithms apply to ranges, which are defined by a pair of iterators. Those iterators are passed as arguments to the algorithm. The first iterator points at the first element in the range, and the second argument points at one past the end of the range. Algorithms that can fail return a copy of the past-the-end iterator when they fail. That's what std::find does: if there is no matching element it returns its second argument.
Note that the preceding paragraph does not use the word "container". Containers have member functions that give you a range that you can use to get at the elements of the container, but there are also ways of creating iterators that have no connection to any container.
Based on this documentation, it literally says:
"Return value:
Iterator to the first element satisfying the condition or last if no such element is found."
In your case, it's out the vector by one, .end()

C++ formal requirement of behaviour of iterators over a container

I'm sure that I'm not alone in expecting that I could add several elements in some order to a vector or list, and then could use an iterator to retrieve those elements in the same order. For example, in:
#include <vector>
#include <cassert>
int main(int argc, char **argv)
{
using namespace std;
vector<int> v;
v.push_back(4);
v.push_back(10);
v.push_back(100);
auto i = v.begin();
assert(*i++ == 4);
assert(*i++ == 10);
assert(*i++ == 100);
return 0;
}
... all assertions should pass and the program should terminate normally (assuming that no std::bad_alloc exception is thrown during construction of the vector or adding the elements to it).
However, I'm having trouble reconciling this with any requirement in the C++ standard (I'm looking at C++11, but would like answers for other standards also if they are markedly different).
The requirement for begin() is just (23.2.1 para 6):
begin() returns an iterator referring to the first element in the container.
What I'm looking for is the requirement, or combination of requirements that in turn logically requires, that if i = v.begin(), then ++i shall refer to the second element in the vector (assuming that such an element exists) - or indeed, even the requirement that successive increments of an iterator will return each of the elements in the vector.
Edit:
A more general question is, what (if any) text in the standard requires that successfully incrementing an iterator obtained by calling begin() on a sequence (ordered or unordered) actually visits every element of the sequence?
There's isn't in the standard something straightforward to state that
if i = v.begin(), then ++i shall refer to the second element in the
vector.
However, for vector's iterators why can imply it from the following wording in the draft standard N4527 24.2.1/p5 In general [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral
values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous
iterators.
Now, std::vector's iterator satisfy this requirement, consequently we can imply that ++i is equivalent to i + 1 and thus to addressof(*i) + 1. Which indeed is the second element in the vector due to its contiguous nature.
Edit:
There was indeed a turbidness on the matter about random access iterators and contiguous storage containers in C++11 and C++14 standards. Thus, the commity decided to refine them by putting an extra group of iterators named contiguous iterators. You can find more info in the relative proposal N3884.
It looks to me like we need to put two separate parts of the standard together to get a solid requirement here. We can start with table 101, which requires that a[n] be equivalent to *(a.begin() + n) for sequence containers (specifically, basic_string, array, deqeue and vector) (and the same requirement for a.at(n), for the same containers).
Then we look at table 111 in [random.access.iterators], where it requires that the expression r += n be equivalent to:
{
difference_type m = n;
if (m >= 0)
while (m--)
++r;
else
while (m++)
--r;
return r;
}
[indentation added]
Between the two, these imply that for any n, *(begin() + n) refers to the nth item in the vector. Just in case you want to cover the last base I see open, let's cover the requirement that push_back actually append to the collection. That's also in table 101: a.push_back(t) "Appends a copy of t" (again for basic_string, string, deque, list, and vector).
[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..]
I don't know how else you'd interpret it.
The specification isn't just in the iterators. It is also in the specification of the containers, and the operations that modify those containers.
The thing is, you are not going to find a single clause that says "incrementing begin() repeatedly will access all elements of a vector in order". You need to look at the specification of every operation on every container (since these define an order of elements in the container) and the specification of iterators (and operations on them) which is essentially that "incrementing moves to the next element in the order that operations on the container defined, until we pass the end". It is the combination of numerous clauses in the standard that give the end effect.
The general concepts, however, are ....
All containers maintain some range of zero or more elements. That range has three key properties: a beginning (corresponding to the first element in an order that is meaningful to the container), and an end (corresponding to the last element), and an order (which determines the sequence in which elements will be retrieved one after the other - i.e. defines the meaning of "next").
An iterator is an object that either references an element in a range, or has a "past the end" value. An iterator that references an element in the range other than the end (last), when incremented, will reference the next element. An iterator that references the end (last) element in the range, when incremented, will be an end (past the end) iterator.
The begin() method returns an iterator that references (or points to) the first in the range (or an end iterator if the range has zero elements). The end() method returns an end iterator - one that corresponds to "one past the the end of the range". That means, if an iterator is initialised using the begin(), incrementing it repeatedly will move sequentially through the range until the end iterator is reached.
Then it is necessary to look at the specification for the various modifiers of the container - the member functions that add or remove elements. For example, push_back() is specified as adding an element to the end of the existing range for that container. It extends the range by adding an element to the end.
It is that combination of specifications - of iterators and of operations that modify containers - that guarantees the order. The net effect is that, if elements are added to a container in some order, then a iterator initialised using begin() will - when incremented repeatedly - reference the elements in the order in which they were placed in the container.
Obviously, some container modifiers are a bit more complicated - for example, std::vector's insert() is given an iterator, and adds elements there, shuffling subsequent elements to make room. However, the key point is that the modifiers place elements into the container in a defined order (or remove, in the case of operations like std::vector::erase()) and iterators will access elements in that defined order.

Calling std::sort

sort in the C++ standard library is called as:
sort (first element, last element);
So if I have an array:
int a[n];
I should call sort as:
sort(&a[0], &a[n-1]);
since a[0] is the first element and a[n-1] the last. When I do so, however, it doesn't sort the last element. To get a fully sorted array, I must use:
sort(&a[0], &a[n]);
Why is this?
Because ranges in stl are always defined as half-open ranges from the fist element iterator to to the "one-past-the-end"-iterator. With C++11 you can use:
int a[n];
sort(std::begin(a),std::end(a));
Format for sort in STL in c++ is,
sort (first element, last element);
No, it's not. You are supposed to provide an iterator for the first element, and a one-past-the-end iterator, as you've discovered.
The Standard Library in general uses semi-open intervals to describe ranges through iterators. Otherwise it would be impossible for empty ranges to be expressed:
// An empty container!
std::vector<int> v;
// Pretend that `v.end()` returns an iterator for the actual last element,
// with the same caveat as `v.begin()` that the case where no elements
// exist gives you some kind of "sentinel" iterator that does not represent
// any element at all and cannot be dereferenced
std::vector<int>::iterator a = v.begin(), b = v.end();
// Oh no, this would think that there's one element!
std::sort(a, b);

vector::erase and reverse_iterator

I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).
Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors some way similar to this pseudo-code:
grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).
std::map<C, std::pair<T::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
iterate(vect_rit = it->rbegin() -> it->rend())
{
// ...
what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
if (what_to_delete.size() > threshold)
what_to_delete.erase(what_to_delete.begin());
// ...
}
}
Now, after running this code, in what_to_delete I have a collection of iterators pointing to the original vectors that I want to remove from these vectors (overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any what_to_delete[0 - n] there is no way that an iterator on position n - m would point to an element further from the beginning of the same vector than n, where m > 0.
When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:
The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i- 1)
Which means that to delete a vect_rit, I use:
vector.erase(--vect_rit.base());
Now, according to C++11 standard §23.3.6.5/3:
iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.
How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?
Edit:
Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-deferenceable iterator after the first element was deleted. Whereas when storing classic iterators (converting to const_iterator) while inserting to what_to_delete worked correctly.
Now, for future reference, does The Standard specify what should be treated as a reference point in case of a random-access reverse_iterator? Or this is an implementation detail?
Thanks!
In the question you have already quoted exactly what the standard says a reverse_iterator is:
The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)
Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.
If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.
As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).
You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1
Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.
However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:
vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));
From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:
namespace std {
template <class Iterator>
class reverse_iterator ...
{
...
Iterator base() const; // explicit
...
protected:
Iterator current;
...
};
}
And from §24.5.1.3.3:
Iterator base() const; // explicit
Returns: current.
Thus it seems to me that so long as you don't erase anything in the vector before what one of your reverse_iterators points to, said reverse_iterator should remain valid.
Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vector_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceeding element, and so your next vector.erase(...) is undefined behavior.
Just in case that's clear as mud, let me say that differently:
std::vector<T> v=...;
...
// it_1 and it_2 are contiguous
std::vector<T>::reverse_iterator it_1=v.rend();
std::vector<T>::reverse_iterator it_2=it_1;
--it_2;
// Erase everything after it_1's pointee:
// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();
// but that points one too far in, so decrement;
--tmp_it;
// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());
// perform erasure
v.erase(tmp_it); // invalidates all iterators pointing at or past *tmp_it
// (like, say it_2.base()...)
// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!
// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);
In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.
The reverse_iterator, just like the normal iterator, points to a certain position in the vector. Implementation details are irrelevant, but if you must know, they both are (in a typical implementation) just plain old pointers inside. The difference is the direction. The reverse iterator has its + and - reversed w.r.t. the regular iterator (and also ++ and --, > and < etc).
This is interesting to know, but doesn't really imply an answer to the main question.
If you read the language carefully, it says:
Invalidates iterators and references at or after the point of the erase.
References do not have a built-in sense of direction. Hence, the language clearly refers to the container's own sense of direction. Positions after the point of the erase are those with higher indices. Hence, the iterator's direction is irrelevant here.

The behavior of overlapped vector::insert

Where does the C++ standard declare that the pair of iterators passed to std::vector::insert must not overlap the original sequence?
Edit: To elaborate, I'm pretty sure that the standard does not require the standard library to handle situations like this:
std::vector<int> v(10);
std::vector<int>::iterator first = v.begin() + 5;
std::vector<int>::iterator last = v.begin() + 8;
v.insert(v.begin() + 2, first, last);
However, I was unable to find anything in the standard, that would prohibit the ranges [first, last) and [v.begin(), v.end()) to overlap.
23.1.1/4 Sequence requirements has:
expression: a.insert(p,i,j)
return type: void
precondition: i,j are not iterators into a. inserts copies of elements in[i,j) before p.
So i and j cannot be iterators into your vector.
It makes sense, as during the insert operation, the vector may need to resize itself, and so the existing elements may first be copied to a new memory location (there by invalidating the current iterators).
Consider the behavior if it was allowed. Every insert into the vector would both increase the distance between the start and end iterator by one and move the start iterator up one. Therefore the start iterator would never reach the end iterator and the algorithm would execute until an out of memory exception occurred.