Is behavior of std::distance undefined when called on a pair of std::vector iterators that have been invalidated by moving the vector?
For context:
I'm writing copy and move constructors for a class that has a vector of data and a vector of iterators that point to that data. Once I move the data to its destination, I need to translate the vector of iterators to point to the new container. I would like to avoid creating intermediate index representation in memory.
If the iterators are valid before the move, they will remain valid after the move - so you don't need to recalculate them using std::distance.
(emphasis mine below)
std::vector::vector
After container move construction, references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this.
The current standard makes this guarantee via the blanket statement in [container.requirements.general/12], and a more direct guarantee is under consideration via LWG 2321.
[container.requirements.general/12] states that
Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.
The same blanket statement goes for the move assignment operator and this means that, in accordance with the standard, the iterators will stay valid after the move.
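Here is a minimal sketch of what this guarantee means in practice, assuming a C++11 or later standard library. The function name `iterator_survives_move_construction` is hypothetical. It takes an iterator into `src`, move-constructs `dst` from `src`, and checks that the iterator still designates the same element, which is now owned by `dst`:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns true if an iterator taken from src before the
// move still refers to the corresponding element of dst afterwards.
bool iterator_survives_move_construction() {
    std::vector<int> src{10, 20, 30};
    std::vector<int>::iterator it = src.begin() + 1;  // refers to the element 20
    const int* addr = &*it;                           // remember where that element lives

    std::vector<int> dst(std::move(src));  // dst takes over src's buffer

    // The element was not relocated: same address, same value, and it is
    // now dst's second element.
    return &*it == addr && *it == 20 && &*it == &dst[1];
}
```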
The current wording in LWG 2321 gives a hint of what a new paragraph in the standard could look like if the library working group finalizes it, which seems to be hard: LWG 2321 was opened back in 2013.
no move constructor (or move assignment operator when allocator_traits<allocator_type>::propagate_on_container_move_assignment::value is true) of a container (except for array) invalidates any references, pointers, or iterators referring to the elements of the source container. [Note: The end() iterator does not refer to any element, so it may be invalidated. — end note]
If that's too vague, you can use
[container.requirements.general/11.6]
no swap() function invalidates any references, pointers, or iterators referring to the elements of the containers being swapped. [ Note: The end() iterator does not refer to any element, so it may be invalidated. — end note ]
If the iterators are valid before you swap, they are valid after the swap.
Here's an example class using the guarantee given for swap:
#include <vector>
class Foo {
std::vector<int> data{};
std::vector<decltype(data)::iterator> dits{};
public:
Foo() = default;
Foo(const Foo&) = delete; // here, dits would need to be calculated
// A move constructor guaranteed to preserve iterator validity.
Foo(Foo&& rhs) noexcept {
data.swap(rhs.data);
dits.swap(rhs.dits);
}
Foo& operator=(const Foo&) = delete;
// A move assignment operator guaranteed to preserve iterator validity.
Foo& operator=(Foo&& rhs) noexcept {
data.swap(rhs.data);
dits.swap(rhs.dits);
return *this;
}
~Foo() = default;
};
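The deleted copy constructor above is where the question's iterator translation would be needed. Below is a hedged sketch of one way to write it, using a hypothetical class `Bar` whose members are public only to keep the sketch short. A copy does not modify the source object, so the source's iterators are still valid. That makes `std::distance` on them well defined, and the offsets can be applied to the new buffer without building a separate index vector:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical class (not from the answer above): like Foo, but with a copy
// constructor that rebuilds the iterator vector for the copied data.
class Bar {
public:
    std::vector<int> data;
    std::vector<std::vector<int>::iterator> dits;  // each element points into data

    Bar() = default;

    // Copy the data, then recreate every iterator at the same offset inside
    // our own buffer. rhs is untouched, so its iterators are still valid.
    Bar(const Bar& rhs) : data(rhs.data) {
        dits.reserve(rhs.dits.size());
        for (std::vector<int>::const_iterator it : rhs.dits) {
            dits.push_back(data.begin() + std::distance(rhs.data.cbegin(), it));
        }
    }

    Bar& operator=(const Bar&) = delete;  // left out of this sketch
};
```

Each stored iterator is converted to `const_iterator` before the call, because `rhs.data.cbegin()` is a `const_iterator` and `std::distance` requires both arguments to have the same type.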
Related
I'm inserting a class that has copy constructor but does not have copy assignment into a deque, and a simple example is shown below:
#include <deque>
class X {
public:
X(const int value): value(value) {}
const int value;
};
void insert() {
std::deque<X> queue;
X x(3);
queue.insert(queue.begin(), x); // this cannot compile
queue.emplace(queue.begin(), x); // this cannot compile
queue.emplace_front(x); // this is OK
}
The compiler complains that class X does not have a copy assignment operator, due to the const int value in the definition. I understand that part, but X does have a copy constructor, and I thought to insert into a deque, the element will be copy-constructed, hence I only need a copy constructor and not a copy assignment. Where am I wrong here? And how do I deal with this kind of situation, where I only have a copy constructor and no copy assignment because of const fields?
"I thought to insert into a deque, the element will be copy-constructed, hence I only need a copy constructor and not a copy assignment". The requirements depend on which function you use:
For std::deque<T>::emplace :
Type requirements
-T (the container's element type) must meet the requirements of MoveAssignable, MoveInsertable and EmplaceConstructible.
emplace won't work with X because it can't be move assigned.
For std::deque<T>::emplace_front :
Type requirements
-T (the container's element type) must meet the requirements of EmplaceConstructible.
emplace_front is what you were thinking of and does work with X since it is emplace constructible.
The reason the two functions have different requirements is that elements in a deque are stable under front and back insertion. When you insert an element at the front or back of the deque, all pointers and references to its existing elements remain valid; nothing has to move.
With emplace that guarantee isn't provided, because it can insert elements anywhere in the container. If you insert in the middle of the container, some number of elements might need to be shifted, so elements must be assignable to allow this shift. Passing begin() as the iterator doesn't help: the function is implemented to accommodate the case where the iterator is neither begin() nor end().
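This answer implies a workaround, sketched here under the assumption that you only need to add elements at either end of the deque. `push_front`, `push_back`, `emplace_front` and `emplace_back` only construct elements, so they accept `X` even though it cannot be assigned. The helper `build_queue` is hypothetical:

```cpp
#include <cassert>
#include <deque>

class X {
public:
    X(const int value) : value(value) {}
    const int value;  // const member: X is copy-constructible but not assignable
};

// Hypothetical helper: uses only end insertions, which construct elements
// in place and never need to assign to existing ones.
std::deque<X> build_queue() {
    std::deque<X> queue;
    X x(3);
    queue.emplace_front(x);  // copy-constructs X(3) at the front
    queue.push_front(X(1));  // move-constructs X(1) at the front
    queue.emplace_back(5);   // constructs X(5) in place at the back
    return queue;
}
```

This does not help if elements must go into the middle of the deque. As explained above, that case needs assignable elements.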
The standard C++ containers offer only one version of operator[] for containers like vector<T> and deque<T>. It returns a T& (other than for vector<bool>, which I'm going to ignore), which is an lvalue. That means that in code like this,
vector<BigObject> makeVector(); // factory function
auto copyOfObject = makeVector()[0]; // copy BigObject
copyOfObject will be copy constructed. Given that makeVector() returns an rvalue vector, it seems reasonable to expect copyOfObject to be move constructed.
If operator[] for such containers were overloaded for rvalue and lvalue objects, then operator[] for rvalue containers could return an rvalue reference, i.e., an rvalue:
template<typename T>
class container {
public:
T& operator[](int index) &; // for lvalue objects
T&& operator[](int index) &&; // for rvalue objects
// ...
};
In that case, copyOfObject would be move constructed.
Is there a reason this kind of overloading would be a bad idea in general? Is there a reason why it's not done for the standard containers in C++14?
Converting comment into answer:
There's nothing inherently wrong with this approach; class member access follows a similar rule (E1.E2 is an xvalue if E1 is an rvalue and E2 names a non-static data member and is not a reference, see [expr.ref]/4.2), and elements inside a container are logically similar to non-static data members.
A significant problem with doing it for std::vector or other standard containers is that it will likely break some legacy code. Consider:
void foo(int &);
std::vector<int> bar();
foo(bar()[0]);
That last line will stop compiling if operator[] on an rvalue vector returned an xvalue. Alternatively - and arguably worse - if there is a foo(const int &) overload, it will silently start calling that function instead.
Also, returning a bunch of elements in a container and only using one element is already rather inefficient. It's arguable that code that does this probably doesn't care much about speed anyway, and so the small performance improvement is not worth introducing a potentially breaking change.
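Below is a minimal sketch of the overloading proposed in the question. It uses a hypothetical wrapper class `container` (not a standard container) and a hypothetical element type `Tracked` that records whether it was copy- or move-constructed. Indexing an lvalue container copies the element, while indexing a temporary container moves it:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type that records how it was constructed.
struct Tracked {
    bool moved_in = false;
    Tracked() = default;
    Tracked(const Tracked&) : moved_in(false) {}
    Tracked(Tracked&&) noexcept : moved_in(true) {}
};

// Hypothetical wrapper (not a standard container) with ref-qualified operator[].
template <typename T>
class container {
public:
    explicit container(std::size_t n) : items_(n) {}
    T& operator[](std::size_t i) & { return items_[i]; }               // lvalue object
    const T& operator[](std::size_t i) const& { return items_[i]; }    // const lvalue object
    T&& operator[](std::size_t i) && { return std::move(items_[i]); }  // rvalue object
private:
    std::vector<T> items_;
};

// Hypothetical factory, standing in for makeVector() in the question.
container<Tracked> make_container() { return container<Tracked>(1); }
```

One caveat with this design: the `&&` overload returns a reference into the temporary, so `auto&& r = make_container()[0];` leaves `r` dangling once the full-expression ends. As the answer above points out, passing the result to a non-const lvalue reference parameter such as `foo(int &)` would also stop compiling.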
I think you would leave the container in an invalid state if you moved out one of its elements, and I would question the need to allow that state at all. Second, if you ever need that, can't you just call the new object's move constructor like this:
T copyObj = std::move(makeVector()[0]);
Update:
The most important point, again in my opinion, is that containers are containers by nature, so they should not modify the elements inside them in any way. They just provide storage, an iteration mechanism, etc.
vector<int> v;
v.push_back(1);
v.push_back(v[0]);
If the second push_back causes a reallocation, the reference to the first integer in the vector will no longer be valid. So this isn't safe?
vector<int> v;
v.push_back(1);
v.reserve(v.size() + 1);
v.push_back(v[0]);
This makes it safe?
It looks like http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#526 addressed this problem (or something very similar to it) as a potential defect in the standard:
1) Parameters taken by const reference can be changed during execution of the function
Examples:
Given std::vector v:
v.insert(v.begin(), v[2]);
v[2] can be changed by moving elements of vector
The proposed resolution was that this was not a defect:
vector::insert(iter, value) is required to work because the standard doesn't give permission for it not to work.
Yes, it's safe, and standard library implementations jump through hoops to make it so.
I believe implementers trace this requirement back to 23.2/11 somehow, but I can't figure out how, and I can't find something more concrete either. The best I can find is this article:
http://www.drdobbs.com/cpp/copying-container-elements-from-the-c-li/240155771
Inspection of libc++'s and libstdc++'s implementations shows that they are also safe.
The standard guarantees even your first example to be safe. Quoting C++11
[sequence.reqmts]
3 In Tables 100 and 101 ... X denotes a sequence container class, a denotes a value of X containing elements of type T, ... t denotes an lvalue or a const rvalue of X::value_type
16 Table 101 ...
Expression: a.push_back(t)
Return type: void
Operational semantics: Appends a copy of t. Requires: T shall be CopyInsertable into X.
Container: basic_string, deque, list, vector
So even though it's not exactly trivial, the implementation must guarantee it will not invalidate the reference when doing the push_back.
It is not obvious that the first example is safe, because the simplest implementation of push_back would be to first reallocate the vector, if needed, and then copy the reference.
But at least it seems to be safe with Visual Studio 2010. Its implementation of push_back does special handling of the case when you push back an element in the vector.
The code is structured as follows:
void push_back(const _Ty& _Val)
{ // insert element at end
if (_Inside(_STD addressof(_Val)))
{ // push back an element
...
}
else
{ // push back a non-element
...
}
}
This isn't a guarantee from the standard, but as another data point, v.push_back(v[0]) is safe for LLVM's libc++.
libc++'s std::vector::push_back calls __push_back_slow_path when it needs to reallocate memory:
void __push_back_slow_path(_Up& __x) {
allocator_type& __a = this->__alloc();
__split_buffer<value_type, allocator_type&> __v(__recommend(size() + 1),
size(),
__a);
// Note that we construct a copy of __x before deallocating
// the existing storage or moving existing elements.
__alloc_traits::construct(__a,
_VSTD::__to_raw_pointer(__v.__end_),
_VSTD::forward<_Up>(__x));
__v.__end_++;
// Moving existing elements happens here:
__swap_out_circular_buffer(__v);
// When __v goes out of scope, __x will be invalid.
}
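The ordering shown in the libc++ excerpt can be illustrated with a much smaller, hypothetical `IntVec`, a toy vector of `int` rather than libc++'s actual code. When reallocation is needed, the new element is copied from `x` into the new buffer first, and the old buffer is released only afterwards, so `x` may safely alias an element of the same container:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical toy vector illustrating "construct the copy before freeing
// the old storage"; it is not a real library implementation.
class IntVec {
public:
    void push_back(const int& x) {
        if (size_ < cap_) {      // no reallocation: x is still valid
            buf_[size_++] = x;
            return;
        }
        std::size_t new_cap = cap_ ? 2 * cap_ : 1;
        std::unique_ptr<int[]> fresh(new int[new_cap]);
        fresh[size_] = x;        // 1. copy x while the old buffer is still alive
        for (std::size_t i = 0; i < size_; ++i)
            fresh[i] = buf_[i];  // 2. transfer the existing elements
        buf_ = std::move(fresh); // 3. only now release the old buffer
        cap_ = new_cap;
        ++size_;
    }
    int& operator[](std::size_t i) { return buf_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return cap_; }

private:
    std::unique_ptr<int[]> buf_;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
};
```

Swapping steps 1 and 3 would read `x` from freed memory whenever `x` refers into the old buffer, which is exactly the failure the question is worried about.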
The first version is definitely NOT safe:
Operations on iterators obtained by calling a standard library container or string member function may access the underlying container, but shall not modify it. [ Note: In particular, container operations that invalidate iterators conflict with operations on iterators associated with that container. — end note ]
from section 17.6.5.9
Note that this is the section on data races, which people normally think of in conjunction with threading... but the actual definition involves "happens before" relationships, and I don't see any ordering relationship between the multiple side-effects of push_back in play here, namely the reference invalidation seems not to be defined as ordered with respect to copy-constructing the new tail element.
It is completely safe.
In your second example you have
v.reserve(v.size() + 1);
which is not needed, because if the vector runs out of capacity, push_back reallocates by itself.
The vector is responsible for this, not you.
Both are safe since push_back will copy the value, not the reference. If you are storing pointers, that is still safe as far as the vector is concerned, but just know that you'll have two elements of your vector pointing to the same data.
Section 23.2.1 General Container Requirements
16
a.push_back(t) Appends a copy of t. Requires: T shall be CopyInsertable into X.
a.push_back(rv) Appends a copy of rv. Requires: T shall be MoveInsertable into X.
Implementations of push_back must therefore ensure that a copy of v[0] is inserted. As a counterexample, an implementation that reallocated before copying would not reliably append a copy of v[0], and would thus violate the specification.
From 23.3.6.5/1: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens, all the iterators and references before the insertion point remain valid.
Since we're inserting at the end, no references will be invalidated if the vector isn't resized. So if the vector's capacity() > size() then it's guaranteed to work, otherwise it's guaranteed to be undefined behavior.
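Here is a small check of the capacity argument, assuming any conforming C++11 library. After `reserve(size() + 1)`, the next `push_back` cannot reallocate, so `v.data()` stays the same and `v[0]` remains a valid reference throughout the call. The function name is hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true if push_back(v[0]) after reserve() kept
// the same buffer, i.e. no reallocation took place.
bool push_back_without_reallocation() {
    std::vector<int> v;
    v.push_back(1);
    v.reserve(v.size() + 1);  // guarantees capacity() >= 2
    const int* before = v.data();
    v.push_back(v[0]);        // new size 2 <= capacity: no reallocation
    return v.data() == before && v.size() == 2 && v[1] == 1;
}
```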
I know that generally the standard places few requirements on the values which have been moved from:
N3485 17.6.5.15 [lib.types.movedfrom]/1:
Objects of types defined in the C++ standard library may be moved from (12.8). Move operations may be explicitly specified or implicitly generated. Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state.
I can't find anything about vector that explicitly excludes it from this paragraph. However, I can't come up with a sane implementation that would result in the vector being not empty.
Is there some standardese that entails this that I'm missing or is this similar to treating basic_string as a contiguous buffer in C++03?
I'm coming to this party late, and offering an additional answer because I do not believe any other answer at this time is completely correct.
Question:
Is a moved-from vector always empty?
Answer:
Usually, but no, not always.
The gory details:
vector has no standard-defined moved-from state like some types do (e.g. unique_ptr is specified to be equal to nullptr after being moved from). However the requirements for vector are such that there are not too many options.
The answer depends on whether we're talking about vector's move constructor or move assignment operator. In the latter case, the answer also depends on the vector's allocator.
vector<T, A>::vector(vector&& v)
This operation must have constant complexity. That means that there are no options but to steal resources from v to construct *this, leaving v in an empty state. This is true no matter what the allocator A is, nor what the type T is.
So for the move constructor, yes, the moved-from vector will always be empty. This is not directly specified, but falls out of the complexity requirement, and the fact that there is no other way to implement it.
vector<T, A>&
vector<T, A>::operator=(vector&& v)
This is considerably more complicated. There are 3 major cases:
One:
allocator_traits<A>::propagate_on_container_move_assignment::value == true
(propagate_on_container_move_assignment evaluates to true_type)
In this case the move assignment operator will destruct all elements in *this, deallocate capacity using the allocator from *this, move assign the allocators, and then transfer ownership of the memory buffer from v to *this. Except for the destruction of elements in *this, this is an O(1) complexity operation. And typically (e.g. in most but not all std::algorithms), the lhs of a move assignment has empty() == true prior to the move assignment.
Note: In C++11 the propagate_on_container_move_assignment for std::allocator is false_type, but this has been changed to true_type for C++1y (y == 4 we hope).
In case One, the moved-from vector will always be empty.
Two:
allocator_traits<A>::propagate_on_container_move_assignment::value == false
&& get_allocator() == v.get_allocator()
(propagate_on_container_move_assignment evaluates to false_type, and the two allocators compare equal)
In this case, the move assignment operator behaves just like case One, with the following exceptions:
The allocators are not move assigned.
The decision between this case and case Three happens at run time, and case Three requires more of T, and thus so does case Two, even though case Two doesn't actually execute those extra requirements on T.
In case Two, the moved-from vector will always be empty.
Three:
allocator_traits<A>::propagate_on_container_move_assignment::value == false
&& get_allocator() != v.get_allocator()
(propagate_on_container_move_assignment evaluates to false_type, and the two allocators do not compare equal)
In this case the implementation can not move assign the allocators, nor can it transfer any resources from v to *this (resources being the memory buffer). In this case, the only way to implement the move assignment operator is to effectively:
typedef move_iterator<iterator> Ip;
assign(Ip(v.begin()), Ip(v.end()));
That is, move each individual T from v to *this. The assign can reuse both capacity and size in *this if available. For example if *this has the same size as v the implementation can move assign each T from v to *this. This requires T to be MoveAssignable. Note that MoveAssignable does not require T to have a move assignment operator. A copy assignment operator will also suffice. MoveAssignable just means T has to be assignable from an rvalue T.
If the size of *this is not sufficient, then new T will have to be constructed in *this. This requires T to be MoveInsertable. For any sane allocator I can think of, MoveInsertable boils down to the same thing as MoveConstructible, which means constructible from an rvalue T (does not imply the existence of a move constructor for T).
In case Three, the moved-from vector will in general not be empty. It could be full of moved-from elements. If the elements don't have a move constructor, this could be equivalent to a copy assignment. However, there is nothing that mandates this. The implementor is free to do some extra work and execute v.clear() if he so desires, leaving v empty. I am not aware of any implementation doing so, nor am I aware of any motivation for an implementation to do so. But I don't see anything forbidding it.
David Rodríguez reports that GCC 4.8.1 calls v.clear() in this case, leaving v empty. libc++ does not, leaving v not empty. Both implementations are conforming.
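The difference between the move constructor and case Three can be reproduced with a hypothetical allocator, `TaggedAlloc`. Its instances compare unequal when their ids differ, and it does not propagate on move assignment. This is a sketch for illustration only, not a production allocator: it simply forwards to `::operator new`. The checks deliberately do not assert whether the source is empty after the move assignment, because, as noted above, both outcomes conform:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator: instances with different ids compare unequal, and
// propagate_on_container_move_assignment is false (case Three above).
template <typename T>
struct TaggedAlloc {
    using value_type = T;
    using propagate_on_container_move_assignment = std::false_type;

    int id;
    explicit TaggedAlloc(int i) : id(i) {}
    template <typename U>
    TaggedAlloc(const TaggedAlloc<U>& other) : id(other.id) {}

    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return a.id == b.id; }
template <typename T, typename U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return !(a == b); }

using Vec = std::vector<int, TaggedAlloc<int>>;

// Builds {1, 2, 3} in a vector that uses the allocator with the given id.
Vec make_vec(int alloc_id) {
    TaggedAlloc<int> alloc(alloc_id);
    Vec v(alloc);
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    return v;
}

// Move construction: the buffer is taken over, so the source is left empty.
bool move_construct_empties_source() {
    Vec src = make_vec(1);
    Vec dst(std::move(src));
    return src.empty() && dst.size() == 3;
}

// Case Three: the allocators differ and do not propagate, so dst keeps its own
// allocator and moves the elements one by one into a buffer it allocates itself.
bool unequal_allocators_move_elements() {
    Vec src = make_vec(1);
    const int* src_buffer = src.data();
    TaggedAlloc<int> other(2);
    Vec dst(other);
    dst = std::move(src);
    return dst.size() == 3 && dst[2] == 3 && dst.data() != src_buffer &&
           dst.get_allocator().id == 2;
}
```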
While it might not be a sane implementation in the general case, a valid implementation of the move constructor/assignment is just copying the data from the source, leaving the source untouched. Additionally, for the case of assignment, move can be implemented as swap, and the moved-from container might contain the old value of the moved-to container.
Implementing move as copy can actually happen if you use polymorphic allocators, as we do, and the allocator is not deemed to be part of the value of the object (and thus, assignment never changes the actual allocator being used). In this context, a move operation can detect whether both the source and the destination use the same allocator. If they use the same allocator the move operation can just move the data from the source. If they use different allocators then the destination must copy the source container.
In a lot of situations, move-construction and move-assignment can be implemented by delegating to swap - especially if no allocators are involved. There are several reasons for doing that:
swap has to be implemented anyway
developer efficiency because less code has to be written
runtime efficiency because fewer operations are executed in total
Here is an example for move-assignment. In this case, the moved-from vector will not be empty if the moved-to vector was not empty.
auto operator=(vector&& rhs) -> vector&
{
if (/* allocator is neither move- nor swap-aware */) {
swap(rhs);
} else {
...
}
return *this;
}
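Below is a minimal sketch of the swap-based approach for a hypothetical value type `IntBag`, with no custom allocator involved. It shows that the moved-from object ends up holding the previous contents of the moved-to object, while move construction still leaves the source empty:

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical value type whose move operations delegate to swap.
class IntBag {
public:
    IntBag() = default;
    IntBag(std::initializer_list<int> il) : items_(il) {}

    // Move construction: *this starts empty, so swapping leaves other empty.
    IntBag(IntBag&& other) noexcept { swap(other); }

    // Move assignment: other receives whatever *this held before.
    IntBag& operator=(IntBag&& other) noexcept {
        swap(other);
        return *this;
    }

    void swap(IntBag& other) noexcept { items_.swap(other.items_); }

    const std::vector<int>& items() const { return items_; }

private:
    std::vector<int> items_;
};
```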
I left comments to this effect on other answers, but had to rush off before fully explaining. The result of a moved-from vector must always be empty, or in the case of move assignment, must be either empty or the previous object's state (i.e. a swap), because otherwise the iterator invalidation rules cannot be met, namely that a move does not invalidate them. Consider:
std::vector<int> move;
std::vector<int>::iterator it;
{
std::vector<int> x(some_size);
it = x.begin();
move = std::move(x);
}
std::cout << *it;
Here you can see that iterator invalidation does expose the implementation of the move. The requirement for this code to be legal, specifically that the iterator remains valid, prevents the implementation from performing a copy, using small-object storage, or anything similar. If a copy were made, the iterator would be invalidated when x is destroyed at the end of the block, and the same is true if the vector used some kind of SSO-based storage. Essentially, the only reasonable implementation is to swap the pointers, or simply move them.
Kindly view the Standard quotes on requirements for all containers:
X u(rv) and X u = rv: post: u shall be equal to the value that rv had before this construction
a = rv: a shall be equal to the value that rv had before this assignment
Iterator validity is part of the value of a container. Although the Standard does not unambiguously state this directly, we can see in, for example,
begin() returns an iterator referring to the first element in the container. end() returns an iterator which is the past-the-end value for the container. If the container is empty, then begin() == end();
Any implementation which actually moved from the elements of the source instead of swapping the memory would be defective, so I suggest that any Standard wording saying otherwise is a defect, not least because the Standard is not in fact very clear on this point. These quotes are from N3691.
If I swap two vectors, will their iterators remain valid, now just pointing to the "other" container, or will the iterator be invalidated?
That is, given:
using namespace std;
vector<int> x(42, 42);
vector<int> y;
vector<int>::iterator a = x.begin();
vector<int>::iterator b = x.end();
x.swap(y);
// a and b still valid? Pointing to x or y?
It seems the standard mentions nothing about this:
[n3092 - 23.3.6.2]
void swap(vector<T,Allocator>& x);
Effects:
Exchanges the contents and capacity() of *this with that of x.
Note that since I'm on VS 2005 I'm also interested in the effects of iterator debug checks etc. (_SECURE_SCL)
The behavior of swap has been clarified considerably in C++11, in large part to permit the Standard Library algorithms to use argument dependent lookup (ADL) to find swap functions for user-defined types. C++11 adds a swappable concept (C++11 §17.6.3.2[swappable.requirements]) to make this legal (and required).
The text in the C++11 language standard that addresses your question is the following text from the container requirements (§23.2.1[container.requirements.general]/8), which defines the behavior of the swap member function of a container:
Every iterator referring to an element in one container before the swap shall refer to the same element in the other container after the swap.
It is unspecified whether an iterator with value a.end() before the swap will have value b.end() after the swap.
In your example, a is guaranteed to be valid after the swap, but b is not because it is an end iterator. The reason end iterators are not guaranteed to be valid is explained in a note at §23.2.1/10:
[Note: the end() iterator does not refer to any element, so it may be invalidated. --end note]
This is the same behavior that is defined in C++03, just substantially clarified. The original language from C++03 is at C++03 §23.1/10:
no swap() function invalidates any references, pointers, or iterators referring to the elements of the containers being swapped.
It's not immediately obvious in the original text, but the phrase "to the elements of the containers" is extremely important, because end() iterators do not point to elements.
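Here is a short check of the C++11 guarantee quoted above, using the question's `x` and `y`. The iterator `a` refers to an element, so after the swap it designates the same element, which now belongs to `y`. The end iterator `b` is deliberately left out, since it may be invalidated. The function name is hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true if an iterator into x designates y's
// first element after x.swap(y).
bool iterator_follows_element_after_swap() {
    std::vector<int> x(42, 42);
    std::vector<int> y;
    std::vector<int>::iterator a = x.begin();  // refers to an element: stays valid
    const int* addr = &*a;
    x.swap(y);
    // x.end() is not used here: end iterators may be invalidated by swap.
    return y.size() == 42 && x.empty() && &*a == addr && a == y.begin();
}
```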
Swapping two vectors does not invalidate the iterators, pointers, and references to its elements (C++03, 23.1.11).
Typically the iterator would contain knowledge of its container, and the swap operation maintains this for a given iterator.
In VC++ 10 the vector container is managed using this structure in <xutility>, for example:
struct _Container_proxy
{ // store head of iterator chain and back pointer
_Container_proxy()
: _Mycont(0), _Myfirstiter(0)
{ // construct from pointers
}
const _Container_base12 *_Mycont;
_Iterator_base12 *_Myfirstiter;
};
All iterators that refer to the elements of the containers remain valid.
As for Visual Studio 2005, I have just tested it.
I think it should always work, as the vector::swap function even contains an explicit step to swap everything:
// vector-header
void swap(_Myt& _Right)
{ // exchange contents with _Right
if (this->_Alval == _Right._Alval)
{ // same allocator, swap control information
#if _HAS_ITERATOR_DEBUGGING
this->_Swap_all(_Right);
#endif /* _HAS_ITERATOR_DEBUGGING */
...
The iterators point to their original elements in the now-swapped vector object. (I.e., with regard to the OP's example, they first pointed to elements in x; after the swap they point to elements in y.)
Note that in the n3092 draft the requirement is laid out in §23.2.1/9:
Every iterator referring to an element in one container before the swap shall refer to the same element in the other container after the swap.