Persistent references in STL Containers - c++

When using C++ STL containers, under what conditions do references to elements remain valid?
For example, are any references invalidated by the next function call on the container?
{
std::vector<int> vector;
vector.push_back (1);
vector.push_back (2);
vector.push_back (3);
vector[0] = 10; //modifies 0'th element
int& ref = vector[0];
ref = 10; //modifies 0'th element
vector.push_back (4);
ref = 20; //modifies 0'th element???
vector.clear ();
ref = 30; //clearly absurd
}
I understand that in most implementations of the STL this would work, but I'm interested in what the standard actually requires.
--edit:
I'm interested because I wanted to try out the STXXL (http://stxxl.sourceforge.net/) library for C++, but I realised that the references returned by its containers are not persistent over multiple reads, and hence the library is not compatible with my existing STL code without changes (however superficial). An example:
{
std::vector<int> vector;
vector.push_back (1);
vector.push_back (2);
int& refA = vector[0];
int& refB = vector[1]; //refA is not guaranteed to be valid anymore
}
I just wanted to know if this meant that STXXL containers were not 100% compatible, or indeed if I had been using STL containers in an unsafe/implementation-dependent way the whole time.

About inserting into vectors, the standard says in 23.2.4.3/1:
[insert()] causes reallocation if the
new size is greater than the old
capacity. If no reallocation happens,
all the iterators and references
before the insertion point remain
valid.
(Although this in fact talks about insert(), Table 68 indicates that a.push_back(x) must be equivalent to a.insert(a.end(), x) for any vector a and value x.) This means that if you reserve() enough memory beforehand, then (and only then) iterators and references before the insertion point are guaranteed not to be invalidated when you insert() or push_back() more items. For push_back(), that covers every existing element.
Regarding removing items, 23.2.4.3/3 says:
[erase()] invalidates all the
iterators and references after the
point of the erase.
According to Table 68 and Table 67 respectively, pop_back() and clear() are equivalent to appropriate calls to erase().
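A minimal sketch of the reserve() guarantee described in this answer, assuming only the standard library. The helper name reference_survives_reserved_push_back is made up for illustration. It checks that the storage does not move when push_back() stays within the reserved capacity:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true if a reference taken before push_back()
// still refers to element 0 afterwards, given enough reserved capacity.
bool reference_survives_reserved_push_back() {
    std::vector<int> v;
    v.reserve(4);                 // capacity() is now at least 4
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);

    int& ref = v[0];
    const int* buffer = v.data();

    v.push_back(4);               // new size (4) <= capacity: no reallocation
    assert(v.data() == buffer);   // the storage did not move

    ref = 20;                     // well-defined: ref still names v[0]
    return v[0] == 20;
}
```

Without the reserve() call, the write through ref after push_back(4) would only be valid if the implementation happened not to reallocate.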

Some basic rules for vector:
Reallocation invalidates all references, pointers, and iterators for elements of the vector.
Insertions may invalidate references, pointers, and iterators.
Inserting or removing elements invalidates references, pointers, and iterators that refer to the following elements.
If an insertion causes reallocation, it invalidates all references, iterators, and pointers.

I expect that references would be invalidated only by any explicit or implicit resize() (see also the max_size, capacity, and reserve methods).

Vector will invalidate its iterators and references when it reallocates, which depends upon its current capacity. Although the above code might work in some cases, you shouldn't rely on this, as the reference might be invalidated after the push_back(4) call.
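Whether a given push_back() reallocates can be checked from size() and capacity(), since reallocation happens when the new size would exceed the old capacity. A small sketch; the helper name next_push_back_reallocates is hypothetical and not part of the standard library:

```cpp
#include <vector>

// Hypothetical helper: true if the next push_back() on v must reallocate,
// i.e. if size() + 1 would be greater than capacity().
template <class T>
bool next_push_back_reallocates(const std::vector<T>& v) {
    return v.size() == v.capacity();
}
```

If this returns false, references to existing elements stay valid across the next push_back(); if it returns true, they must all be treated as invalidated.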

Related

Is it safe to reference a value in unordered_map

For example:
#include <unordered_map>
class A{};
std::unordered_map<unsigned int, A> map {{0,{}},{1, {}},{2, {}}};
int main() {
A& a1 = map[1];
// some insert and remove operations ( key 1 never removed)
// ....
}
Is it safe to still use a1 to refer to the value whose key is 1 after a lot of insert operations?
In other words:
since std::vector moves its elements when the capacity changes, a reference to one of its elements is not guaranteed to stay valid. Does the same apply to unordered_map?
Yes, it is safe. From the standard:
22.2.7 Unordered associative containers [unord.req]
The insert and emplace members shall not affect the validity of
references to container elements, but may invalidate all iterators to
the container. The erase members shall invalidate only iterators and
references to the erased elements, and preserve the relative order of
the elements that are not erased.
References are safe, but iterators are not!
Perhaps a "less authoritative" source, but easier to read:
References to elements in the unordered_map container remain valid in
all cases, even after a rehash
https://www.cplusplus.com/reference/unordered_map/unordered_map/operator[]/
after a lot of insert operations?
If you meant insert or emplace, then it's fine.
(emphasis mine)
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count(). If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid. (since C++17)
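As a rough illustration of the quoted rule, the sketch below takes a reference to one mapped value, inserts enough further elements that a rehash is very likely, and then checks that the element still lives at the same address. The helper name and the element count of 10000 are arbitrary choices made for this example:

```cpp
#include <unordered_map>

// Hypothetical helper: returns true if a reference to the value under key 1
// still names the same object after many insertions.
bool reference_survives_rehash() {
    std::unordered_map<unsigned, int> m{{0, 0}, {1, 100}, {2, 200}};
    int& ref = m[1];
    const int* address = &ref;

    // Insert enough elements that at least one rehash is very likely.
    for (unsigned key = 3; key < 10000; ++key) {
        m.emplace(key, 0);
    }

    ref = 42;   // still refers to the element with key 1
    return &m.at(1) == address && m.at(1) == 42;
}
```

An iterator obtained before the loop would not enjoy the same guarantee, since a rehash invalidates all iterators.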

When no iterators are invalidated, does this include the end iterator?

A std::map's iterators stay valid when inserting elements, e.g.:
std::map<std::string,int> my_map;
my_map["foo"] = 1;
my_map["bar"] = 2;
auto it_foo = my_map.find("foo");
auto it_bar = my_map.find("bar");
my_map["foobar"] = 3;
After inserting another element (in the last line), the two iterators are still valid. How about the end iterator? For example:
auto it_end = my_map.find("something that isnt in the map");
my_map["barfoo"] = 4; // does not invalidate iterators
assert(it_end == my_map.end()); // ??
In other words: if a method does not invalidate iterators (other than those explicitly mentioned, as for example in the case of map::erase), does this mean that the end iterator is also guaranteed to be the same before and after calling the method?
PS: I am aware that I could just try and see, but this won't tell me whether I can rely on this behaviour.
PPS: For example pushing into a std::vector invalidates all iterators, or only the end (when no reallocation took place), but in this case the docs explicitly mention the end. Following this reasoning, "no iterators are invalidated" should include end, but I am not 100% convinced ;)
N4140 23.2.4 Associative containers [associative.reqmts]
9 The insert and emplace members shall not affect the validity of iterators and references to the container, and the erase members shall invalidate only iterators and references to the erased elements.
Definitely the term iterators refers to all iterators including end.
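The question's scenario can be written as a small self-checking sketch. The function name end_iterator_survives_insert is invented for this example; the map contents follow the question:

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns true if an end iterator obtained before an
// insertion still compares equal to end() afterwards.
bool end_iterator_survives_insert() {
    std::map<std::string, int> my_map;
    my_map["foo"] = 1;
    my_map["bar"] = 2;

    auto it_foo = my_map.find("foo");
    auto it_end = my_map.find("not in the map");   // equals my_map.end()

    my_map["barfoo"] = 4;                           // insertion: no iterators invalidated

    return it_end == my_map.end() && it_foo->second == 1;
}
```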

push_backs to a std::vector<std::reference_wrapper<type>>

Consider this attempt at push_backs to a std::vector of std::reference_wrappers:
#include <iostream>
#include <vector>
#include <functional>
int main()
{
    std::vector<int> v_i;
    std::vector<std::reference_wrapper<int>> s_i;
    for(int i=0;i<10;++i)
    {
        v_i.push_back(i);
        s_i.push_back(std::ref(v_i[i]));
    }
    std::cout<<v_i[0]<<std::endl;
    std::cout<<s_i[0].get()<<std::endl;
    return -1;
}
I expect the [] operator to return a reference to the i-th element of v, and from the possible implementation given here, we can reasonably assume that the std::reference_wrapper object that is appended to s_i holds a copy of the pointer that points to the correct address of v_i[i]. However, the output of the code above is
0
1980603512 //or some other random garbage value
So obviously, the std::reference_wrappers that are constructed inside the loop are pointing to temporary objects. Why does this happen, and what is the correct way of appending to s_i?
By the way, I am using g++ 5.4.0 (with the -std=c++0x flag).
we can reasonably assume that the std::reference_wrapper object that
is appended to s_i holds a copy of the pointer that points to the
correct address of v_i[i]
No, you can't reasonably assume that. This is because any
v_i.push_back(i);
can result in reallocation of v_i; and in fact you're almost guaranteed that this is going to happen here, at some point. And any reallocation automatically, and immediately, invalidates all existing iterators and pointers to the existing contents of a vector.
This is no different if you saved any plain pointers to some elements in the std::vector, then attempted to push_back(), causing reallocation and invalidation of those pointers. Any subsequent dereference of those pointers results in undefined behavior.
What the shown code is doing here is substantively equivalent, just using an extra onion layer of std::ref to wrap the whole thing, but resulting in logically equivalent undefined behavior.
If you wish to avoid undefined behavior here, the only practical way to do so is to reserve() the vector sufficiently, as to guarantee that no reallocation will happen by the subsequent push_back()s.
The thing is that a push_back operation potentially invalidates iterators, references, and pointers to the std::vector elements.
From push_back:
If the new size() is greater than capacity() then all iterators and
references (including the past-the-end iterator) are invalidated.
Otherwise only the past-the-end iterator is invalidated.
As your std::vector is growing, there is a moment in time when the new size is greater than the current capacity. As described above, this leads to the invalidation of references and pointers to the std::vector elements.
In this particular case, std::reference_wrappers are invalidated.
Further usage of invalidated std::reference_wrappers causes UB.
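A sketch of the reserve() fix suggested above, applied to the code from the question. The function name sum_through_wrappers is made up; it reads every element back through its wrapper so that a dangling wrapper would be noticed:

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: fills v_i and s_i as in the question, but reserves
// first so that no push_back() on v_i reallocates.
int sum_through_wrappers() {
    std::vector<int> v_i;
    std::vector<std::reference_wrapper<int>> s_i;

    v_i.reserve(10);   // final size is known, so the buffer never moves below

    for (int i = 0; i < 10; ++i) {
        v_i.push_back(i);
        s_i.push_back(std::ref(v_i[i]));
    }

    int sum = 0;
    for (const auto& wrapper : s_i) {
        sum += wrapper.get();   // each wrapper still refers to a live element
    }
    return sum;                 // 0 + 1 + ... + 9
}
```

Only v_i needs the reservation: reallocating s_i moves the wrappers themselves, not the ints they refer to.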

std::list - are the iterators invalidated on move?

std::list iterators have some very nice properties: they remain valid when any other element is removed, when a new element is added, and even when two lists are swapped (Iterator invalidation rules)!
Considering the behaviour of the following code, and that the iterators are implemented as some form of pointer to the actual node (which doesn't change when the list is moved), my guess is that the iterators are still valid in the new container after a std::list is moved. But I could also be in UB territory here, accessing invalid memory that just happens to hold the "expected" value.
std::list<int> l1{3, 2, 1};
std::list<int> l2;
auto it = std::prev(l1.end());
std::cout<<l1.size()<<" "<<l2.size()<<" "<<*it<<std::endl;
l2 = std::move(l1);
std::cout<<l2.size()<<" "<<*it<<std::endl;
3 0 1
3 1
Does the standard guarantee that the iterators remain valid when a std::list is moved? What about other containers?
For containers in general, only swap guarantees that iterators remain valid (and point into the swapped containers).
For std::list, the special member function splice() guarantees that iterators retain their expected meaning.
In general, constructing a container from an rvalue doesn't make guarantees about iterators; the only general requirement is that the new container has the "same value" as the container it was constructed from had originally.
(You can imagine debug iterator implementations that store a reference to the container, and that reference would become dangling after a move.)
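The two operations named in this answer, swap() and splice(), can be exercised as follows. This is only a sketch with an invented function name; it uses the same list contents as the question:

```cpp
#include <iterator>
#include <list>

// Hypothetical helper: returns true if an iterator keeps referring to the
// same element after swap() and after splice().
bool iterator_follows_elements() {
    std::list<int> l1{3, 2, 1};
    std::list<int> l2;
    auto it = std::prev(l1.end());        // refers to the element 1

    l1.swap(l2);                          // the elements now belong to l2
    bool after_swap = (*it == 1) && (std::next(it) == l2.end());

    std::list<int> l3;
    l3.splice(l3.end(), l2);              // all nodes are transferred to l3
    bool after_splice = (*it == 1) && (std::next(it) == l3.end());

    return after_swap && after_splice && l1.empty() && l2.empty();
}
```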

Does pop_back() really invalidate *all* iterators on an std::vector?

std::vector<int> ints;
// ... fill ints with random values
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
    if(*it < 10)
    {
        *it = ints.back();
        ints.pop_back();
        continue;
    }
    it++;
}
This code is not working because when pop_back() is called, it is invalidated. But I don't find any doc talking about invalidation of iterators in std::vector::pop_back().
Do you have some links about that?
The call to pop_back() removes the last element in the vector and so the iterator to that element is invalidated. The pop_back() call does not invalidate iterators to items before the last element, only reallocation will do that. From Josuttis' "C++ Standard Library Reference":
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
Here is your answer, directly from The Holy Standard:
23.2.4.2 A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.1) and of a sequence, including most of the optional sequence requirements (23.1.1).
23.1.1.12 Table 68
expression: a.pop_back()
return type: void
operational semantics: a.erase(--a.end())
container: vector, list, deque
Notice that a.pop_back is equivalent to a.erase(--a.end()). Looking at vector's specifics on erase:
23.2.4.3.3 - iterator erase(iterator position) - effects - Invalidates all the iterators and references after the point of the erase
Therefore, once you call pop_back, any iterators to the previously final element (which now no longer exists) are invalidated.
Looking at your code, the problem is that when you remove the final element and the vector becomes empty, you still increment it and walk off the end of the vector.
(I use the numbering scheme of the C++0x working draft.)
Table 94 at page 732 says that pop_back (if it exists in a sequence container) has the following effect:
{ iterator tmp = a.end();
--tmp;
a.erase(tmp); }
23.1.1, point 12 states that:
Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container
member function or passing a container as an argument to a library function shall not invalidate iterators to, or change
the values of, objects within that container.
Neither calling end() nor applying prefix -- has such an effect; erase(), however, does:
23.2.6.4 (concerning vector.erase() point 4):
Effects: Invalidates iterators and references at or after the point of the erase.
So in conclusion: pop_back() will only invalidate an iterator to the last element, per the standard.
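That conclusion can be illustrated with a short sketch (the function name is hypothetical). An iterator and a reference to elements before the last one are used after pop_back():

```cpp
#include <vector>

// Hypothetical helper: returns true if an iterator and a reference to
// elements before the last one are still usable after pop_back().
bool earlier_elements_survive_pop_back() {
    std::vector<int> v{1, 2, 3};
    std::vector<int>::iterator first = v.begin();
    int& second = v[1];

    v.pop_back();   // invalidates only what referred to the removed 3, and end()

    return first == v.begin() && *first == 1 && second == 2;
}
```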
Here is a quote from SGI's STL documentation (http://www.sgi.com/tech/stl/Vector.html):
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
I think it follows that pop_back only invalidates the iterator pointing at the last element and the end() iterator. We really need to see the data for which the code fails, as well as the manner in which it fails, to decide what's going on. As far as I can tell, the code should work: the usual problem in such code is that removal of an element and ++ on the iterator happen in the same iteration, the way @mikhaild points out. However, in this code that's not the case: it++ does not happen when pop_back is called.
Something bad may still happen when it is pointing to the last element, and the last element is less than 10. We're now comparing an invalidated it and end(). It may still work, but no guarantees can be made.
Iterators are only invalidated on reallocation of storage. Google is your friend: see footnote 5.
Your code is not working for other reasons.
pop_back() invalidates only iterators that point to the last element. From C++ Standard Library Reference:
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
So to answer your question, no it does not invalidate all iterators.
However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.
If iterators are implemented as pure pointers (as they would in probably all non-debug STL vector cases), your code should just work. If iterators are more than pointers, then your code does not handle this case of removing the last element correctly.
The error is that when "it" points to the last element of the vector and this element is less than 10, the last element is removed. Now "it" points to ints.end(); the next "it++" moves the pointer to ints.end()+1, so "it" runs away past ints.end() and you get an infinite loop scanning all your memory :).
The "official specification" is the C++ Standard. If you don't have access to a copy of C++03, you can get the latest draft of C++0x from the Committee's website: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2723.pdf
The "Operational Semantics" section of container requirements specifies that pop_back() is equivalent to { iterator i = end(); --i; erase(i); }. The [vector.modifiers] section for erase says "Effects: Invalidates iterators and references at or after the point of the erase."
If you want the intuition argument, pop_back is no-fail (since destructors of value_types in standard containers are not allowed to throw exceptions), so it cannot do any copying or allocation (since those can throw), which means that you can guess that the iterator to the erased element and the end iterator are invalidated, but the remainder are not.
pop_back() will only invalidate it if it was pointing to the last item in the vector. Your code will therefore fail whenever the last int in the vector is less than 10, as follows:
*it = ints.back(); // Set *it to the value it already has
ints.pop_back(); // Invalidate the iterator
continue; // Loop round and access the invalid iterator
You might want to consider using the return value of erase instead of swapping the back element into the deleted position and popping back. For sequences, erase returns an iterator pointing to the element one beyond the element being deleted. Note that this method may cause more copying than your original algorithm.
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
    if(*it < 10)
        it = ints.erase( it );
    else
        ++it;
}
std::remove_if could also be an alternative solution.
struct LessThanTen { bool operator()( int n ) const { return n < 10; } };
ints.erase( std::remove_if( ints.begin(), ints.end(), LessThanTen() ), ints.end() );
std::remove_if is (like my first algorithm) stable, so it may not be the most efficient way of doing this, but it is succinct.
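If the order of the remaining elements does not matter, the swap-and-pop idea from the question can be kept by looping over indices instead of iterators, so that nothing invalidated is ever compared against end(). This is a sketch under that assumption, with an invented function name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: removes every element below 10 by overwriting it with
// the last element and shrinking the vector. Does not preserve order.
void remove_small_unstable(std::vector<int>& ints) {
    std::size_t i = 0;
    while (i < ints.size()) {
        if (ints[i] < 10) {
            ints[i] = ints.back();   // self-assignment when i is the last index
            ints.pop_back();         // do not advance: re-check the moved value
        } else {
            ++i;
        }
    }
}
```

When the last element itself is removed, size() drops to i and the loop condition ends the loop without touching the removed element.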
Check out the information here (cplusplus.com):
Delete last element
Removes the last element in the vector, effectively reducing the vector size by one and invalidating all iterators and references to it.