I've read a lot of posts about reference, pointer, and iterator invalidation. For instance, I've read that insertion invalidates all references to the elements of a deque. Why, then, do I not get errors in the following code?
#include <deque>

int main()
{
    std::deque<int> v1 = { 1, 3, 4, 5, 7, 8, 9, 1, 3, 4 };
    int& a = v1[6];
    std::deque<int>::iterator it = v1.insert(v1.begin() + 2, 3);
    int c = a;
    return a;
}
When I run this, I get 9 as the result, so "a" still refers to the right element.
In general, I haven't managed to provoke invalidation errors at all. I've tried different containers, and pointers and iterators as well as references.
Sometimes an operation that could invalidate something doesn't.
I'm not familiar enough with std::deque implementations to comment on this specific case. But consider push_back on a std::vector: you might get all your iterators, references, and pointers to elements of the vector invalidated, because std::vector needed to allocate more memory to accommodate the new element and ended up moving all the data to a new location where that memory was available.
Or you might get nothing invalidated, because the vector had enough spare capacity to construct the new element in place, or was lucky enough to obtain more memory at the end of its current allocation, and did not have to move anything despite having changed size.
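To illustrate (a minimal sketch of my own, not part of the original answer): comparing v.data() before and after a push_back shows whether that particular call happened to reallocate, without dereferencing anything that might be dangling.

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3};
    const int* before = v.data(); // address of the element storage
    v.push_back(4);               // may or may not reallocate
    const int* after = v.data();
    // note: even inspecting a possibly-dangling pointer value is murky
    // territory; this is a diagnostic sketch, not production code
    std::cout << (before == after
                      ? "no reallocation: iterators survived\n"
                      : "reallocated: all iterators/references/pointers invalidated\n");
}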
Usually, the documentation carefully documents what operations can invalidate what. For example, search for "invalidate" in https://en.cppreference.com/w/cpp/container/deque .
Additionally, particular implementations of the standard data structures might be even safer than the standard guarantees - but relying on that will make your code highly non-portable, and potentially introduce hidden bugs when the unspoken safety guarantees change: everything will seem to work just fine until it doesn't.
The only safe thing to do is to read the specification carefully and never rely on something not getting invalidated when it does not guarantee that.
Also, as Enrico pointed out, you might get cases where your references/pointers/iterators get invalidated, but reading from them yields a value that looks fine, so such a simple method for testing if something has been invalidated will not do.
The following code, on my system, shows the effect of the undefined behavior.
#include <deque>
#include <iostream>

int main()
{
    std::deque<int> v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    for (auto e : v1) std::cout << e << ' ';
    std::cout << std::endl;
    int& a = v1[1];
    int& b = v1[2];
    int& c = v1[3];
    std::cout << a << ' ' << b << ' ' << c << std::endl;
    std::deque<int>::iterator it = v1.insert(v1.begin() + 2, -1);
    for (auto e : v1) std::cout << e << ' ';
    std::cout << std::endl;
    v1[7] = -3;
    std::cout << a << ' ' << b << ' ' << c << std::endl;
    return a;
}
Its output for me is:
1 2 3 4 5 6 7 8 9 10
2 3 4
1 2 -1 3 4 5 6 7 8 9 10
-1 3 4
If the references a, b, and c were still valid, the last line would have been
2 3 4
Please, do not deduce from this that a has been invalidated while b and c are still valid. They're all invalid.
Try it out; maybe you are "lucky" and it shows the same to you. If it doesn't, play around with the number of elements in the container and a few insertions. At some point you'll probably see something strange, as in my case.
Addendum
The ways std::deque can be implemented all make the invalidation mechanism a bit more complex than what happens for the "simpler" std::vector. You also have fewer ways to check whether something is actually going to suffer from the effects of undefined behavior. With std::vector, for instance, you can tell whether undefined behavior will sting you upon a push_back: the member function capacity tells you whether the container already has enough space to accommodate the larger size required by inserting further elements by means of push_back. For instance, if size gives 8 and capacity gives 10, you can push_back two more elements "safely"; if you push one more, the array will have to be reallocated.
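A minimal sketch of that check (my own illustration of the capacity mechanism described above):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(10);                               // capacity is now at least 10
    for (int i = 1; i <= 8; ++i) v.push_back(i); // size is 8

    std::cout << "size: " << v.size()
              << ", capacity: " << v.capacity() << '\n';

    if (v.capacity() - v.size() >= 2)
    {
        // these two push_backs cannot reallocate, so iterators, references
        // and pointers to existing elements stay valid
        v.push_back(9);
        v.push_back(10);
    }
}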
Related
My problem is that our professor doesn't allow us to use vectors. So, in a case where I don't know how many times the user is going to enter something, how do I implement that using normal arrays in C++?
I am trying to replace this with normal arrays:

vector<int> x;
x.push_back(users_input);

users_input could be an integer the user enters. Then I should also be able to find its size:

int size = end(x) - begin(x);

My requirement is that I want to add values to this array only when the program falls into a particular switch case, which means I am not sure how many times the user decides to take that switch choice.

I can make an array and give it a generously large initial size:

int arr[1000];

But then how do I find the number of elements in it? If I use sizeof, it will give me 1000, not the number of elements that have actually been stored.
Any help? Thank you!
My question is that our professor doesn't allow us to use vectors.
You might send your professor the following link: Kate Gregory: Stop Teaching C. (Maybe do that after you've passed the course.)
That said, if you are forced to use a C array, you have to consider two things:
Allocate the maximum storage which is sufficient for the expected use case.
You have to track the number of currently used elements in an extra variable e.g. size_t n.
Example:
#include <iostream>
#include <algorithm>
#include <cstddef>

void print(int arr[], size_t n)
{
    for (size_t i = 0; i < n; ++i) std::cout << ' ' << arr[i];
    std::cout << '\n';
}

int main()
{
    // 10 is my optimistic assumption of a sufficiently large size
    int arr[10] = { 1, 3, 5, 4, 2 };
    // remember that 5 elements are in use initially
    size_t n = 5;

    // show array
    print(arr, n);

    // use addresses of elements where iterators are required, e.g. in std::sort
    std::sort(&arr[0], &arr[n]);

    // show sorted array
    print(arr, n);

    // how to apply something like std::vector::push_back()
    arr[n++] = -1;
    arr[n++] = 10;
    arr[n++] = 7;

    // show array again
    print(arr, n);

    // use addresses of elements where iterators are required (alternative form)
    std::sort(arr, arr + n);

    // show sorted array again
    print(arr, n);
}
Output:
1 3 5 4 2
1 2 3 4 5
1 2 3 4 5 -1 10 7
-1 1 2 3 4 5 7 10
Something which must be kept in mind: n must not become larger than the size of the array; otherwise, read/write access would result in undefined behavior.
It couldn't hurt to check n before applying arr[n++] = …
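A minimal sketch of such a check (the helper checked_push is my invention, not part of the original answer):

#include <cstddef>

// hypothetical helper: refuses to write past the end of the array
bool checked_push(int arr[], std::size_t& n, std::size_t capacity, int value)
{
    if (n >= capacity) return false; // array full: writing arr[n] would be UB
    arr[n++] = value;
    return true;
}

Called as checked_push(arr, n, 10, -1) in the example above, it reports failure instead of invoking undefined behavior.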
As 463035818_is_not_a_number commented:
Another alternative is, of course, to manage dynamic memory with new[] (and delete[]), which I consider a pain. Actually, I was under the impression that the C++ standards committee put a lot of effort into freeing application developers from that pain.
Nevertheless, a lot of professors seem to insist that students learn this. (I don't want to judge whether that's good or bad; I myself learnt it that way, due to the lack of alternatives in the past.)
However, this site is full of questions with failed attempts…
I preferred to present a simple alternative, in keeping with never doing things more complicated than required (also known as the KISS principle).
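For contrast, here is a rough sketch (my own, assuming a doubling growth strategy) of what the new[]/delete[] route entails, which hints at why it is considered a pain:

#include <algorithm>
#include <cstddef>

// grows a heap-allocated buffer; every call site must remember to use
// the returned pointer and must never touch the old one again
int* grow(int* old, std::size_t n, std::size_t& capacity)
{
    capacity = capacity ? capacity * 2 : 1;
    int* fresh = new int[capacity];
    std::copy(old, old + n, fresh); // copy the n elements in use
    delete[] old;                   // forgetting this leaks; doing it twice is UB
    return fresh;
}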
Related
My references are to std::copy and std::copy_backward.
template< class InputIt, class OutputIt >
OutputIt copy( InputIt first, InputIt last, OutputIt d_first );

Copies all elements in the range [first, last) starting from first and proceeding to last - 1. The behavior is undefined if d_first is within the range [first, last). In this case, std::copy_backward may be used instead.
template< class BidirIt1, class BidirIt2 >
BidirIt2 copy_backward( BidirIt1 first, BidirIt1 last, BidirIt2 d_last );

Copies the elements from the range, defined by [first, last), to another range ending at d_last. The elements are copied in reverse order (the last element is copied first), but their relative order is preserved.

The behavior is undefined if d_last is within (first, last]. std::copy must be used instead of std::copy_backward in that case.

When copying overlapping ranges, std::copy is appropriate when copying to the left (beginning of the destination range is outside the source range) while std::copy_backward is appropriate when copying to the right (end of the destination range is outside the source range).
From the above description, I gather the following inference:
Both copy and copy_backward end up copying the same source range [first, last) to the destination range, albeit in the case of the former the copying occurs from first to last - 1, whereas in the case of the latter it occurs from last - 1 to first. In both cases, the relative order of elements in the source range is preserved in the resulting destination range.
However, what is the technical reason behind the following two stipulations:
1) In the case of copy, undefined behavior results (implying unsuccessful copying of the source range to the destination range and possibly system fault) if d_first is within the range [first, last).
2) In the case of copy_backward, undefined behavior results (implying unsuccessful copying of the source range to the destination range and possibly system fault) if d_last is within the range (first, last].
I am assuming that the suggestion to replace copy with copy_backward to avert the above undefined behavior scenario, would become evident to me once I understand the implication of the above two statements.
Likewise, I am also assuming that the remarks about the appropriateness of copy when copying to the left (a notion that is not clear to me), and of copy_backward when copying to the right (not clear to me either), will begin to make sense once I comprehend the above distinction between copy and copy_backward.
Look forward to your helpful thoughts as always.
Addendum
As a follow-up, I wrote the following test code to verify the behavior of both copy and copy_backward for an identical operation.
#include <array>
#include <algorithm>
#include <cstddef>
#include <iostream>

using std::array;
using std::copy;
using std::copy_backward;
using std::size_t;
using std::cout;
using std::endl;

int main()
{
    const size_t sz = 4;
    array<int, sz> a1 = {0, 1, 2, 3};
    array<int, sz> a2 = {0, 1, 2, 3};

    cout << "Array1 before copy" << endl;
    cout << "==================" << endl;
    for (auto&& i : a1) // the type of i is int&
    {
        cout << i << endl;
    }

    copy(a1.begin(), a1.begin() + 3, a1.begin() + 1);

    cout << "Array1 after copy" << endl;
    cout << "=================" << endl;
    for (auto&& i : a1) // the type of i is int&
    {
        cout << i << endl;
    }

    cout << "Array2 before copy backward" << endl;
    cout << "===========================" << endl;
    for (auto&& i : a2) // the type of i is int&
    {
        cout << i << endl;
    }

    copy_backward(a2.begin(), a2.begin() + 3, a2.begin() + 1);

    cout << "Array2 after copy backward" << endl;
    cout << "==========================" << endl;
    for (auto&& i : a2) // the type of i is int&
    {
        cout << i << endl;
    }

    return 0;
}
The following is the program output:
Array1 before copy
==================
0
1
2
3
Array1 after copy
=================
0
0
1
2
Array2 before copy backward
===========================
0
1
2
3
Array2 after copy backward
==========================
2
1
2
3
Evidently, copy produces the expected result, whereas copy_backward doesn't, even though d_first is within the range [first, last). Additionally, d_last is within the range (first, last] as well, which should result in undefined behavior in the case of copy_backward as per the documentation.
So in effect, the program output is in accordance with the documentation in the case of copy_backward, whereas it is not in the case of copy.
It is worth noting again that in both cases, d_first and d_last do satisfy the condition which should result in undefined behavior for both copy and copy_backward respectively, as per documentation. However, the undefined behavior is observed only in the case of copy_backward.
There is nothing deep going on here. Just do an algorithm run-through with sample data using a naive approach: copy each element in order.
Suppose you have the four-element array int a[4] = {0, 1, 2, 3} and you want to copy the first three elements to the last three. Ideally, you would end up with {0, 0, 1, 2}. How would this (not) work with std::copy(a, a+3, a+1)?
Step 1: Copy the first element a[1] = a[0]; The array is now {0, 0, 2, 3}.
Step 2: Copy the second element a[2] = a[1]; The array is now {0, 0, 0, 3}.
Step 3: Copy the third element a[3] = a[2]; The array is now {0, 0, 0, 0}.
The result is wrong because you overwrote some of your source data (a[1] and a[2]) before reading those values. Copying in reverse would work because in reverse order, you would read values before overwriting them.
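For completeness (a short sketch of my own, not part of the original answer): the reverse-order strategy described above is exactly what std::copy_backward provides, and the overlap that breaks std::copy in this example is fine for it, because d_last (a + 4) lies outside (first, last]:

#include <algorithm>
#include <iostream>

int main()
{
    int a[4] = {0, 1, 2, 3};
    // shift right by one: the destination overlaps the source, so std::copy
    // would be UB here; copy_backward copies a[2], a[1], a[0] in that order,
    // reading each element before it is overwritten
    std::copy_backward(a, a + 3, a + 4);
    for (int x : a) std::cout << x << ' '; // prints: 0 0 1 2
    std::cout << '\n';
}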
Since the result is wrong with one reasonable approach, the standard declared the behavior "undefined". Compilers wishing to take the naive approach may, and they do not have to account for this case. It is OK to be wrong in this case. Compilers that take a different approach might produce different results, maybe even the "correct" results. That is also OK. Whatever is easiest for the compiler is fine by the standard.
In light of the question's addendum: please note that this is undefined behavior. That does not mean the behavior is defined to be contrary to the programmer's intent. Rather, it means that the behavior is not defined by the C++ standard. It is up to each compiler to decide what happens. The result of std::copy(a, a+3, a+1) could be anything. You might get the naive result of {0, 0, 0, 0}. However, you might instead get the intended result of {0, 0, 1, 2}. Other results are also possible. You cannot conclude that there is no undefined behavior simply because you were lucky enough to get the behavior you intended. Sometimes undefined behavior gives correct results. (That's one reason that tracking down bugs related to undefined behavior can be so difficult.)
The reason is that, in general, copying part of a range to another part of the same range might require additional (if only temporary) storage to handle overlaps, when copying in sequence from left to right (or from right to left, as in your second example).
As is common with C++, to avoid forcing implementations to take this extreme step, the standard just tells you not to do it by saying the results are undefined.
This forces you, in such situations, to be explicit by copying into a fresh piece of memory yourself.
It does so while not even requiring the compiler to put any effort into warning or telling you about this, which would also be seen as "too bossy" on the part of the standard.
But your assumption that undefined behaviour here results in a copy failure (or a system fault) is also wrong. I mean, that could well be the result (and JaMiT demonstrates very well how this could occur) but you must not fall into the trap of expecting any particular result from a program with undefined behaviour; that's the point of it. Indeed, some implementation may even go to the trouble of making overlapping range copies "work" (though I'm not aware of any that do).
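A sketch of the "fresh piece of memory" approach mentioned above (my illustration, not from the answer):

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    int a[4] = {0, 1, 2, 3};
    // copy the source range out first, so the real copy cannot overlap
    std::vector<int> tmp(a, a + 3);
    std::copy(tmp.begin(), tmp.end(), a + 1);
    for (int x : a) std::cout << x << ' '; // prints: 0 0 1 2
    std::cout << '\n';
}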
Related
I have logic that looks like the below (not the actual code):

StructureElement x;
std::vector<StructureElement> myVector;
for (int i = 1; i <= 1000; ++i)
{
    x.Elem1 = 20;
    x.Elem2 = 30;
    myVector.push_back(x);
}
My understanding is that x will be allocated memory only once and that the existing values will be overwritten on every iteration.
Also, the 'x' pushed into the vector will not be affected by subsequent iterations pushing a modified 'x'.
Am I right in my observations?
Is the above optimal? I want to keep memory consumption minimal and would prefer not to use new. Am I missing anything by not using new?
Also, I pass this vector around and receive a reference to it in another method.
And if I were to read the vector elements back, is this right?

StructureElement xx = *myVector.begin();
std::cout << xx.Elem1 << '\n';
std::cout << xx.Elem2 << '\n';
Any optimizations or different ideas would be welcome.
Am I right in my observations?
Yes, if the vector is std::vector<StructureElement>, in which case it keeps its own copies of what is pushed in.
Is the above optimal?
It is sub-optimal because it results in many re-allocations of the vector's underlying data buffer, plus unnecessary assignments and copies. The compiler may optimize some of the assignments and copies away, but there is no reason, for example, to re-set the elements of x in the loop.
You can simplify it like this:
std::vector<StructureElement> v(1000, StructureElement{20, 30});
This creates a size-1000 vector containing copies of StructureElement with the desired values, which is what you seem to be trying in your pseudo-code.
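If, unlike in the pseudo-code, the values differ per iteration, a reasonable variant (my sketch; the StructureElement definition is assumed from the question) is to reserve once up front and then push_back:

#include <vector>

struct StructureElement { int Elem1; int Elem2; };

int main()
{
    std::vector<StructureElement> v;
    v.reserve(1000);           // one allocation up front, no reallocations later
    for (int i = 0; i < 1000; ++i)
        v.push_back({20, 30}); // copies the values into the vector
}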
To read the elements back, you have options. A range based for-loop if you want to iterate over all elements:
for (const auto& e : v)
    std::cout << e.Elem1 << " " << e.Elem2 << std::endl;
Using iterators,
for (auto it = begin(v); it != end(v); ++it)
    std::cout << it->Elem1 << it->Elem2 << std::endl;
Or pass ranges to algorithms:
std::transform(begin(v), end(v), ....);
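For instance (a hypothetical completion of the transform call above; the lambda and the destination vector sums are my invention):

// assumes <algorithm>, <iterator> and the StructureElement type from the question
std::vector<int> sums;
sums.reserve(v.size());
std::transform(begin(v), end(v), std::back_inserter(sums),
               [](const StructureElement& e) { return e.Elem1 + e.Elem2; });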
Related
I have the following C++ code using set_union() from the <algorithm> STL header:
9 int first[] = {5, 10, 15, 20, 25};
10 int second[] = {50, 40, 30, 20, 10};
11 vector<int> v(10);
12 vector<int>::iterator it;
13
14 sort(first, first+5);
15 sort(second, second+5);
16
17 it = set_union(first, first + 5, second, second + 5, v.begin());
18
19 cout << int(it - v.begin()) << endl;
I read through the documentation of set_union at http://www.cplusplus.com/reference/algorithm/set_union/ . I have two questions:
Line 17. I understand set_union() returns an OutputIterator. I thought iterators are objects returned from a container object (e.g. an instantiated vector class, where calling blah.begin() returns the iterator object). I am trying to understand what the "it" returned from set_union points to; which object is it?
Line 19. What does "it - v.begin()" equate to? I am guessing from the output value of 8 that it is the size of the union, but how?
Would really appreciate if someone can shed some light.
Thank you,
Ahmed.
The documentation for set_union states that the returned iterator points past the end of constructed range, in your case to one past the last element in v that was written to by set_union.
This is also the reason it - v.begin() results in the length of the set union. Note that you are able to simply subtract the two only because a vector<T>::iterator must satisfy the RandomAccessIterator concept. Ideally, you should use std::distance to figure out the interval between two iterators.
Your code snippet can be written more idiomatically as follows:
int first[] = {5, 10, 15, 20, 25};
int second[] = {50, 40, 30, 20, 10};

std::vector<int> v;
v.reserve(10); // reserve instead of setting an initial size

sort(std::begin(first), std::end(first));
sort(std::begin(second), std::end(second)); // std::end here, not std::begin

// use std::begin/end instead of hard coding the length
auto it = set_union(std::begin(first), std::end(first),
                    std::begin(second), std::end(second),
                    std::back_inserter(v));
// using back_inserter ensures the code works even if the vector is not
// initially set to the right size

std::cout << std::distance(v.begin(), it) << std::endl;
std::cout << v.size() << std::endl;
// these lines output the same result, unlike your example
In response to your comment below
What is the use of creating a vector of size 10 or reserving size 10
In your original example, creating a vector with an initial size of at least 8 is necessary to prevent undefined behavior, because set_union is going to write 8 elements to the output range. The purpose of reserving 10 elements is an optimization to prevent the possibility of multiple reallocations of the vector. This is typically neither needed nor feasible, since you won't know the size of the result in advance.
I tried with size 1, works fine
Size of 1 definitely does NOT work fine with your code, it is undefined behavior. set_union will write past the end of the vector. You get a seg fault with size 0 for the same reason. There's no point in speculating why the same thing doesn't happen in the first case, that's just the nature of undefined behavior.
Does set_union trim the size of the vector, from 10 to 8. Why or is that how set_union() works
You're only passing an iterator to set_union, it knows nothing about the underlying container. So there's no way it could possibly trim excess elements, or make room for more if needed. It simply keeps writing to the output iterator and increments the iterator after each write. This is why I suggested using back_inserter, that is an iterator adaptor that will call vector::push_back() whenever the iterator is written to. This guarantees that set_union will never write beyond the bounds of the vector.
first: "it" is an iterator to the end of the constructed range (i.e. equivalent to v.end())
second: it - v.begin() equals 8 because vector iterators are usually just typedefed pointers and therefore it is just doing pointer arithmetic. In general, it is better to use the distance algorithm than relying on raw subtraction
cout << distance(v.begin(), it) << endl;
Related
When inserting into a std::vector, the C++ standard assures that all iterators before the insertion point remain valid as long as the capacity is not exhausted (see [23.2.4.3/1] or std::vector iterator invalidation).
What is the rationale behind not allowing iterators after the insertion point to remain valid (if the capacity is not exhausted)? Of course, they would then point to a different element but (from the presumed implementation of std::vector) it should still be possible to use such an iterator (for example dereference it or increment it).
You seem to be thinking of an "invalid" iterator as only one that would provoke a crash if used, but the standard's definition is broader. It includes the possibility that the iterator can still safely be dereferenced, but no longer points to the element it is expected to point to. (This is a special case of the observation that "undefined behavior" does not mean "your program will immediately crash"; it can also mean "your program will silently compute the wrong result" or even "nothing observably wrong will occur on this implementation.")
It is easier to demonstrate why this is an issue with erase:
#include <vector>
#include <iostream>

int main()
{
    std::vector<int> a { 0, 1, 2, 3, 4, 4, 6 };
    for (auto p = a.begin(); p != a.end(); p++) // THIS IS WRONG
        if (*p == 4)
            a.erase(p);
    for (auto p = a.begin(); p != a.end(); p++)
        std::cout << ' ' << *p;
    std::cout << '\n';
}
On typical implementations of C++ this program will not crash, but it will print 0 1 2 3 4 6, rather than 0 1 2 3 6 as probably intended, because erasing the first 4 invalidated p -- by advancing it over the second 4.
Your C++ implementation may have a special "debugging" mode in which this program does crash when run. For instance, with GCC 4.8:
$ g++ -std=c++11 -W -Wall test.cc && ./a.out
0 1 2 3 4 6
but
$ g++ -std=c++11 -W -Wall -D_GLIBCXX_DEBUG test.cc && ./a.out
/usr/include/c++/4.8/debug/safe_iterator.h:307:error: attempt to increment
a singular iterator.
Objects involved in the operation:
iterator "this" # 0x0x7fff5d659470 {
type = N11__gnu_debug14_Safe_iteratorIN9__gnu_cxx17__normal_iteratorIPiNSt9__cxx19986vectorIiSaIiEEEEENSt7__debug6vectorIiS6_EEEE (mutable iterator);
state = singular;
references sequence with type `NSt7__debug6vectorIiSaIiEEE' # 0x0x7fff5d659470
}
Aborted
Do understand that the program provokes undefined behavior either way. It is just that the consequences of the undefined behavior are more dramatic in the debugging mode.
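As an aside (a sketch of my own, not part of this answer): the usual fix is to drive the loop with the iterator that erase returns, instead of incrementing a stale one:

#include <vector>
#include <iostream>

int main()
{
    std::vector<int> a { 0, 1, 2, 3, 4, 4, 6 };
    for (auto p = a.begin(); p != a.end(); /* no increment here */)
    {
        if (*p == 4)
            p = a.erase(p); // erase returns a valid iterator to the next element
        else
            ++p;
    }
    for (int e : a) std::cout << ' ' << e;
    std::cout << '\n'; // prints: 0 1 2 3 6
}

The erase-remove idiom, a.erase(std::remove(a.begin(), a.end(), 4), a.end()), achieves the same in a single pass.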
That the iterators may refer to a different element is enough for them to be invalidated. An iterator is supposed to refer to the same element for the duration of its valid lifetime.
You're right that, in practice, you may not experience any crashing or nasal demons if you were to dereference such an iterator, but that does not make it valid.
A vector grows dynamically, so when you push onto a vector and there is no space for the item, memory needs to be allocated for it. The standard mandates that a vector must store its elements in contiguous memory, so when memory is allocated, it has to be enough to store ALL the existing elements plus the new one.
The vector doesn't know about any iterators for itself, so cannot update them into the new storage of elements. Iterators are therefore invalid after the memory has been reallocated.
The vector does not know which iterators exist, yet the memory location of the elements after the inserted element has changed. This means the iterators would need to be updated to reflect that change if they were to remain valid; but the vector cannot perform this update, because it does not know which iterators exist.
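One practical consequence (a sketch of my own, not part of either answer): if you need a handle that survives insertions, store an index and recompute the position, rather than holding an iterator or reference across the insert:

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {1, 3, 4, 5, 7, 8, 9, 1, 3, 4};
    std::size_t idx = 6;         // remember a position, not a reference
    v.insert(v.begin() + 2, 3);  // may invalidate iterators and references
    ++idx;                       // account for the one element inserted before idx
    std::cout << v[idx] << '\n'; // prints 9; v.begin() + idx is computed fresh
}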