std::copy, std::copy_backward and overlapping ranges - c++

My references are to std::copy and std::copy_backward.
template< class InputIt, class OutputIt > OutputIt copy( InputIt
first, InputIt last, OutputIt d_first );
Copies all elements in the range [first, last) starting from first and
proceeding to last - 1. The behavior is undefined if d_first is within
the range [first, last). In this case, std::copy_backward may be used
instead.
template< class BidirIt1, class BidirIt2 > BidirIt2 copy_backward(
BidirIt1 first, BidirIt1 last, BidirIt2 d_last );
Copies the elements from the range, defined by [first, last), to
another range ending at d_last. The elements are copied in reverse
order (the last element is copied first), but their relative order is
preserved.
The behavior is undefined if d_last is within (first, last]. std::copy
must be used instead of std::copy_backward in that case.
When copying overlapping ranges, std::copy is appropriate when copying
to the left (beginning of the destination range is outside the source
range) while std::copy_backward is appropriate when copying to the
right (end of the destination range is outside the source range).
From the above description, I gather the following inference:
Both copy and copy_backward end up copying the same source range [first, last) to the destination range, albeit in the case of the former the copying occurs from first to last - 1, whereas in the case of the latter the copying occurs from last -1 to first. In both cases, relative order of elements in the source range is preserved in the resulting destination range.
However, what is the technical reason behind the following two stipulations:
1) In the case of copy, undefined behavior results (implying unsuccessful copying of the source range to the destination range and possibly system fault) if d_first is within the range [first, last).
2) In the case of copy_backward, undefined behavior results (implying unsuccessful copying of the source range to the destination range and possibly system fault) if d_last is within the range (first, last].
I am assuming that the suggestion to replace copy with copy_backward to avert the above undefined behavior scenario, would become evident to me once I understand the implication of the above two statements.
Likewise, I am also assuming that the mention about the appropriateness of copy when copying to the left (which notion is not clear to me), and copy_backward when copying to the right (which notion is not clear to me either), would begin to make sense once I comprehend the above distinction between copy and copy_backward.
Look forward to your helpful thoughts as always.
Addendum
As a follow-up, I wrote the following test code to verify the behavior of both copy and copy_backward, for identical operation.
#include <array>
#include <algorithm>
#include <cstddef>
#include <iostream>
using std::array;
using std::copy;
using std::copy_backward;
using std::size_t;
using std::cout;
using std::endl;
int main (void)
{
    const size_t sz = 4;
    array<int, sz> a1 = {0, 1, 2, 3};
    array<int, sz> a2 = {0, 1, 2, 3};
    cout << "Array1 before copy" << endl;
    cout << "==================" << endl;
    for (auto&& i : a1) // the type of i is int&
    {
        cout << i << endl;
    }
    copy(a1.begin(), a1.begin() + 3, a1.begin() + 1);
    cout << "Array1 after copy" << endl;
    cout << "=================" << endl;
    for (auto&& i : a1) // the type of i is int&
    {
        cout << i << endl;
    }
    cout << "Array2 before copy backward" << endl;
    cout << "===========================" << endl;
    for (auto&& i : a2) // the type of i is int&
    {
        cout << i << endl;
    }
    copy_backward(a2.begin(), a2.begin() + 3, a2.begin() + 1);
    cout << "Array2 after copy backward" << endl;
    cout << "==========================" << endl;
    for (auto&& i : a2) // the type of i is int&
    {
        cout << i << endl;
    }
    return (0);
}
The following is the program output:
Array1 before copy
==================
0
1
2
3
Array1 after copy
=================
0
0
1
2
Array2 before copy backward
===========================
0
1
2
3
Array2 after copy backward
==========================
2
1
2
3
Evidently, copy produces the expected result, whereas copy_backward doesn't, even though d_first is within the range [first, last). Additionally, d_last is within the range (first, last] as well, which should result in undefined behavior in the case of copy_backward as per the documentation.
So in effect, the program output is in accordance with the documentation in the case of copy_backward, whereas it is not in the case of copy.
It is worth noting again that in both cases, d_first and d_last do satisfy the condition which should result in undefined behavior for both copy and copy_backward respectively, as per documentation. However, the undefined behavior is observed only in the case of copy_backward.

There is nothing deep going on here. Just do an algorithm run-through with sample data using a naive approach: copy each element in order.
Suppose you have the four-element array int a[4] = {0, 1, 2, 3} and you want to copy the first three elements to the last three. Ideally, you would end up with {0, 0, 1, 2}. How would this (not) work with std::copy(a, a+3, a+1)?
Step 1: Copy the first element a[1] = a[0]; The array is now {0, 0, 2, 3}.
Step 2: Copy the second element a[2] = a[1]; The array is now {0, 0, 0, 3}.
Step 3: Copy the third element a[3] = a[2]; The array is now {0, 0, 0, 0}.
The result is wrong because you overwrote some of your source data (a[1] and a[2]) before reading those values. Copying in reverse would work because in reverse order, you would read values before overwriting them.
Since the result is wrong with one reasonable approach, the standard declared the behavior "undefined". Compilers wishing to take the naive approach may, and they do not have to account for this case. It is OK to be wrong in this case. Compilers that take a different approach might produce different results, maybe even the "correct" results. That is also OK. Whatever is easiest for the compiler is fine by the standard.
In light of the question's addendum: please note that this is undefined behavior. That does not mean the behavior is defined to be contrary to the programmer's intent. Rather, it means that the behavior is not defined by the C++ standard. It is up to each compiler to decide what happens. The result of std::copy(a, a+3, a+1) could be anything. You might get the naive result of {0, 0, 0, 0}. However, you might instead get the intended result of {0, 0, 1, 2}. Other results are also possible. You cannot conclude that there is no undefined behavior simply because you were lucky enough to get the behavior you intended. Sometimes undefined behavior gives correct results. (That's one reason that tracking down bugs related to undefined behavior can be so difficult.)
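To make the "copy to the left" versus "copy to the right" distinction concrete, here is a minimal sketch. The helper names shift_right_by_one and shift_left_by_one are made up for illustration; only std::copy and std::copy_backward come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: move every element one slot to the right,
// dropping the last value. The destination ends at v.end(), which lies
// outside the source range [begin, end - 1), so std::copy_backward is
// the documented choice for this direction.
void shift_right_by_one(std::vector<int>& v) {
    assert(!v.empty());
    std::copy_backward(v.begin(), v.end() - 1, v.end());
}

// Hypothetical helper: move every element one slot to the left,
// dropping the first value. The destination begins at v.begin(), which
// lies outside the source range [begin + 1, end), so std::copy is the
// documented choice for this direction.
void shift_left_by_one(std::vector<int>& v) {
    assert(!v.empty());
    std::copy(v.begin() + 1, v.end(), v.begin());
}
```

Starting from {0, 1, 2, 3}, shift_right_by_one gives {0, 0, 1, 2} and shift_left_by_one gives {1, 2, 3, 3}. In both cases every value is read before its slot is overwritten.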

The reason is that, in general, copying part of a range to another part of the same range, might require additional (if only temporary) storage, to handle overlaps when copying in sequence from left to right, or from right to left in your second example.
As is common with C++, to avoid forcing implementations to take this extreme step, the standard just tells you not to do it by saying the results are undefined.
This forces you, in such situations, to be explicit by copying into a fresh piece of memory yourself.
It does so while not even requiring the compiler to put any effort into warning or telling you about this, which would also be seen as "too bossy" on the part of the standard.
But your assumption that undefined behaviour here results in a copy failure (or a system fault) is also wrong. I mean, that could well be the result (and JaMiT demonstrates very well how this could occur) but you must not fall into the trap of expecting any particular result from a program with undefined behaviour; that's the point of it. Indeed, some implementation may even go to the trouble of making overlapping range copies "work" (though I'm not aware of any that do).
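If you do not want to think about direction at all, copying through a fresh buffer is one way to be explicit. This is only a sketch: copy_within is a hypothetical helper, it works on indices into a single std::vector<int>, and it pays for an extra allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy v[first, last) to the positions starting at
// d_first, going through a temporary buffer so that overlap between
// source and destination cannot matter.
void copy_within(std::vector<int>& v, std::size_t first, std::size_t last,
                 std::size_t d_first) {
    assert(first <= last && last <= v.size());
    assert(d_first <= v.size() && last - first <= v.size() - d_first);
    // Snapshot the source first; from here on the two ranges are disjoint.
    const std::vector<int> tmp(v.begin() + first, v.begin() + last);
    std::copy(tmp.begin(), tmp.end(), v.begin() + d_first);
}
```

With v = {0, 1, 2, 3}, copy_within(v, 0, 3, 1) leaves {0, 0, 1, 2} regardless of how std::copy is implemented.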

Related

Don't understand iterator, reference and pointer invalidation, an example

I've read a lot of posts about reference, pointer and iterator invalidation. For instance, I've read that insertion invalidates all references to the elements of a deque, so why don't I get errors in the following code?
#include <deque>
int main()
{
    std::deque<int> v1 = { 1, 3, 4, 5, 7, 8, 9, 1, 3, 4 };
    int& a = v1[6];
    std::deque<int>::iterator it = v1.insert(v1.begin() + 2, 3);
    int c = a;
    return a;
}
When I run this I get 9 as result, so "a" is still referring to the right element.
In general I didn't manage to get invalidation errors. I tried different containers and even with pointers and iterators.
Sometimes, an operation that could invalidate something, doesn't.
I'm not familiar enough with the std::deque implementation to comment, but if you did a push_back on a std::vector, for example, you might get all your iterators, references and pointers to elements of the vector invalidated, because std::vector needed to allocate more memory to accommodate the new element and ended up moving all the data to a new location where that memory was available.
Or, you might get nothing invalidated, because the vector had enough space to just construct a new element in place, or was lucky enough to get enough new memory at the end of its current memory location, and did not have to move anything, while still having changed size.
Usually, the documentation carefully documents what operations can invalidate what. For example, search for "invalidate" in https://en.cppreference.com/w/cpp/container/deque .
Additionally, particular implementations of the standard data structures might be even safer than the standard guarantees - but relying on that will make your code highly non-portable, and potentially introduce hidden bugs when the unspoken safety guarantees change: everything will seem to work just fine until it doesn't.
The only safe thing to do is to read the specification carefully and never rely on something not getting invalidated when it does not guarantee that.
Also, as Enrico pointed out, you might get cases where your references/pointers/iterators get invalidated, but reading from them yields a value that looks fine, so such a simple method for testing if something has been invalidated will not do.
The following code, on my system, shows the effect of the undefined behavior.
#include <deque>
#include <iostream>
int main()
{
    std::deque<int> v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    for (auto e : v1) std::cout << e << ' ';
    std::cout << std::endl;
    int& a = v1[1];
    int& b = v1[2];
    int& c = v1[3];
    std::cout << a << ' ' << b << ' ' << c << std::endl;
    std::deque<int>::iterator it = v1.insert(v1.begin() + 2, -1);
    for (auto e : v1) std::cout << e << ' ';
    std::cout << std::endl;
    v1[7] = -3;
    std::cout << a << ' ' << b << ' ' << c << std::endl;
    return a;
}
Its output for me is:
1 2 3 4 5 6 7 8 9 10
2 3 4
1 2 -1 3 4 5 6 7 8 9 10
-1 3 4
If the references a, b, and c, were still valid, the last line should have been
2 3 4
Please, do not deduce from this that a has been invalidated while b and c are still valid. They're all invalid.
Try it out; maybe you are "lucky" and it shows the same to you. If it doesn't, play around with the number of elements in the container and a few insertions. At some point maybe you'll see something strange, as in my case.
Addendum
The ways std::deque can be implemented all make the invalidation mechanism a bit more complex than what happens for the "simpler" std::vector. You also have fewer ways to check whether something is actually going to suffer from the effect of undefined behavior. With std::vector, for instance, you can tell if undefined behavior will sting you upon a push_back: the member function capacity tells you whether the container already has enough space to accommodate the bigger size required by inserting further elements with push_back. For instance, if size gives 8 and capacity gives 10, you can push_back two more elements "safely". If you push one more, the array will have to be reallocated.
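The size/capacity check for std::vector can be sketched as follows. Both helper names are hypothetical; size, capacity and push_back are the real std::vector members. std::deque offers no comparable query, so nothing like this is available there.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: true when the next push_back has to reallocate,
// which would invalidate every iterator, pointer and reference into v.
bool next_push_back_reallocates(const std::vector<int>& v) {
    return v.size() == v.capacity();
}

// Hypothetical helper: append only when existing references stay valid.
void push_back_without_reallocation(std::vector<int>& v, int value) {
    assert(!next_push_back_reallocates(v));
    v.push_back(value);
}
```

After v.reserve(10) and eight push_back calls, two more elements can be appended through push_back_without_reallocation and a pointer to v[0] taken beforehand still points at the first element.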

C++ Unexpected behavior with remove_if

I am trying to use std::remove_if to remove spaces from a simple string, but I am getting weird results. Could someone help me figure out what's going on?
The Code is:
#include <iostream>
#include <algorithm>
#include <string>
int main(int argc, char * argv[])
{
    std::string test = "a b";
    std::remove_if(test.begin(), test.end(), isspace);
    std::cout << "test : " << test << std::endl;
    return 0;
}
I expect this to simply print out:
test : ab
but instead I get
test : abb
Trying with another string, I get:
Input: "a bcde uv xy"
Output: "abcdeuvxy xy"
It seems like it duplicating the last "word", but sometimes adds a space. How can I just get it to remove all spaces without doing weird stuff?
std::remove_if removes by shifting elements; it does not actually erase anything from the container. STL algorithms have no such privilege; only containers can erase their own elements.
(emphasis mine)
Removing is done by shifting (by means of move assignment) the
elements in the range in such a way that the elements that are not to
be removed appear in the beginning of the range. Relative order of the
elements that remain is preserved and the physical size of the
container is unchanged. Iterators pointing to an element between the
new logical end and the physical end of the range are still
dereferenceable, but the elements themselves have unspecified values
(as per MoveAssignable post-condition). A call to remove is typically
followed by a call to a container's erase method, which erases the
unspecified values and reduces the physical size of the container to
match its new logical size.
You can erase the removed elements afterward (this is known as the erase-remove idiom).
test.erase(std::remove_if(test.begin(), test.end(), isspace), test.end());
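As a complete sketch of the idiom, wrapped in a hypothetical helper named without_spaces. It swaps the bare isspace for a lambda taking unsigned char, which is my assumption about what you want here, since passing a negative char value to std::isspace is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: return a copy of `text` with all whitespace removed.
std::string without_spaces(std::string text) {
    // Convert through unsigned char before calling std::isspace.
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    // remove_if shifts the kept characters to the front and returns the
    // new logical end; erase then drops the leftover tail.
    text.erase(std::remove_if(text.begin(), text.end(), is_space), text.end());
    // Sanity check: nothing that counts as whitespace is left.
    assert(std::none_of(text.begin(), text.end(), is_space));
    return text;
}
```

without_spaces("a b") returns "ab", and without_spaces("a bcde uv xy") returns "abcdeuvxy".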

issues with deques and exceeding deque size with index operator

Having a strange issue with deques in C++.
Let's say I have a deque of doubles of size 4. For some reason, when using the index operator, I seem to be able to exceed the size of the deque.
In other words, neither the compiler nor the program at execution will barf if I write the following:
for(int i = 0; i < 7; i++)
{
    x[i] = (double)(i*i);
    cout << x[i] << endl;
}
Where x is the deque. And I actually am able to get outputs from this.
It doesn't increase the size of the deque. If I output x.size(), I still get 4.
What gives?
I'm using Code::Blocks with the standard default gcc compiler that comes with it.
operator[] does not do bounds checking, just like indexing a raw array. The at member function does. If you instead use
x.at(i);
you will get a std::out_of_range exception if you exceed the bounds of the deque. If you run your original code through a memory error checker (like valgrind) you will see "invalid read" and "invalid write" errors.
If you look at cppreference's docs on operator[] you'll see the note "No bounds checking is performed."
However the docs for at() say
If pos is not within the range of the container, an exception of type std::out_of_range is thrown.
Going out of bounds on a container is undefined behavior. If you are accessing with an index where you aren't sure if it's in-bounds or not, it's your job to either check that it is, or use at and possibly handle the exception.
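Both options can be sketched like this. The helper names value_or and checked_get are hypothetical; at, operator[] and std::out_of_range are the real library pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical helper: read x[i] via at() and fall back to a default
// when at() reports an out-of-range index.
double value_or(const std::deque<double>& x, std::size_t i, double fallback) {
    try {
        return x.at(i);  // throws std::out_of_range when i >= x.size()
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Hypothetical helper: the check-it-yourself alternative, treating an
// out-of-range index as a programming error.
double checked_get(const std::deque<double>& x, std::size_t i) {
    assert(i < x.size());
    return x[i];
}
```

For a deque of size 4, value_or(x, 6, -1.0) returns -1.0 instead of touching memory past the end.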
Indexing out of bounds gives undefined behavior, so anything can happen.
Many containers will round the current size up to some convenient value (e.g., a power of 2), so depending on the current size you'll have some amount of memory after the last item in the collection. Indexing into that memory and attempting to read it will produce some result, but the memory is typically uninitialized, so the result will often be meaningless and invalid (and, although most don't, the container could do bounds checking, and throw an exception or almost anything else when you index out of bounds).
IMO, at is a fairly poor tool to deal with the possibility though. A better way to avoid such problems is a range-based for loop:
for (auto &d : x) {
    d = d * d;
    std::cout << d << "\n"; // avoid `endl`, which flushes the stream.
}
Another possibility would be to use standard algorithms:
std::transform(x.begin(), x.end(), x.begin(), [](double d) { return d*d; });
std::copy(x.begin(), x.end(), std::ostream_iterator<double>(std::cout, "\n"));
There are also range-based algorithms (e.g., one set in Boost, at least one more being suggested for a future C++ standard), that (do/would) allow something on the general order of:
copy(x, output_range<double>(std::cout, "\n"));
Since this figures out the bounds of x on its own, short of a bug in the code for the range, it's pretty much impossible to accidentally index out of bounds this way.

should std::copy() or std::move() of empty range require valid destination?

The std::move() in the code below issues a runtime warning when compiled in Visual Studio 2013 (with Debug configuration) because it detects that dest is a nullptr. However, the source range is empty, so dest should never be accessed.
The C++ standard may be unclear as to whether this should be allowed?
It states: Requires: result shall not be in the range [first,last).
A nullptr would seem to satisfy that requirement.
#include <vector>
#include <algorithm>
int main() {
    std::vector<int> vec;
    int* dest = nullptr;
    // The range [begin(vec),end(vec)) is empty, so dest should never be accessed.
    // However, it results in an assertion warning in VS2013.
    std::move(std::begin(vec), std::end(vec), dest);
}
Not only does the Requires: clause need to be satisfied, but everything in the Effects: and Returns: clause needs to be satisfied as well. Let's go through them:
Effects: Copies elements in the range [first,last) into the range [result,result + (last - first)) starting from first and
proceeding to last.
As first == last, then the range [result, result + 0) must be a valid range.
[iterator.requirements.general]/p7 states:
A range [i,i) is an empty range; ... Range [i,j) is valid if and only if j is reachable from i.
And p6 of the same section states:
An iterator j is called reachable from an iterator i if and only
if there is a finite sequence of applications of the expression ++i
that makes i == j.
From these paragraphs I conclude that given:
int* dest = nullptr;
Then [dest, dest) forms a valid empty range, so the first sentence in the Effects: paragraph looks OK to me. The second sentence reads:
For each non-negative integer n < (last - first), performs *(result + n) = *(first + n).
There are no non-negative integers n < 0, and so no assignments can be performed. So the second sentence does not prohibit dest == nullptr.
Returns: result + (last - first).
[expr.add]/p8 specifically allows one to add 0 to any pointer value and the result compares equal to the original pointer value. Therefore dest + 0 is a valid expression equal to nullptr. No problems with the Returns: clause.
Requires: result shall not be in the range [first,last).
I see no reasonable way to interpret that dest would be "in" an empty range.
Complexity: Exactly last - first assignments.
This confirms that no assignments can be done.
I can find no statement in the standard that makes this example anything but well-formed.
The debug build of STL in Visual Studio does additional parameter validation. In this case, it is validating that dest is not null since it shouldn't be, which is failing. The release build may very well act as you expect, never using dest but that doesn't make the input data valid.
The debug build of STL is trying to help you by saying, "your input is bad". While the bad input might not be a problem in some circumstances, the validator can't know under what conditions you are passing it bad data. Personally, I would much rather have VS tell me about bad input in debug builds than have a runtime exception thrown in production.
Sure, you might do something like this:
int* dest = nullptr;
if (vec.size() > 0) dest = realDest;
std::move(std::begin(vec), std::end(vec), dest);
But the validator doesn't know that so it assumes the worst, especially since fixing it is very easy for you (just always pass in a valid output iterator) and not warning you about it could have terrible consequences on your application at runtime in production.
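If a placeholder destination really can occur, one way to keep the debug validator quiet is to skip the call for an empty source. A minimal sketch, assuming a std::vector<int> source and a raw pointer destination; move_if_nonempty is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: never hands a placeholder destination to the
// std::move algorithm. Returns the end of the written range, or dest
// unchanged when there was nothing to move.
int* move_if_nonempty(std::vector<int>& src, int* dest) {
    if (src.empty()) return dest;  // nothing to do, dest is never inspected
    assert(dest != nullptr);       // a real destination is required here
    return std::move(src.begin(), src.end(), dest);
}
```

The three-argument std::move used here is the algorithm from <algorithm>, not the cast-like utility from <utility>.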
This answer is derived from the comment by @Philipp Lenk. If he supplies an answer and you find it acceptable, choose his over mine and please up-vote his original comment.
§25.1.5: Throughout this Clause, the names of template parameters are
used to express type requirements [...] if an algorithm’s template
parameter is OutputIterator, OutputIterator1, or OutputIterator2, the
actual template argument shall satisfy the requirements of an output
iterator.
§24.2.4: A class or pointer type X satisfies the requirements of an
output iterator if X satisfies the Iterator requirements and the
expressions in Table 108 are valid and have the indicated semantics.
First line of the table: *r = o with the remark that post: r is
incrementable.
int* dest = nullptr;
*dest = 5;
++dest;
The above code is not valid. You cannot assign to *dest in this case, therefore per the standard you cannot pass a nullptr in for the dest.

Modifying contents of vector in BOOST_FOREACH

This is a question about how BOOST_FOREACH checks its loop termination.
cout << "Testing BOOST_FOREACH" << endl;
vector<int> numbers; numbers.reserve(8);
numbers.push_back(1); numbers.push_back(2); numbers.push_back(3);
cout << "capacity = " << numbers.capacity() << endl;
BOOST_FOREACH(int elem, numbers)
{
    cout << elem << endl;
    if (elem == 2) numbers.push_back(4);
}
cout << "capacity = " << numbers.capacity() << endl;
gives the output
Testing BOOST_FOREACH
capacity = 8
1
2
3
capacity = 8
But what about the number 4 which was inserted half way through the loop? If I change the type to a list the newly inserted number will be iterated over. The vector push_back operation will invalidate any pointers IF a reallocation is required, however that is not happening in this example. So the question I guess is why does the end() iterator appear to only be evaluated once (before the loop) when using vector but has a more dynamic evaluation when using a list?
Under the covers, BOOST_FOREACH uses
iterators to traverse the element
sequence. Before the loop is executed,
the end iterator is cached in a local
variable. This is called hoisting, and
it is an important optimization. It
assumes, however, that the end
iterator of the sequence is stable. It
usually is, but if we modify the
sequence by adding or removing
elements while we are iterating over
it, we may end up hoisting ourselves
on our own petard.
http://www.boost.org/doc/libs/1_40_0/doc/html/foreach/pitfalls.html
If you don't want the end() iterator to change use resize on the vector rather than reserve.
http://www.cplusplus.com/reference/stl/vector/resize/
Note that then you wouldn't want to push_back but use the operator[] instead. But be careful of going out of bounds.
The question was raised in the comments as to why the Microsoft debug runtime raises an assertion during iteration over the vector but not over the list. The reason is that insert is defined differently for list and vector (note that push_back is just an insert at the end of the sequence).
Per the C++ standard (ISO/IEC 14882:2003 23.2.4.3, vector modifiers):
[on insertion], if no reallocation happens, all the iterators and references before the insertion point remain valid.
(23.2.2.3, list modifiers):
[insert] does not affect the validity of iterators and references.
So, if you use push_back (and are sure that it's not going to cause a reallocation), it's okay with either container to continue using your iterator to iterate over the rest of the sequence.
In the case of the vector, however, it's undefined behavior to use the end iterator that you obtained before the push_back.
This is a roundabout answer to the question; it's a direct answer to the discussion in the question's comments.
boost's foreach will terminate when its iterator == numbers.end()
Be careful though, calling push_back can/will invalidate any current iterators you have.
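If the goal is to also visit elements appended during the loop, one option is to drop BOOST_FOREACH and iterate by index, so that neither a cached end iterator nor an invalidated iterator is involved. A minimal sketch; visit_while_growing is a hypothetical helper and the "append 4 when 2 is seen" rule is taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: visit every element of `numbers`, including the
// ones appended while the loop is running. Returns the visited values.
std::vector<int> visit_while_growing(std::vector<int>& numbers) {
    std::vector<int> visited;
    // size() is re-read on every pass, so the loop sees appended elements.
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        const int elem = numbers[i];  // copy by value: push_back may reallocate
        visited.push_back(elem);
        if (elem == 2) numbers.push_back(4);
    }
    // Every element, old or newly appended, was visited exactly once.
    assert(visited.size() == numbers.size());
    return visited;
}
```

With numbers = {1, 2, 3} this visits 1, 2, 3, 4, and it keeps working even when push_back has to reallocate, because indices are not invalidated the way iterators are.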