How smart is C++ deque iterator - c++

Say I have a std::deque<int> d containing 100 values, from 0 to 99. Given the following:
Unlike vectors, deques are not guaranteed to store all its elements in
contiguous storage locations: accessing elements in a deque by
offsetting a pointer to another element causes undefined behavior.
It appears the line below is not valid:
int invalidResult = *(d.begin() + 81); // might give me 81, but NOT GUARANTEED, right?
My question is this: does an iterator take care of this?
std::deque<int>::iterator it = d.begin();
int isThisValid = *(it + 81); // 81 every time? or does it result in undefined behavior?
At one point, I had thought that the iterator would handle any discontinuities in the underlying storage, but now I'm not so sure. Obviously, if you use it++ 81 times, *it will give you 81 as a result.
Can someone say for sure?
For what it's worth, I am not using C++11.

It appears the line below is not valid:
int invalidResult = *(d.begin() + 81); // might give me 81, but NOT GUARANTEED, right?
On the contrary. The statement is perfectly valid and the behaviour is guaranteed (assuming d.size() >= 82). This is because std::deque::begin returns an iterator, not a pointer, so the quoted rule does not apply.
std::deque<int>::iterator it = d.begin();
int isThisValid = *(it + 81); // 81 every time? or does it result in undefined behavior?
This is pretty much equivalent to the previous code, except you've used a named variable, instead of a temporary iterator. The behaviour is exactly the same.
Here is an example of what you may not do:
int* pointer = &d.front();
pointer[offset] = 42; // oops
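As a minimal sketch of the valid alternative (the helper names make_deque and value_at are made up for illustration), offsetting a deque iterator is well defined because the iterator itself knows how to step across the deque's internal blocks:

```cpp
#include <deque>

// Hypothetical helper: builds a deque holding 0, 1, ..., n-1.
std::deque<int> make_deque(int n)
{
    std::deque<int> d;
    for (int i = 0; i < n; ++i)
        d.push_back(i);
    return d;
}

// Hypothetical helper: reads the element `offset` positions past begin().
// Precondition: 0 <= offset < d.size().
int value_at(const std::deque<int>& d, std::deque<int>::difference_type offset)
{
    std::deque<int>::const_iterator it = d.begin();
    return *(it + offset);   // equivalent to it[offset]
}
```

With a deque built by make_deque(100), value_at(d, 81) yields 81, and so does *(d.begin() + 81). Nothing here requires C++11.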

According to this reference, std::deque provides a RandomAccessIterator, so your example will certainly work.
std::deque<int>::iterator it = d.begin();
int isThisValid = *(it + 81); // will be fine assuming the deque is that large

Related

Variables inside and outside a loop with/without asterix

I am not proficient in C++ but I am converting a short script to PHP
for(auto it = First; it != Last; ++it)
{
Result += *it;
}
From this snippet, I can speculate this simply means
Result = Result + it
where * is a reference to the pointer of the loop.
That said I see this symbol used outside of loops and in some cases I see variables without this symbol both in and outside of loops which puts holes in my theory.
Again I am trying to RTFM but I am unsure what I am searching for.
Both First and Last are iterator objects, representing a generalization of pointers in C++ Standard Library. Additionally, the two iterators reference the same collection, and Last can be reached from First by incrementing the iterator*.
Result is some sort of accumulator. If it is of numeric type, += means Result = Result + *it, where *it is whatever the iterator is pointing to. In other words, Result accumulates the total of elements of the collection between First, inclusive, and Last, exclusive. If First points to the beginning of an array and Last points to one-past-the-end of an array of numeric type, your code would be equivalent to calling PHP array_sum() on the array.
However, Result is not required to be numeric. For example, it could be a std::string, in which case += represents appending the value to the string.
* In terms of pointers and arrays this would be "pointing to the same array," and "Last points to a higher index of the array than First."
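A minimal sketch of that loop wrapped in a function may make the two cases concrete. The names accumulate_range, sum_example and join_example are hypothetical, chosen only for this illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the loop in the question: adds every element
// of [first, last) onto result with +=.
template <typename Iterator, typename Accumulator>
Accumulator accumulate_range(Iterator first, Iterator last, Accumulator result)
{
    for (Iterator it = first; it != last; ++it)
        result += *it;   // *it is the element the iterator refers to
    return result;
}

// Numeric case: pointers into an array act as the iterators.
int sum_example()
{
    int numbers[] = {1, 2, 3, 4};
    return accumulate_range(numbers, numbers + 4, 0);
}

// String case: += appends each element instead of adding it.
std::string join_example()
{
    std::vector<std::string> words;
    words.push_back("ab");
    words.push_back("cd");
    return accumulate_range(words.begin(), words.end(), std::string());
}
```

The standard library already offers this as std::accumulate in <numeric>; the hand-written version is only there to show what the loop does.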
I believe your speculation is incorrect.
it, First and Last are either iterators or pointers. Iterators are C++ objects that can be used to iterate over containers. For basic usage, they behave much like pointers, and can be dereferenced the same way.
For example:
std::vector<int> myList;
...
// Search for the number 10 in the list.
std::vector<int>::iterator it = std::find(myList.begin(), myList.end(), 10);
// If the number 10 was found in the list, change the value to 11.
if (it != myList.end())
*it = 11; //< Similar to pointer syntax.
In your specific example, the Result variable has a value added to it. To get that value, your code uses the * operator to get the value from the iterator.
The same concept applies to pointers. Although iterators and pointers are very different concepts, accessing their values is very similar.
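To illustrate that similarity, here is a small sketch in which the same template accepts either raw pointers or container iterators. The function name replace_first is an assumption made for this example:

```cpp
#include <algorithm>

// Hypothetical helper: replaces the first occurrence of old_value in
// [first, last) with new_value and reports whether anything was changed.
template <typename Iterator>
bool replace_first(Iterator first, Iterator last, int old_value, int new_value)
{
    Iterator it = std::find(first, last, old_value);
    if (it == last)
        return false;    // not found: `it` must not be dereferenced
    *it = new_value;     // same syntax whether `it` is an int* or an iterator
    return true;
}
```

It can be called as replace_first(array, array + n, 10, 11) on a plain array or as replace_first(v.begin(), v.end(), 10, 11) on a std::vector<int>.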

Iterator returned by set_union()

I have the following C++ code using set_union() from algorithm stl:
9 int first[] = {5, 10, 15, 20, 25};
10 int second[] = {50, 40, 30, 20, 10};
11 vector<int> v(10);
12 vector<int>::iterator it;
13
14 sort(first, first+5);
15 sort(second, second+5);
16
17 it = set_union(first, first + 5, second, second + 5, v.begin());
18
19 cout << int(it - v.begin()) << endl;
I read through the document of set_union from http://www.cplusplus.com/reference/algorithm/set_union/ . I have two questions:
Line 17. I understand set_union() is returning an OutputIterator. I
thought iterators are like an object returned from a container object
(e.g. instantiated vector class, and calling blah.begin()
returns the iterator object). I am trying to understand what does
the "it" returned from set_union point to, which object?
Line 19. What does "it - v.begin()" equate to. I am guessing from the output value of "8", the size of union, but how?
Would really appreciate if someone can shed some light.
Thank you,
Ahmed.
The documentation for set_union states that the returned iterator points past the end of constructed range, in your case to one past the last element in v that was written to by set_union.
This is the reason it - v.begin() results in the length of the set union also. Note that you are able to simply subtract the two only because a vector<T>::iterator must satisfy the RandomAccessIterator concept. Ideally, you should use std::distance to figure out the interval between two iterators.
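A minimal sketch of using the returned iterator, assuming a hypothetical helper named sorted_union: the output vector is sized for the worst case, std::distance measures how much was written, and the unused tail is dropped.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the sorted union of a and b.
std::vector<int> sorted_union(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // Worst case: the inputs share no elements.
    std::vector<int> out(a.size() + b.size());
    std::vector<int>::iterator last =
        std::set_union(a.begin(), a.end(), b.begin(), b.end(), out.begin());

    // `last` is one past the final element written, so this is the union's length.
    std::vector<int>::difference_type written = std::distance(out.begin(), last);
    out.resize(static_cast<std::vector<int>::size_type>(written));
    return out;
}
```

For the two arrays in the question, the result holds 5, 10, 15, 20, 25, 30, 40, 50, so its size matches the 8 printed by the original code.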
Your code snippet can be written more idiomatically as follows:
int first[] = {5, 10, 15, 20, 25};
int second[] = {50, 40, 30, 20, 10};
std::vector<int> v;
v.reserve(10); // reserve instead of setting an initial size
// use std::begin/end instead of hard coding length
std::sort(std::begin(first), std::end(first));
std::sort(std::begin(second), std::end(second));
std::set_union(std::begin(first), std::end(first),
               std::begin(second), std::end(second),
               std::back_inserter(v));
// using back_inserter ensures the code works even if the vector is not
// initially set to the right size
// with back_inserter, set_union returns a back_insert_iterator, which cannot be
// passed to std::distance together with v.begin(); the vector's size is the
// length of the union
std::cout << v.size() << std::endl;
// unlike in your example, v.size() now equals the number of elements written
In response to your comment below
What is the use of creating a vector of size 10 or reserving size 10
In your original example, creating a vector having initial size of at least 8 is necessary to prevent undefined behavior because set_union is going to write 8 elements to the output range. The purpose of reserving 10 elements is an optimization to prevent possibility of multiple reallocations of the vector. This is typically not needed, or feasible since you won't know the size of the result in advance.
I tried with size 1, works fine
Size of 1 definitely does NOT work fine with your code, it is undefined behavior. set_union will write past the end of the vector. You get a seg fault with size 0 for the same reason. There's no point in speculating why the same thing doesn't happen in the first case, that's just the nature of undefined behavior.
Does set_union trim the size of the vector, from 10 to 8. Why or is that how set_union() works
You're only passing an iterator to set_union, it knows nothing about the underlying container. So there's no way it could possibly trim excess elements, or make room for more if needed. It simply keeps writing to the output iterator and increments the iterator after each write. This is why I suggested using back_inserter, that is an iterator adaptor that will call vector::push_back() whenever the iterator is written to. This guarantees that set_union will never write beyond the bounds of the vector.
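As a short sketch of that point (the function name union_into_empty is hypothetical, and both inputs are assumed to be sorted already), the output vector can start completely empty because every write goes through push_back:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: a and b must already be sorted.
std::vector<int> union_into_empty(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;   // size 0: no guess about the result size is needed
    // Each assignment through the back_insert_iterator calls out.push_back(value).
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

The vector ends up holding exactly the elements of the union, so there is nothing to trim afterwards.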
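First: "it" is an iterator to the end of the constructed range, i.e. one past the last element that set_union wrote. Because v was created with 10 elements and only 8 were written, that is v.begin() + 8, not v.end().
Second: it - v.begin() equals 8 because vector iterators are random access iterators, so subtracting two of them yields the number of elements between them, just like pointer arithmetic. In general, it is better to use the distance algorithm than to rely on raw subtraction: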
cout << distance(v.begin(), it) << endl;

why std::for_each with deletion of elements not break iteration?

As far as I know, erasing elements during iteration of a collection should break the iteration or cause you to skip elements. Why does calling std::for_each with a predicate which erases not cause this to happen? (It works).
Code snip:
#include <iostream>
#include <map>
#include <algorithm>
using namespace std;
int main() {
map<int,long long> m;
m[1] = 5000;
m[2] = 1;
m[3] = 2;
m[4] = 5000;
m[5] = 5000;
m[6] = 3;
// Erase all elements > 1000
std::for_each(m.begin(), m.end(), [&](const decltype(m)::value_type& v){
if (v.second > 1000) {
m.erase(v.first);
}
});
for (const auto& a: m) {
cout << a.second << endl;
}
return 0;
}
it prints out
1
2
3
EDIT: I now see that if it actually increments the iterator before calling the function then it could work. But does this count as compiler specific/undefined behavior?
It's undefined behaviour, and won't work reliably. After adding a line to print keys and values inside your erasing lambda function, I see:
1=5000
2=1
3=2
4=5000
2=1 // AGAIN!!!
3=2 // AGAIN!!!
5=5000
6=3
With my Standard library's map implementation, after erasing the element with key 4, iteration returns to the node with key 2! It then revisits the node with key 3. Because your lambda happily retested such nodes (v.second > 1000) and returned without any side effects, the broken iteration wasn't affecting the output.
You might reasonably ask: "but isn't it astronomically unlikely that it'd have managed to continue iteration (even if to the wrong next node) without crashing?"
Actually, it's quite likely.
Erasing a node causes delete to be called for the memory that node occupied, but in general the library code performing the delete will just:
invoke the destructor (which has no particular reason to waste time overwriting the left-child-, right-child- and parent-pointers), then
modify its records of which memory regions are allocated vs. available.
It's unlikely to "waste" time arbitrarily modifying the heap memory being deallocated (though some implementations will in memory-usage debugging modes).
So, the erased node probably sits there intact until some other heap allocation's performed.
And, when you erase an element in a map, the Standard requires that none of the container's other elements be moved in memory - iterators, pointers and references to other elements must remain valid. It can only modify nodes' left/right/parent pointers that maintain the binary tree.
Consequently, if you continue to use the iterator to the erased element, it is likely to see pointers to the left/right/parent nodes the erased element linked to before erasure, and operator++() will "iterate" to them using the same logic it would have employed if the erased element were still in the map.
If we consider an example map's internal binary tree, where N3 is a node with key 3:
N5
/ \
N3 N7
/ \ /
N1 N4 N6
The way iteration is done will likely be:
initially, start at the N1; the map must directly track where this is to ensure begin() is O(1)
if on a node with no children, repeat { Nfrom = where you are, move to parent, if nullptr or right != Nfrom break} (e.g. N1->N3, N4->N3->N5, N6->N7->N5->nullptr)
if on a node with right-hand child, take it then any number of left-hand links (e.g. N3->N4, N5->N7->N6)
So, if say N4 is removed (so N3->right = nullptr;) and no rebalancing occurs, then iteration records Nfrom=N4 and moves to the parent N3. Since N3->right != Nfrom, it will think it should stop on the already-iterated-over N3 instead of moving on up to N5.
On the other hand, if the tree has been rebalanced after the erase, all bets are off and the invalidated iterator could repeat or skip elements or even iterate "as hoped".
This is not intended to let you reason about behaviour after an erase - it's undefined and shouldn't be relied on. Rather, I'm just showing that a sane implementation can account for your unexpected observations.
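The traversal steps above can be sketched with a toy node type. This is an illustration under the stated assumptions only: Node, attach and successor are hypothetical names, not any library's real internals, and the toy never frees a node, so reading a detached node here is well defined (unlike with a real map).

```cpp
// Toy node with parent links, standing in for a map's internal tree node.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
};

// Hypothetical helper: attaches children and sets their parent links.
void attach(Node& parent, Node* left, Node* right)
{
    parent.left = left;
    parent.right = right;
    if (left)
        left->parent = &parent;
    if (right)
        right->parent = &parent;
}

// In-order successor following the steps described above.
// Returns a null pointer when there is no next node.
Node* successor(Node* n)
{
    if (n->right) {              // right child: take it, then go left as far as possible
        n = n->right;
        while (n->left)
            n = n->left;
        return n;
    }
    Node* from = n;              // otherwise climb while we arrive from a right child
    n = n->parent;
    while (n && n->right == from) {
        from = n;
        n = n->parent;
    }
    return n;
}
```

On the example tree, repeated calls starting at N1 visit N3, N4, N5, N6, N7 and then return null. If N3's right pointer is cleared while N4 keeps its stale parent pointer, successor called on N4 returns N3, which is the repeated visit described above.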
std::for_each is defined in the C++ draft standard (25.2.4) as a non-modifying sequence operation. The fact that modifying the sequence using your implementation of the function works is probably just luck.
This is undefined behaviour rather than implementation-defined behaviour, and you shouldn't be doing it. The standard expects that you don't modify the container inside the function object.
This happened to work for you, but I wouldn't count on it -- it's likely undefined behavior.
Specifically, I'd be concerned that erasing a map element while running std::for_each would attempt to increment an invalid iterator. For example, it looks like libstdc++ implements std::for_each like so:
template<typename _InputIterator, typename _Function>
_Function
for_each(_InputIterator __first, _InputIterator __last, _Function __f) {
// concept requirements
__glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
__glibcxx_requires_valid_range(__first, __last);
for (; __first != __last; ++__first)
__f(*__first);
return _GLIBCXX_MOVE(__f);
}
If calling __f ends up performing an erase, it seems likely that __first would be invalidated. Attempting to subsequently increment an invalid iterator would then be undefined behavior.
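A minimal sketch of a loop that avoids the problem, assuming a hypothetical function name erase_values_above: the iterator is advanced before the element it referred to is erased, so it is never incremented after being invalidated.

```cpp
#include <map>

// Erases every entry whose mapped value exceeds limit.
void erase_values_above(std::map<int, long long>& m, long long limit)
{
    for (std::map<int, long long>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second > limit)
            m.erase(it++);   // post-increment: `it` moves on, the old position is erased
        else
            ++it;
    }
}
```

Since C++11, map::erase returns the iterator following the erased element, so it = m.erase(it); is an equivalent way to write the erasing branch.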
I find this operation pretty common, so to avoid the undefined behaviour above, I wrote a container-based algorithm:
void remove_erase_if( Container&&, Test&& );
To deal with both associative and non-associative containers, I tag dispatch on a custom trait class is_associative_container: the associative containers go to a manual while loop, while the others go to a remove_if-erase version.
In my case I just hard-code the 4 associative containers in the trait. You could duck type it, but it is a higher-level concept, so you would just be pattern matching anyhow.
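One possible shape of such an algorithm is sketched below. It is a guess at the design, not the author's actual code: the helper remove_erase_if_impl is invented here, and the signature is simplified to take the container by lvalue reference and the test by value. It needs C++11.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <type_traits>

// Trait: true only for the four ordered associative containers.
template <typename C>
struct is_associative_container : std::false_type {};
template <typename K, typename V, typename Cmp, typename A>
struct is_associative_container<std::map<K, V, Cmp, A>> : std::true_type {};
template <typename K, typename V, typename Cmp, typename A>
struct is_associative_container<std::multimap<K, V, Cmp, A>> : std::true_type {};
template <typename K, typename Cmp, typename A>
struct is_associative_container<std::set<K, Cmp, A>> : std::true_type {};
template <typename K, typename Cmp, typename A>
struct is_associative_container<std::multiset<K, Cmp, A>> : std::true_type {};

// Associative containers: manual loop; erase returns the next valid iterator.
template <typename Container, typename Test>
void remove_erase_if_impl(Container& c, Test test, std::true_type)
{
    for (auto it = c.begin(); it != c.end(); ) {
        if (test(*it))
            it = c.erase(it);
        else
            ++it;
    }
}

// Other containers: the usual remove_if + erase idiom.
template <typename Container, typename Test>
void remove_erase_if_impl(Container& c, Test test, std::false_type)
{
    c.erase(std::remove_if(c.begin(), c.end(), test), c.end());
}

// Entry point: the trait selects the overload at compile time.
template <typename Container, typename Test>
void remove_erase_if(Container& c, Test test)
{
    remove_erase_if_impl(c, test, is_associative_container<Container>());
}
```

The same call then works for a std::map (the test receives a key/value pair) and for a std::vector (the test receives an element).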

Appending a std::vector with its own elements using iterators

The following code works as expected (the test passes) but I wonder if working with iterators in this way is considered a bad practice in c++ or if it is okay.
Maybe this is specific for std::vector and other collections behave differently and best practices vary between collections (or even their implementations)?
It certainly is not okay in other languages and most of the time changing a collection will invalidate iterators and throw exceptions.
BOOST_AUTO_TEST_CASE (ReverseIteratorExample) {
std::vector<int> myvector;
for(int i = 0; i < 5; i++)
{
myvector.push_back(i);
}
// is this generally a bad idea to change the vector while iterating?
// is it okay in this specific case?
myvector.reserve(myvector.size() + myvector.size() - 2 );
myvector.insert(myvector.end(), myvector.rbegin() + 1, myvector.rend() -1);
int resultset [8] = { 0,1,2,3,4,3,2,1 };
std::vector<int> resultVector( resultset, resultset + sizeof(resultset)/sizeof(resultset[0]) );
BOOST_CHECK_EQUAL_COLLECTIONS(myvector.begin(), myvector.end(), resultVector.begin(), resultVector.end());
}
Summarized Questions:
Is this generally a bad idea to change the vector while iterating?
Is it okay in this specific case?
Is this specific for std::vector and other collections behave differently?
Do best practices vary between collections (or even their implementations)?
This is not valid code. The standard's definition of operations on sequence containers states (23.2.3#4):
a.insert(p,i,j) - [...] pre: i and j are not iterators into a.
So your code invokes undefined behavior because it violates the precondition for the insert operation.
If instead of using insert, you wrote a loop iterating from myvector.rbegin() + 1 to myvector.rend() -1 and called push_back on all values, your code would be valid: This is because push_back only invalidates vector iterators if a reallocation is needed, and your call to reserve ensures that this is not the case.
In general, while there are some cases where modifying a container while iterating over it is fine (such as the loop described above), you have to make sure that your iterators aren't invalidated while doing so. When this happens is specific to each container.
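The push_back loop described above could look roughly like this. The function name append_reversed_interior is hypothetical; the size check is an added assumption so that rbegin() + 1 is never formed on a vector with fewer than two elements.

```cpp
#include <vector>

// Appends the interior elements of v in reverse order,
// e.g. {0,1,2,3,4} becomes {0,1,2,3,4,3,2,1}.
void append_reversed_interior(std::vector<int>& v)
{
    if (v.size() < 3)
        return;                            // no interior elements to append
    v.reserve(v.size() + v.size() - 2);    // push_back below will not reallocate
    // Both reverse iterators are based on existing elements rather than on
    // end(), so they stay valid across push_back calls that do not reallocate.
    std::vector<int>::reverse_iterator it = v.rbegin() + 1;
    std::vector<int>::reverse_iterator stop = v.rend() - 1;
    for (; it != stop; ++it)
        v.push_back(*it);
}
```

Computing both iterators once, before the first push_back, matters: the past-the-end iterator is invalidated by every push_back, even when no reallocation happens.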

C++ vector<T>::iterator operator +

I'm holding an iterator that points to an element of a vector, and I would like to compare it to the next element of the vector.
Here is what I have
Class Point{
public:
float x,y;
}
//Somewhere in my code I do this
vector<Point> points = line.getPoints();
foo (points.begin(),points.end());
where foo is:
void foo (Vector<Point>::iterator begin,Vector<Point>::iterator end)
{
std::Vector<Point>::iterator current = begin;
for(;current!=end-1;++current)
{
std::Vector<Point>::iterator next = current + 1;
//Compare between current and next.
}
}
I thought that this would work, but current + 1 is not giving me the next element of the vector.
I thought operator+ was the way to go, but it doesn't seem so. Is there a workaround for this?
Thanks
current + 1 is valid for random access iterators (which include vector iterators), and it is the iterator after current (i.e., what you think it does). Check (or post!) your comparison code, you're probably doing something wrong in there.
std::vector has random-access iterators. That means they are, basically, as versatile as pointers. They provide full-blown pointer arithmetic (it+5, it+=2) and comparisons other than !=/== (i.e., <, <=, >, and >=).
Comparison between iterators in your code should certainly work, but would be nonsensical:
for(std::vector<Point>::iterator current = begin;current!=end-1;++current)
{
std::vector<Point>::iterator next = current + 1;
assert(current!=next); // always true
assert(current<next); // also always true
}
So if it doesn't work for you, it's likely you are doing something wrong. Unfortunately, "...is not giving me the next element of the vector..." doesn't give us any clue about what you are trying, so it's hard to guess what you might be doing wrong.
Maybe it's just a typo, but your code is referring to Vector while the standard container is vector (lower case V).
But if that's not a typo in your question, without seeing the definition of Vector, there's no way to tell what that does.
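For reference, here is a sketch of a current/next comparison that compiles with the standard vector. What is being compared is an assumption, since the question never shows it: the hypothetical function count_increasing_x counts adjacent pairs whose x coordinate increases.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    float x, y;
};

// Hypothetical example: counts adjacent pairs in [begin, end) where the
// second point has a larger x than the first.
std::size_t count_increasing_x(std::vector<Point>::const_iterator begin,
                               std::vector<Point>::const_iterator end)
{
    std::size_t count = 0;
    if (begin == end)
        return count;    // empty range: end - 1 would be invalid
    for (std::vector<Point>::const_iterator current = begin; current != end - 1; ++current) {
        std::vector<Point>::const_iterator next = current + 1;   // the element after current
        if (current->x < next->x)
            ++count;
    }
    return count;
}
```

The check for an empty range is worth keeping, because the loop condition in the question evaluates end - 1 even when there are no elements.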