I am reading Effective STL by Scott Meyers. In Item 1 the author discusses how to choose among the various containers, and I am having difficulty understanding the following passage.
Would it be helpful to have a sequence container with random access
iterators where pointers and references to the data are not
invalidated as long as nothing is erased and insertions take place
only at the ends of the container? This is a very special case, but if
it’s your case, deque is the container of your dreams.
(Interestingly,deque’s iterators may be invalidated when insertions
are made only at the ends of the container. deque is the only standard
STL container whose iterators may be invalidated without also
invalidating its pointers and references.)
My questions on the above text:
What does the author mean by pointers and references in this context, and how do they differ from iterators?
How can deque's iterators be invalidated when insertions are made only at the ends, while pointers and references remain valid?
I would appreciate answers to both questions with a simple example.
Thanks for your time and help.
For the first part, what's meant is this:
deque<int> foo(10, 1); // a deque with ten elements with value of 1
int& bar = foo.front(); // reference
int* baz = &foo.front(); // pointer
deque<int>::iterator buz = foo.begin(); // iterator
foo.push_front(0);
// At this point bar and baz are still valid, but buz may have been invalidated
The second part has been covered in detail here:
Why does push_back or push_front invalidate a deque's iterators?
An iterator is often used to "cycle through" the elements of a standard-library container, much like you would do with an array index, e.g. in a for loop.
Iterators can be invalid for many reasons. One common case where this happens is when you use a for loop such as the following:
std::deque<int> c;
for(std::deque<int>::iterator i = c.begin(); i != c.end(); ++i) {
// do some stuff to the deque's elements here
}
At the end of the above loop, the iterator i will equal c.end(), which refers to a position one past the last real element in the deque. If you tried to do something like
*i = 88;
right after the end of the above for loop that would be a problem because the container does not "own" the memory i "points" to.
But what Meyers is likely talking about is that the Standard leaves much of the implementation of a deque open to the designer. Deques are usually implemented as a set of separately allocated blocks of memory, each holding several elements, so unlike vectors there is no guarantee that all elements will be next to each other in memory. Furthermore, iterators necessarily contain information about these "blocks" so that they can traverse them smoothly (i.e. iterators are not simply pointers).
For example, if I push_back() a new element, but there is no more room in the "last" chunk of memory, then deque will need to allocate a new block of memory for the new element (and future elements added to the end). Since an iterator I was using previously might not "know" about this new chunk of memory, it could be invalid.
References and actual pointers, on the other hand, would be used in this context to refer/point to individual objects in the container. If I write
int& j = *c.begin();
then j is a reference to the first element of c. If I then do
c.push_front(74);
j still references that previous first element, even though it is no longer at the front of the deque.
However, if you insert something in the middle of the deque, then chances are you are effectively splitting one of those contiguous chunks of memory and trying to squeeze your new element in there. To make room, elements on one side or the other must be shuffled around in memory (and possibly new memory needs to be allocated). This would by necessity invalidate pointers/references to elements on that "side" of the insertion. Since it is up to the implementer how exactly room is made for an inserted element, all bets are off with respect to any pointer/reference, no matter where it is with respect to the insertion.
I have read The C++ Standard Library: A Tutorial and Reference (2nd edition), which says that deque's implementation consists of many blocks. I was curious: if I insert an element in the middle of a deque, will all the elements after the newly inserted element be moved backward, just like in a vector, or will only the elements in the affected block be moved?
As Igor said, the standard doesn't mention such details. However, given that it does say that all pointers, iterators and references are invalidated, I think you can assume that it moves more than the elements in a single "block".
As an aside, given the iterator requirements for deque, all the blocks (except the first and the last one) have to be kept full. Random access iterators require constant time "increment by N", and that can't be done if you have to count how many items are in each block (or, at least, I don't see a way to do that). So that would imply that all the elements either before or after the insertion point have to be moved. (again, not just the ones in the same "block")
I am having some difficulty grasping this concept. From this thread here it states
A deque requires that any insertion to the front or back shall keep
any reference to a member element valid. It's OK for iterators to be
invalidated, but the members themselves must stay in the same place in
memory.
I was under the impression from this thread which states
A pointer is actually a type of iterator. In fact, for some container types, the corresponding iterator can be
implemented simply as a pointer.
If we have a pointer and an iterator that each reference the same
element of a container, then any operation that invalidates one will
invalidate the other.
so if an iterator becomes invalidated, then references also become invalidated.
My question is: how is that possible? If an iterator that points to a certain memory address becomes invalidated, how can a reference to that address still be valid?
Update:
I understand that a deque is implemented as scattered chunks of memory, and that these chunks are tracked by an independent data structure such as a dynamic array. However, I am having difficulty understanding how an iterator could be invalid while a reference stays valid, since an iterator is essentially a generalized pointer to the contents of the data structure. This makes me think that an iterator might be pointing to something else while a pointer points to the actual item. Consider the following diagram of a vector.
From what I understand of the diagram above, for a vector, if the content of a pointer changes, the iterator also changes. How is that different for a deque?
Think of a deque in terms of the following:
template<typename T>
struct deque_stub {
using Page = std::array<T, 32>; // Note: Not really, rather uninitialised memory of some size;
std::vector<std::unique_ptr<Page>> pointers_to_pages;
std::size_t end_insert{32};
std::size_t start_elem{0};
// read further
};
A deque is basically some container, storing pointers to pages which contain some elements. (The start_elem and end_insert members are to keep track of where, in terms of offset into a page, the valid range of elements starts and ends.)
Insertion eventually changes this container, when a new page is needed:
template<typename X>
void push_back(X&& element) {
if (end_insert == 32) {
// get a new page at the end
pointers_to_pages.push_back(std::make_unique<Page>());
end_insert = 0;
}
(*(pointers_to_pages.back()))[end_insert] = std::forward<X>(element);
++end_insert;
}
template<typename X>
void push_front(X&& element) {
if (start_elem == 0) {
pointers_to_pages.insert(
pointers_to_pages.begin(), std::make_unique<Page>());
start_elem = 32;
}
--start_elem;
(*(pointers_to_pages.front()))[start_elem] = std::forward<X>(element);
}
An iterator into that deque needs to be able to "jump" across pages. The easiest way to achieve this is by having it keep an iterator to the current page it is in from the container pointers_to_pages:
struct iterator {
std::size_t pos;
typename std::vector<std::unique_ptr<Page>>::iterator page; // typename: Page depends on T
// other members to detect page boundaries etc.
};
But since that page iterator, the iterator into the vector, may get invalidated when the vector gets changed (which happens when a new page is needed), the whole iterator into the deque might get invalidated upon insertion of elements. (This could be "fixed" by not using a vector as container for the pointers, though this would probably have other negative side effects.)
As an example, consider a deque with a single, but full page. The vector holding the pointers to pages thus holds only a single element, let's say at address 0x10, and let's further assume that its current capacity is also only 1 element. The page itself is stored at some address, let's say 0x100.
Thus the first element of the deque is actually stored at 0x100, but using the iterator into the deque means first looking at 0x10 for the address of the page.
Now if we add another element at the end, we need a new page to store that. So we allocate one, and store the pointer to that new page into the vector. Since its capacity is less than the new size (1 < 2), it needs to allocate a new larger area of memory and move its current contents there. Let's say, that new area is at 0x20. The memory where the pointers have been stored previously (0x10) is freed.
Now the very same element from above before the insertion is still at the same address (0x100), but an iterator to it would go via 0x20. The iterator from above, accessing 0x10, is thus invalid.
Since the element is at the same address, pointers and references to it remain valid, though.
Because the answer you cite is wrong, and because iterators are a lot more than just pointers. For a start, a linked list iterator needs a pointer to the element but also "next" and "previous" pointers. Right there, with that simple example, your notion that "an iterator is a generalized pointer for the contents of the data structure" is completely blown out of the water.
A deque is more complicated than a totally contiguous structure (e.g. vector) and more complicated than a totally non-contiguous structure (i.e. list). When a deque grows, its overall structure moulds to fit, with a minimum of reallocations of the actual elements (often, none).
The result is that even when certain elements don't move, the "control pieces" that allow access to them may need to be updated with fresh metadata about, for example, where neighbouring elements (which maybe did move) now are.
Now, a deque cannot magically update iterators that have already been instantiated somewhere: all it can do is document that your old iterators are invalid and that you shall obtain new ones in the usual way.
class LargeClass
{};
void FunctionA(const LargeClass&) {}
std::vector<LargeClass> vecLargeClass; // populate vecLargeClass
const LargeClass* prev = &vecLargeClass[0]; // take the address of the first element
for( ... )
{
...
if(...)
prev = &vecLargeClass[i];
}
I need to keep a reference to an element stored inside a vector.
In order to avoid copy, I currently use a raw pointer. Or I can store an index pointing to the element.
Is there a better solution for this?
Yes, you can keep a "reference" to an element in a vector so long as that vector's iterators aren't invalidated. That is a big caveat.
A vector's iterators become invalidated when the vector is reallocated, which can happen any time you add elements to the vector. Additionally when you erase an item from a vector, all the iterators at and beyond the point of removal are invalidated.
This is all very complicated, and better not worried about. If you need iterators to never become invalidated (so long as you don't remove that item itself), a vector might not be the best collection for your use. Instead, you might consider a list, a map, or other collections. Note that each has its own set of tradeoffs.
You might not need to care about the iterators at all, however. If your vector stored not the items themselves but pointers to the items, then even if the vector is reallocated, the things the pointers point to will not move. Going this route, of course, you should use a smart pointer if possible. On the face of it, the best one would appear to be shared_ptr. So your declaration becomes:
std::vector<std::shared_ptr<LargeClass>>
Finally, if you really need to use a vector and don't want to mess with smart pointers, you might do well to keep track not of "references" to the items in the vector but of their index positions. Suppose you want to keep track of the item at vecLargeClass[3]. Even if you do something that invalidates iterators, such as a push_back that reallocates, the item in question will still be at index 3 (provided nothing is inserted or erased before it). Instead of keeping track of iterators or pointers to things, keep track of where they are in the vector.
Be careful when storing a pointer or a reference to a vector element. There are certain operations that can invalidate those references, such as push_back, resize, etc. If the index is what you're sure will not change, then it would be the safest. Smart pointers, as marcin_j mentioned in the comment, will not help with the invalidation in case of push_back, resize, etc.
So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe the problem is that it is never explained how these iterators are constructed.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call an erase on {2} my iterator is invalid. Why? In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element. The only way I would see this as being not true is if iterators were made before hand for each element and each iterator had some type of field containing the address of the following element in that container.
My other question has to do with the statement such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way. (See comments for discussion.)
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Never mind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by
v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I
know of, the status of i1 and i2 is
the same: both of them will be
iterators that point to some elements
of the vector (albeit not the same
elements they did before). You won't
get a crash if you use them. Depending
on exactly what you mean by
"invalidate", you might say that
neither one has been invalidated
because they still point to something,
or you might say that both have been
invalidated because in both cases the
elements they point to have been
changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.
An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "invalidates all other iterators" language is their way of saying that the implementation is allowed the freedom to do anything its coders' skeevie hearts feel like to the container when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.
From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
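So you can erase starting from the end:
std::vector<int> v;
// start at the last valid index, v.size() - 1, and walk backwards
for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i)
{
    if (v[i])
        v.erase(v.begin() + i);
}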
Or use the iterator returned by vector's erase():
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);
As the title asks.
My understanding of a deque was that it allocated "blocks". I don't see how allocating more space invalidates iterators, and if anything, one would think that a deque's iterators would have more guarantees than a vector's, not less.
The C++ standard doesn't specify how deque is implemented. It isn't required to allocate new space by allocating a new chunk and chaining it on to the previous ones, all that's required is that insertion at each end be amortized constant time.
So, while it's easy to see how to implement deque such that it gives the guarantee you want[*], that's not the only way to do it.
[*] Iterators have a reference to an element, plus a reference to the block it's in so that they can continue forward/back off the ends of the block when they reach them. Plus I suppose a reference to the deque itself, so that operator+ can be constant-time as expected for random-access iterators -- following a chain of links from block to block isn't good enough.
What's more interesting is that push_back and push_front will not invalidate any references to a deque's elements. Only iterators are to be assumed invalid.
The standard, to my knowledge, doesn't state why. However if an iterator were implemented that was aware of its immediate neighbors - as a list is - that iterator would become invalid if it pointed to an element that was both at the edge of the deque and the edge of a block.
My guess. push_back/push_front can allocate a new memory block. A deque iterator must know when increment/decrement operator should jump into the next block. The implementation may store that information in iterator itself. Incrementing/decrementing an old iterator after push_back/push_front may not work as intended.
This code may or may not fail with a run-time error. With my Visual Studio it failed in debug mode but ran to completion in release mode. On Linux it caused a segmentation fault.
#include <iostream>
#include <deque>
int main() {
std::deque<int> x(1), y(1);
std::deque<int>::iterator iterx = x.begin();
std::deque<int>::iterator itery = y.begin();
for (int i=1; i<1000000; ++i) {
x.push_back(i);
y.push_back(i);
++iterx;
++itery;
if(*iterx != *itery) {
std::cout << "increment failed at " << i << '\n';
break;
}
}
}
The key thing is not to make any assumptions: just treat the iterator as if it will be invalidated.
Even if it works fine now, a later version of the compiler or the compiler for a different platform might come along and break your code. Alternatively, a colleague might come along and decide to turn your deque into a vector or linked list.
An iterator is not just a reference to the data. It must know how to increment, etc.
In order to support random access, implementations will have a dynamic array of pointers to the chunks. The deque iterator will point into this dynamic array. When the deque grows, a new chunk might need to be allocated. The dynamic array will grow, invalidating its iterators and, consequently, the deque's iterators.
So it is not that chunks are reallocated, but the array of pointers to these chunks can be. Indeed, as Johannes Schaub noted, references are not invalidated.
Also note that the deque's iterator guarantees are not less than the vector's, which are also invalidated when the container grows.
Even when you are allocating in chunks, an insert will cause that particular chunk to be reallocated if there isn't enough space (as is the case with vectors).
Because the standard says it can. It does not mandate that deque be implemented as a list of chunks. It mandates a particular interface with particular pre and post conditions and particular algorithmic complexity minimums.
Implementors are free to implement the thing in whatever way they choose, so long as it meets all of those requirements. A sensible implementation might use lists of chunks, or it might use some other technique with different trade-offs.
It's probably impossible to say that one technique is strictly better than another for all users in all situations. Which is why the standard gives implementors some freedom to choose.