Preventing getting the address of items in a container in C++ - c++

I'm writing a C++ array-like container that can dynamically grow and shrink. I'd like to prevent users of this container from taking the address of its items, because they might be reallocated when the container needs to reallocate itself. The only correct way of using this container will be by keeping track of the address of the container and the index of each item (yes, I'll be specifying it in the documentation, but it would be better if I could make the compiler trigger an error if a user of the container tries to get the address of an item?
Can this be done somehow? I searched and found some question regarding making the "address of operator" private, but it doesn't seem to be guaranteed to work, nor it's a recommended practice either. So, I wonder if there could be any alternative technique for preventing access to pointers to items...

In C++ all memory allocated to the program is more or less fair game. So there is no real way to prevent users of your array type to obtain or calculate addresses within your array.
The STL even makes a lot of effort to guarantee that iterators do not suddenly become invalid because of this.
In the real world you write in the API description that it is very unwise to work with addresses within your container because the could become invalid at any time and then it is the responsibility of the users to follow that rule, IMHO.

What you want to do is not possible easily (if at all).
If your container invalidates pointers to elements in certain situations, that does not make your container "special". Lets consider what other containers do to mitigate this problem.
std::vector:
// THIS CODE IS BROKEN !! DO NOT WRITE CODE LIKE THIS !!
std::vector<int> x(24,0);
int* ptr = &x[0];
auto it = x.begin();
x.push_back(42);
x.insert(it, 100); // *
*ptr = 5;
*it = 7;
Starting from the line marked with * everything here uses an invalid pointer or iterator. From cppreference, push_back:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Actually x.insert(it, 100); might be using a valid iterator, but the code does not check whether the push_back had to increase capacity, so one has to assume that it and ptr are invalid after the call to push_back.
insert:
Causes reallocation if the new size() is greater than the old capacity(). If the new size() is greater than capacity(), all iterators and references are invalidated. Otherwise, only the iterators and references before the insertion point remain valid. The past-the-end iterator is also invalidated.
Users of standard containers must be aware of iterator invalidation rules (see Iterator invalidation rules) or they will write horribly broken code like the one above.
In general, you cannot protect yourself from all mistakes a user can possibly make. Document pre- and post-conditions and if a user ignores them they just get what they deserve.
Note that you could try to overload the & operator, but there is no way you can prevent someone to get the adress via std::addressof, which is made exactly for that: Get the address of an object in case the object tries to prevent it by overloading &.

Related

STL iterator revalidation for end (past-the-end) iterator?

See related questions on past-the-end iterator invalidation:
this, this.
This is more a question of design, namely, is there (in STL or elsewhere) such concept as past-the-end iterator "revalidation"?
What I mean by this, and use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently from this, another part of the program enqueues more items in the queue. How is it possible for the algorithm to (EDIT) efficiently tell, "have more items been enqueued" while holding the previously past-the-end iterator (call it tailIt)? (this would imply it is able to check if tailIt == container.end() still, and if that is false, conclude tailIt is now valid and points to the first element that was inserted).
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear we have the iterator tailIt and a reference to container. A trivial workaround for what I'm trying to do is, also remember count := how many items you processed, and then check is container.size() == count still, and if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, assumption container doesn't pop from the front (!), random-access for efficient seeking).
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" to the data block at all; in fact this will be true of any iterator except a vector iterator. So, overall, an extant end-iterator just is never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
In the case for a std::queue, no there isn't (heh). Not because the iterators for a queue get invalidated once something is pushed, but because a queue doesn't have any iterators at all.
As for other iterator types, most (or any of them) of them don't require a reference to the container holder (the managing object containing all the info about the underlying data). Which is an trade-off for efficiency over flexibility. (I quickly checked the implementation of gcc's std::vector::iterator)It is possible to write an implementation for an iterator type that keeps a reference to the holder during its lifetime, that way the iterators never have to be invalidated! (unless the holder is std::move'd)
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
if (some_outside_condition()) {
// Normally the vector would be invalidated at this point
// (only if resized, but you should always assume a resize)
my_vector.push_back("hello world!");
}
}
Literally revalidating iterators might be too complex to build for it's use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++); loop.But with that said, I cannot assure you how well the compiler will optimize, like unrolling loops, with these iterators. I do assume it will still do quite a good job.

C++ and iterator invalidation

So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe it's the fact that it is never explained how these iterators are constructed is the problem.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call an erase on {2} my iterator is invalid. Why? In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element. The only way I would see this as being not true is if iterators were made before hand for each element and each iterator had some type of field containing the address of the following element in that container.
My other question has to do with the statement such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way. (See comments for discussion.)
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Nevermind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by
v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I
know of, the status of i1 and i2 is
the same: both of them will be
iterators that point to some elements
of the vector (albeit not the same
elements they did before). You won't
get a crash if you use them. Depending
on exactly what you mean by
"invalidate", you might say that
neither one has been invalidated
because they still point to something,
or you might say that both have been
invalidated because in both cases the
elements they point to have been
changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.
An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "ivalidates all other iterators" language is their way of saying that the implemenation is allowed the freedom to do anything its coders' skeevie hearts feel like to the contaier when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.
From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
So you can erase starting from end
int i;
vector v;
for ( i = v.size(), i >=0, i--)
{
if (v[i])
v.erase(v.begin() + i);
}
OR use iterator returned from vector erase()
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);

What does enabling STL iterator debugging really do?

I've enabled iterator debugging in an application by defining
_HAS_ITERATOR_DEBUGGING = 1
I was expecting this to really just check vector bounds, but I have a feeling it's doing a lot more than that. What checks, etc are actually being performed?
Dinkumware STL, by the way.
There is a number of operations with iterators which lead to undefined behavior, the goal of this trigger is to activate runtime checks to prevent it from occurring (using asserts).
The issue
The obvious operation is to use an invalid iterator, but this invalidity may arise from various reasons:
Uninitialized iterator
Iterator to an element that has been erased
Iterator to an element which physical location has changed (reallocation for a vector)
Iterator outside of [begin, end)
The standard specifies in excruciating details for each container which operation invalidates which iterator.
There is also a somehow less obvious reason that people tend to forget: mixing iterators to different containers:
std::vector<Animal> cats, dogs;
for_each(cats.begin(), dogs.end(), /**/); // obvious bug
This pertain to a more general issue: the validity of ranges passed to the algorithms.
[cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
[cats.end(), cats.begin()) is invalid (unless cats is empty ??)
The solution
The solution consists in adding information to the iterators so that their validity and the validity of the ranges they defined can be asserted during execution thus preventing undefined behavior to occur.
The _HAS_ITERATOR_DEBUGGING symbol serves as a trigger to this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it's issued from and is thus notified of the modification.
In Dinkumware this is achieved by two additions:
Each iterator carries a pointer to its related container
Each container holds a linked list of the iterators it created
And this neatly solves our problems:
An uninitialized iterator does not have a parent container, most operations (apart from assignment and destruction) will trigger an assertion
An iterator to an erased or moved element has been notified (thanks to the list) and know of its invalidity
On incrementing and decrementing an iterator it can checks it stays within the bounds
Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)
The cost
The cost is heavy, but does correctness have a price? We can break down the cost:
extra memory allocation (the extra list of iterators maintained): O(NbIterators)
notification process on mutating operations: O(NbIterators) (Note that push_back or insert do not necessarily invalidate all iterators, but erase does)
range validity check: O( min(last-first, container.end()-first) )
Most of the library algorithms have of course been implemented for maximum efficiency, typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the speed might severely slow down, especially with hand-written loops:
for (iterator_t it = vec.begin();
it != vec.end(); // Oops
++it)
// body
We know the Oops line is bad taste, but here it's even worse: at each run of the loop, we create a new iterator then destroy it which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop ?
Of course, a for_each would not encounter such an issue, which is yet another compelling case toward the use of STL algorithms instead of hand-coded versions.
As far as I understand:
_HAS_ITERATOR_DEBUGGING will display a dialog box at run time to assert any incorrect iterator use including:
1) Iterators used in a container after an element is erased
2) Iterators used in vectors after a .push() or .insert() function is called
According to http://msdn.microsoft.com/en-us/library/aa985982%28v=VS.80%29.aspx
The C++ standard describes which member functions cause iterators to a container to become invalid. Two examples are:
Erasing an element from a container causes iterators to the element to become invalid.
Increasing the size of a vector (push or insert) causes iterators into the vector container become invalid.

Storing iterators inside containers

I am building a DLL that another application would use. I want to store the current state of some data globally in the DLL's memory before returning from the function call so that I could reuse state on the next call to the function.
For doing this, I'm having to save some iterators. I'm using a std::stack to store all other data, but I wasn't sure if I could do that with the iterators also.
Is it safe to put list iterators inside container classes? If not, could you suggest a way to store a pointer to an element in a list so that I can use it later?
I know using a vector to store my data instead of a list would have allowed me to store the subscript and reuse it very easily, but unfortunately I'm having to use only an std::list.
Iterators to list are invalidated only if the list is destroyed or the "pointed" element is removed from the list.
Yes, it'll work fine.
Since so many other answers go on about this being a special quality of list iterators, I have to point out that it'd work with any iterators, including vector ones. The fact that vector iterators get invalidated if the vector is modified is hardly relevant to a question of whether it is legal to store iterators in another container -- it is. Of course the iterator can get invalidated if you do anything that invalidates it, but that has nothing to do with whether or not the iterator is stored in a stack (or any other data structure).
It should be no problem to store the iterators, just make sure you don't use them on a copy of the list -- an iterator is bound to one instance of the list, and cannot be used on a copy.
That is, if you do:
std::list<int>::iterator it = myList.begin ();
std::list<int> c = myList;
c.insert (it, ...); // Error
As noted by others: Of course, you should also not invalidate the iterator by removing the pointed-to element.
This might be offtopic, but just a hint...
Be aware, that your function(s)/data structure would probably be thread unsafe for read operations. There is a kind of basic thread safety where read operations do not require synchronization. If you are going to store the sate how much the caller read from your structure it will make the whole concept thread unsafe and a bit unnatural to use. Because nobody assumes a read to be state-full operation.
If two threads are going to call it they will either need to synchronize the calls or your data structure might end-up in a race condition. The problem in such a design is that both threads must have access to a common synchronization variable.
I would suggest making two overloaded functions. Both are stateless, but one of them should accept a hint iterator, where to start next read/search/retrieval etc. This is e.g. how Allocator in STL is implemented. You can pass to allocator a hint pointer (default 0) so that it quicker finds a new memory chunk.
Regards,
Ovanes
Storing the iterator for the list should be fine. It will not get invalidated unless you remove the same element from the list for which you have stored the iterator. Following quote from SGI site:
Lists have the important property that
insertion and splicing do not
invalidate iterators to list elements,
and that even removal invalidates only
the iterators that point to the
elements that are removed
However, note that the previous and next element of the stored iterator may change. But the iterator itself will remain valid.
The same rule applies to an iterator stored in a local variable as in a longer lived data structure: it will stay valid as long as the container allows.
For a list, this means: as long as the node it points to is not deleted, the iterator stays valid. Obviously the node gets deleted when the list is destructed...

Why does push_back or push_front invalidate a deque's iterators?

As the title asks.
My understanding of a deque was that it allocated "blocks". I don't see how allocating more space invalidates iterators, and if anything, one would think that a deque's iterators would have more guarantees than a vector's, not less.
The C++ standard doesn't specify how deque is implemented. It isn't required to allocate new space by allocating a new chunk and chaining it on to the previous ones, all that's required is that insertion at each end be amortized constant time.
So, while it's easy to see how to implement deque such that it gives the guarantee you want[*], that's not the only way to do it.
[*] Iterators have a reference to an element, plus a reference to the block it's in so that they can continue forward/back off the ends of the block when they reach them. Plus I suppose a reference to the deque itself, so that operator+ can be constant-time as expected for random-access iterators -- following a chain of links from block to block isn't good enough.
What's more interesting is that push_back and push_front will not invalidate any references to a deque's elements. Only iterators are to be assumed invalid.
The standard, to my knowledge, doesn't state why. However if an iterator were implemented that was aware of its immediate neighbors - as a list is - that iterator would become invalid if it pointed to an element that was both at the edge of the deque and the edge of a block.
My guess. push_back/push_front can allocate a new memory block. A deque iterator must know when increment/decrement operator should jump into the next block. The implementation may store that information in iterator itself. Incrementing/decrementing an old iterator after push_back/push_front may not work as intended.
This code may or may not fail with run time error. On my Visual Studio it failed in debug mode but run to the conclusion in release mode. On Linux it caused segmentation fault.
#include <iostream>
#include <deque>
int main() {
std::deque<int> x(1), y(1);
std::deque<int>::iterator iterx = x.begin();
std::deque<int>::iterator itery = y.begin();
for (int i=1; i<1000000; ++i) {
x.push_back(i);
y.push_back(i);
++iterx;
++itery;
if(*iterx != *itery) {
std::cout << "increment failed at " << i << '\n';
break;
}
}
}
The key thing is not to make any assumptions just treat the iterator as if it will be invalidated.
Even if it works fine now, a later version of the compiler or the compiler for a different platform might come along and break your code. Alternatively, a colleague might come along and decide to turn your deque into a vector or linked list.
An iterator is not just a reference to the data. It must know how to increment, etc.
In order to support random access, implementations will have a dynamic array of pointers to the chunks. The deque iterator will point into this dynamic array. When the deque grows, a new chunk might need to be allocated. The dynamic array will grow, invalidating its iterators and, consequently, the deque's iterators.
So it is not that chunks are reallocated, but the array of pointers to these chunks can be. Indeed, as Johannes Schaub noted, references are not invalidated.
Also note that the deque's iterator guarantees are not less than the vector's, which are also invalidated when the container grows.
Even when you are allocating in chunks, an insert will cause that particular chunk to be reallocated if there isn't enough space (as is the case with vectors).
Because the standard says it can. It does not mandate that deque be implemented as a list of chunks. It mandates a particular interface with particular pre and post conditions and particular algorithmic complexity minimums.
Implementors are free to implement the thing in whatever way they choose, so long as it meets all of those requirements. A sensible implementation might use lists of chunks, or it might use some other technique with different trade-offs.
It's probably impossible to say that one technique is strictly better than another for all users in all situations. Which is why the standard gives implementors some freedom to choose.