How to define a C++ iterator that skips tombstones


I am implementing a container that presents a map-like interface. The physical implementation is an std::vector<std::pair<K*, T>>. A K object remembers its assigned position in the vector. It is possible for a K object to get destroyed. In that case its remembered index is used to zero out its corresponding key pointer within the vector, creating a tombstone.
I would like to expose the full traditional collection of iterators, though I think that they need only claim to be forward_iterators (see next).
I want to be able to use range-based for loop iteration to return only the non-tombstoned elements. Further, I would like the implementation of my iterators to be a single pointer (i.e. no back pointer to the container).
Since the range-based for loop is pretested I think that I can implement tombstone skipping within the inequality predicate.
bool operator != (MyIterator& cursor, MyIterator stop) {
    while (cursor != stop) {
        if (cursor->first)
            return true;
        ++cursor;
    }
    return false;
}
Is this a reasonable approach? If yes, is there a simple way for me to override the inequality operator of std::vector's iterators instead of implementing my iterators from scratch?
If this is not a reasonable approach, what would be better?

Is this a reasonable approach?
No. (Keep in mind that operator!= can be used outside a range-based for loop.)
Your operator does not accept a const object as its first parameter (meaning a const vector::iterator).
You have undefined behavior if the first parameter comes after the second (e.g. if someone tests end != cur instead of cur != end).
You get this weird case where, given iterators a and b, it might be that *a is different from *b, but if you check if (a != b) then you find that the iterators are equal and then *a is the same as *b. This probably wreaks havoc with the multipass guarantee of forward iterators (but the situation is bizarre enough that I would want to check the standard's precise wording before passing judgement). Messing with people's expectations is inadvisable.
There is no simple way to override the inequality operator of std::vector's iterators.
If this is not a reasonable approach, what would be better?
You already know what would be better. You're just shying away from it.
Implement your own iterators from scratch. Wrapping your vector in your own class has the benefit that only the code for that class has to be aware that tombstones exist.
Caveat: Document that the conditions that create a tombstone also invalidate iterators to that element. (Invalid iterators are excluded from most iterator requirements, such as the multipass guarantee.)
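A minimal sketch of the from-scratch approach, assuming the question's layout (a std::vector<std::pair<K*, T>> in which a null key pointer marks a tombstone). The names TombstoneMap, insert and tombstone are hypothetical. The iterator skips tombstones in operator++ and on construction, so operator== and operator!= remain ordinary pointer comparisons. Note that this sketch stores two pointers (current position and end); meeting the single-pointer wish would require something like a non-tombstone sentinel slot at the end of the vector, which is not attempted here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: each slot is (key pointer, value); a null key pointer is a tombstone.
template <typename K, typename T>
class TombstoneMap {
public:
    using slot_type = std::pair<K*, T>;

    class iterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = slot_type;
        using difference_type = std::ptrdiff_t;
        using pointer = slot_type*;
        using reference = slot_type&;

        iterator() = default;
        iterator(slot_type* cur, slot_type* last) : cur_(cur), last_(last) { skip(); }

        reference operator*() const { return *cur_; }
        pointer operator->() const { return cur_; }

        iterator& operator++() { ++cur_; skip(); return *this; }
        iterator operator++(int) { iterator old = *this; ++*this; return old; }

        // Plain pointer equality: != keeps its usual meaning and stays symmetric.
        friend bool operator==(const iterator& a, const iterator& b) { return a.cur_ == b.cur_; }
        friend bool operator!=(const iterator& a, const iterator& b) { return !(a == b); }

    private:
        // Move forward until we rest on a live slot or reach the end.
        void skip() {
            while (cur_ != last_ && cur_->first == nullptr) ++cur_;
        }
        slot_type* cur_ = nullptr;
        slot_type* last_ = nullptr;
    };

    // Appends a slot and returns the index the key object would remember.
    std::size_t insert(K* key, T value) {
        slots_.emplace_back(key, std::move(value));
        return slots_.size() - 1;
    }

    // Called when a key object dies: turns its slot into a tombstone.
    void tombstone(std::size_t index) { slots_[index].first = nullptr; }

    iterator begin() { return iterator(slots_.data(), slots_.data() + slots_.size()); }
    iterator end() {
        slot_type* last = slots_.data() + slots_.size();
        return iterator(last, last);
    }

private:
    std::vector<slot_type> slots_;
};
```

With this in place, for (auto& slot : m) visits only live slots, and code outside the class never needs to know that tombstones exist.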
OR
While your implementation makes a poor operator!=, it could be a fine update or check function. There's this little-known secret that C++ has more looping structures than just range-based for loops. You could make use of one of these, for example:
for (auto cur = vec.begin(); skip_tombstones(cur, vec.end()); ++cur) {
    auto& element = *cur;
    // ... use element ...
}
where skip_tombstones() is basically your operator!= renamed. If not much code needs to iterate over the vector, this might be a reasonable option, even in the long term.
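Filled out as a self-contained sketch: skip_tombstones is written here as a free function template over any iterator whose value type is a pair with a pointer as its first member. The caller sum_live is hypothetical and exists only to exercise the loop shape.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Advances cur past tombstones (slots whose key pointer is null).
// Returns true if cur now refers to a live element, false if it reached stop.
template <typename It>
bool skip_tombstones(It& cur, It stop) {
    while (cur != stop) {
        if (cur->first != nullptr) return true;
        ++cur;
    }
    return false;
}

// Hypothetical caller: sums the values of all live slots.
template <typename K, typename T>
T sum_live(std::vector<std::pair<K*, T>>& vec) {
    T total{};
    for (auto cur = vec.begin(); skip_tombstones(cur, vec.end()); ++cur) {
        auto& element = *cur;
        total += element.second;
    }
    return total;
}
```

Because the skipping lives in a named function rather than in operator!=, the standard iterator comparison keeps its normal meaning everywhere else.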

Related

What is the advantage of using (it != vector.end()) instead of (it < vector.end()) in for loops? [duplicate]

I'm used to writing loops like this:
for (std::size_t index = 0; index < foo.size(); index++)
{
// Do stuff with foo[index].
}
But when I see iterator loops in others' code, they look like this:
for (Foo::Iterator iterator = foo.begin(); iterator != foo.end(); iterator++)
{
// Do stuff with *iterator.
}
I find the iterator != foo.end() to be off-putting. It can also be dangerous if iterator is incremented by more than one.
It seems more "correct" to use iterator < foo.end(), but I never see that in real code. Why not?

All iterators are equality comparable. Only random access iterators are relationally comparable. Input iterators, forward iterators, and bidirectional iterators are not relationally comparable.
Thus, the comparison using != is more generic and flexible than the comparison using <.
There are different categories of iterators because not all ranges of elements have the same access properties. For example,
if you have iterators into an array (a contiguous sequence of elements), it's trivial to relationally compare them; you just have to compare the indices of the pointed-to elements (or the pointers to them, since the iterators likely just contain pointers to the elements);
if you have iterators into a linked list and you want to test whether one iterator is "less than" another iterator, you have to walk the nodes of the linked list from the one iterator until either you reach the other iterator or you reach the end of the list.
The rule is that all operations on an iterator should have constant time complexity (or, at a minimum, sublinear time complexity). You can always perform an equality comparison in constant time since you just have to compare whether the iterators point to the same object. So, all iterators are equality comparable.
Further, you aren't allowed to increment an iterator past the end of the range into which it points. So, if you end up in a scenario where it != foo.end() does not do the same thing as it < foo.end(), you already have undefined behavior because you've iterated past the end of the range.
The same is true for pointers into an array: you aren't allowed to increment a pointer beyond one-past-the-end of the array; a program that does so exhibits undefined behavior. (The same is obviously not true for indices, since indices are just integers.)
Some Standard Library implementations (like the Visual C++ Standard Library implementation) have helpful debug code that will raise an assertion when you do something illegal with an iterator like this.
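A small sketch of why != is the more generic choice: the hypothetical sum_range below is written once against any iterator pair and uses only != and ++, so it accepts std::list iterators as well as std::vector ones. It assumes int elements to keep the example short.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Generic sum over [first, last). Only ++ and != are required of the iterators,
// so forward and bidirectional iterators (such as std::list's) work too.
template <typename It>
int sum_range(It first, It last) {
    int total = 0;
    for (It it = first; it != last; ++it) {
        total += *it;
    }
    return total;
}

// Writing the condition as `it < last` instead would compile for
// std::vector<int>::iterator (random access) but not for std::list<int>::iterator.
```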

Short answer: Because Iterator is not a number, it's an object.
Longer answer: There are more collections than linear arrays. Trees and hashes, for example, don't really lend themselves to "this index is before this other index". For a tree, consider two indices that live on separate branches. Or take any two indices in a hash: they have no order at all, so any order you impose on them is arbitrary.
You don't have to worry about "missing" End(). It is also not a number, it is an object that represents the end of the collection. It doesn't make sense to have an iterator that goes past it, and indeed it cannot.

C++ STL - Why treat a function that returns an iterator as a void function?

The STL has many functions that return iterators. For example, the STL list function erase returns an iterator (both C++98 and C++11). Nevertheless, I see it being used as a void function. Even the cplusplus.com site has an example that contains the following code:
mylist.erase(it1, it2);
which does not return an iterator. Shouldn't the proper syntax be as follows?
std::list<int>::iterator iterList = mylist.erase(it1, it2);

You are not forced to use a return value in C++. Normally the returned iterator is useful because it gives you a new valid iterator into that container, but the call is syntactically correct without it. Just keep in mind that after the erasure, it1 will no longer be valid (and for containers such as std::vector, neither will it2).

which does not return an iterator. Shouldn't the proper syntax be as follows?
It does return an iterator, but just as with any other function you can ignore the returned value. If you just want to erase an element and don't care about the returned iterator, you may simply ignore it.
Actually ignoring the return value is more common than you might expect. Maybe the most common place where return values are ignored is when the only purpose of the return value is to enable chaining. For example the assignment operator is usually written as
Foo& operator=(const Foo& other){
/*...*/
return *this;
}
so that you can write something like a = b = c. However, when you only write b = c, you usually ignore the value returned by that call. Also note that when you write b = c, it does return a value, but if you just want to make this assignment, you have no choice other than to ignore the return value.

The actual reason is far more banal than you might think. If you couldn't ignore a return value, many of the functions like .erase() would need to be split in two versions: an .erase() version returning void and another .erase_and_return() version which returned the iterator. What would you gain from this?

Return values are sometimes useful. If they are not useful, you don't have to use them.
The container erase functions that take iterators return an iterator to the element following the erased range. For non-node-based containers this is often (but not always) useful because the iterators at and after the range are no longer valid.
For node-based containers this is usually useless, as the 2nd iterator passed in remains valid even after the erase operation.
Regardless, both return that iterator. This permits code that works on a generic container to not have to know if the container maintains valid iterators after the erase or not; it can just store the return value and have a valid iterator to after-the-erase.
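As a sketch of such generic code (erase_evens is a hypothetical name), the loop below always continues from the iterator that erase returns, so it works unchanged for std::vector, where erase invalidates later iterators, and for std::list, where it does not.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Removes even numbers from any sequence container whose erase(iterator)
// returns the iterator following the erased element.
template <typename Container>
void erase_evens(Container& c) {
    for (auto it = c.begin(); it != c.end(); ) {
        if (*it % 2 == 0) {
            it = c.erase(it);  // continue from the returned, still-valid iterator
        } else {
            ++it;
        }
    }
}
```

Note that c.end() is re-evaluated on every pass, since for std::vector the old end iterator is invalidated by erase. In C++20, std::erase_if performs this kind of filtering for the standard containers directly.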
Iterators in C++ standard containers are all extremely cheap to create and destroy and copy; in fact, they are in practice so cheap that compilers can eliminate them entirely if they aren't used. So returning an iterator that isn't used can have zero run time cost.
A program that doesn't use this return value here can be a correct program, both semantically and syntactically. At the same time, other semantically and syntactically correct programs will require that you use that return value.
Finally,
mylist.erase(it1, it2);
this does return an iterator. The iterator is immediately discarded (the return value only exists as an unnamed temporary), and compilers are likely to optimize it out of existence.
But just because you don't store a return value, doesn't mean it isn't returned.

C++ vector insights

I am a little bit frustrated about how to use vectors in C++. I use them widely, though I am not exactly certain how I use them. Below are my questions:
If I have a vector, let's say std::vector<CString> v_strMyVector, with (int)v_strMyVector.size() > i, can I access the i-th member: v_strMyVector[i] == "xxxx";? (It works, but why?)
Do I always need to define an iterator to go to the beginning of the vector and loop over its members?
What is the purpose of an iterator if I have direct access to all members of the vector (see the first question)?
Thanks in advance,
Sun

It works only because operator[] performs no bounds checking, for performance reasons; accessing an index outside the vector's size results in undefined behavior. If you use the safer v_strMyVector.at(i), an out-of-range index throws a std::out_of_range exception.
It's because the operator[] returns a reference.
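A small sketch showing both points: assignment and comparison through the reference that operator[] returns, and the bounds check that at() adds. The helper at_throws is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// at() returns the same kind of reference as operator[], but first checks
// the index and throws std::out_of_range when it is not less than size().
bool at_throws(const std::vector<std::string>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```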
Since vectors can be accessed randomly in O(1) time, looping by index or iterator makes no performance difference.
The iterator lets you write an algorithm independent of the container. This iterator pattern is used a lot in the <algorithm> library to make writing generic code easier. For example, instead of needing N members for each of the M containers (i.e. writing M*N functions)
std::vector<T>::find(x)
std::list<T>::find(x)
std::deque<T>::find(x)
...
std::vector<T>::count(x)
std::list<T>::count(x)
std::deque<T>::count(x)
...
we just need N templates
find(iter_begin, iter_end, x);
count(iter_begin, iter_end, x);
...
and each of the M containers provides the iterators, reducing the number of functions needed to just M+N.
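In the real <algorithm> header these generic functions are std::find and std::count. A short sketch applying each, unchanged, to a vector, a list and a deque; the wrapper names count_value and contains_value are hypothetical and assume int elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <vector>

// std::count is written once against iterators, so any container that
// provides begin()/end() can use it.
template <typename Container>
std::ptrdiff_t count_value(const Container& c, int x) {
    return std::count(c.begin(), c.end(), x);
}

// std::find returns an iterator to the first match, or c.end() if there is none.
template <typename Container>
bool contains_value(const Container& c, int x) {
    return std::find(c.begin(), c.end(), x) != c.end();
}
```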

It returns a reference.
No, because vector has random access. However, you do for other types (e.g. list, which is a doubly-linked list).
To unify all the collections (along with other types, like arrays). That way you can use algorithms like std::copy on any type that meets the requirements.

Regarding your second point, the idiomatic C++ way is not to loop at all, but to use algorithms (if feasible).
Manual looping for output:
for (std::vector<std::string>::iterator it = vec.begin(); it != vec.end(); ++it)
{
std::cout << *it << "\n";
}
Algorithm:
std::copy(vec.begin(), vec.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
Manual looping for calling a member function:
for (std::vector<Drawable*>::iterator it = vec.begin(); it != vec.end(); ++it)
{
(*it)->draw();
}
Algorithm:
std::for_each(vec.begin(), vec.end(), std::mem_fn(&Drawable::draw));
Hope that helps.
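A compilable version of both algorithm examples, assuming a hypothetical Drawable type whose draw() just counts calls. It uses std::mem_fn, since std::mem_fun was deprecated in C++11 and removed in C++17; print_all and draw_all are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes each string followed by "\n" using std::copy and an ostream_iterator.
void print_all(const std::vector<std::string>& vec, std::ostream& out) {
    std::copy(vec.begin(), vec.end(),
              std::ostream_iterator<std::string>(out, "\n"));
}

// Hypothetical Drawable: draw() counts calls so the effect is observable.
struct Drawable {
    int draws = 0;
    void draw() { ++draws; }
};

// Calls draw() on every element; std::mem_fn accepts pointers to objects.
void draw_all(std::vector<Drawable*>& vec) {
    std::for_each(vec.begin(), vec.end(), std::mem_fn(&Drawable::draw));
}
```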

Works because the [] operator is overloaded:
reference operator[](size_type n)
See http://www.sgi.com/tech/stl/Vector.html
Traversing any collection in the STL using iterators is the de facto standard.
I think one advantage is if you replace vector by another collection, all of your code would continue to work.

That's the idea of vectors, they provide direct access to all items, much as regular arrays. Internally, vectors are represented as dynamically allocated, contiguous memory areas. The operator [] is defined to mimic semantics of the regular array.
Having an iterator is not really required; you may as well use an index variable that goes from 0 to v_strMyVector.size()-1, as you would with a regular array:
for (std::size_t i = 0; i < v_strMyVector.size(); ++i) {
...
}
That said, using an iterator is considered to be a good style by many, because...
Using an iterator makes it easier to replace underlying container type, e.g. from std::vector<> to std::list<>. Iterators may also be used with STL algorithms, such as std::sort().

std::vector is a type of sequence that provides constant-time random access. You can obtain a reference to any item in constant time, but you pay for it when inserting into or deleting from the middle of the vector, as these operations take time linear in the number of elements after that position. You do not need to use iterators when accessing the contents of the vector, but it does support them.

Checking if an iterator is valid

Is there any way to check if an iterator (whether it is from a vector, a list, a deque...) is (still) dereferenceable, i.e. has not been invalidated?
I have been using try-catch, but is there a more direct way to do this?
Example: (which doesn't work)
list<int> l;
for (int i = 1; i < 10; i++) {
    l.push_back(i * 10);
}
list<int>::iterator itd = l.begin();
itd++;
if (something) {
    l.erase(itd);
}
/* now, somewhere else.. check if it points to somewhere meaningful */
if (itd != l.end())
{
    // blablabla
}

I assume you mean "is an iterator valid," that it hasn't been invalidated due to changes to the container (e.g., inserting/erasing to/from a vector). In that case, no, you cannot determine if an iterator is (safely) dereferencable.

As jdehaan said, if the iterator wasn't invalidated and points into a container, you can check by comparing it to container.end().
Note, however, that if the iterator is singular -- because it wasn't initialized or it became invalid after a mutating operation on the container (vector's iterators are invalidated when you increase the vector's capacity, for example) -- the only operation that you are allowed to perform on it is assignment. In other words, you can't check whether an iterator is singular or not.
std::vector<int>::iterator iter = vec.begin();
vec.resize(vec.capacity() + 1);
// iter is now singular, you may only perform assignment on it,
// there is no way in general to determine whether it is singular or not
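Since the validity of iter cannot be queried, the usual workaround is not to hold an iterator across an operation that may invalidate it. A sketch, assuming it is acceptable to remember the position as an index instead (the function name value_after_growth is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An index survives reallocation even though iterators do not.
// Remember the position as an index, grow the vector (which reallocates
// and invalidates every iterator), then build a fresh iterator from the index.
int value_after_growth(std::vector<int>& vec, std::size_t pos) {
    vec.resize(vec.capacity() + 1);  // forces a reallocation
    std::vector<int>::iterator iter =
        vec.begin() + static_cast<std::ptrdiff_t>(pos);  // new, valid iterator
    return *iter;
}
```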

Non-portable answer: Yes - in Visual Studio
Visual Studio's STL iterators have a "debugging" mode which does exactly this. You wouldn't want to enable it in ship builds (there is overhead), but it is useful in checked builds.
Read about it in the VC10 documentation (this system can and in fact does change with every release, so find the docs specific to your version).
Edit: Also, I should add: debug iterators in Visual Studio are designed to immediately explode when you use them (instead of exhibiting undefined behavior), not to allow "querying" of their state.

Usually you test it by checking if it is different from the end(), like
if (it != container.end())
{
// then dereference
}
Moreover, using exception handling to replace logic is bad in terms of both design and performance. Your question is a very good one, and the try-catch is definitely worth replacing in your code. As the name says, exception handling should only be used for rare, unexpected issues.

Is there any way to check if an iterator (whether it is from a vector, a list, a deque...) is (still) dereferenceable, i.e. has not been invalidated?
No, there isn't. Instead you need to control access to the container while your iterator exists, for example:
Your thread should not modify the container (invalidating the iterator) while it is still using an instantiated iterator for that container
If there's a risk that other threads might modify the container while your thread is iterating, then in order to make this scenario thread-safe your thread must acquire some kind of lock on the container (so that it prevents other threads from modifying the container while it's using an iterator)
Work-arounds like catching an exception won't work.
This is a specific instance of the more general problem, "can I test/detect whether a pointer is valid?", the answer to which is typically "no, you can't test for it: instead you have to manage all memory allocations and deletions in order to know whether any given pointer is still valid".
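A minimal sketch of the locking discipline described above, using std::mutex and std::lock_guard. The class name GuardedVector is hypothetical; a real design with many concurrent readers might prefer std::shared_mutex (C++17).

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Every access to the vector happens under one mutex, so no other thread
// can invalidate iterators while a loop over the data is running.
class GuardedVector {
public:
    void push_back(int x) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(x);
    }

    int sum() const {
        std::lock_guard<std::mutex> lock(mutex_);
        int total = 0;
        for (auto it = data_.begin(); it != data_.end(); ++it) {
            total += *it;  // writers are blocked until the lock is released
        }
        return total;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> data_;
};
```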

Trying and catching is not safe: you will not, or at least will seldom, get an exception if your iterator is "out of bounds".
As alemjerus says, an iterator can always be dereferenced, no matter what ugliness lies beneath. It is quite possible to iterate into other areas of memory and write to areas that might hold other objects. I have been looking at code, watching variables change for no particular reason. That is a bug that is really hard to detect.
Also, it is wise to remember that inserting and removing elements might invalidate all references, pointers and iterators.
My best advice would be to keep your iterators under control, and always keep an "end" iterator at hand to be able to test whether you are at the "end of the line", so to speak.

In some of the STL containers, an iterator becomes invalid when you erase the element it points to. This happens because the erase operation changes the internal memory structure of the container, so incrementing the existing iterator afterwards leads to an undefined location.
When you do the following, the iterator is incremented before the erase function is called:
if (something) l.erase(itd++);
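A sketch of that idiom in a full loop, assuming a std::list<int> and a hypothetical function name erase_multiples_of. It relies on std::list::erase invalidating only iterators to the erased element; the same pattern is not safe for std::vector, where erase also invalidates iterators after the erased position.

```cpp
#include <cassert>
#include <list>

// Post-increment passes the old position to erase while itd has already
// moved on to the next node, so itd stays valid after the erase.
// Assumes n != 0.
void erase_multiples_of(std::list<int>& l, int n) {
    for (std::list<int>::iterator itd = l.begin(); itd != l.end(); ) {
        if (*itd % n == 0) {
            l.erase(itd++);
        } else {
            ++itd;
        }
    }
}
```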

Is there any way to check if an iterator is dereferencable
Yes, with the gcc debugging containers available as GNU extensions. For std::list you can use __gnu_debug::list instead. The following code aborts as soon as an invalid iterator is used. Since debugging containers impose extra overhead, they are intended only for debugging.
#include <debug/list>
int main() {
__gnu_debug::list<int> l;
for (int i = 1; i < 10; i++) {
l.push_back(i * 10);
}
auto itd = l.begin();
itd++;
l.erase(itd);
/* now, in other place.. check if itd points to somewhere meaningful */
if (itd != l.end()) {
// blablabla
}
}
$ ./a.out
/usr/include/c++/7/debug/safe_iterator.h:552:
Error: attempt to compare a singular iterator to a past-the-end iterator.
Objects involved in the operation:
iterator "lhs" # 0x0x7ffda4c57fc0 {
type = __gnu_debug::_Safe_iterator<std::_List_iterator<int>, std::__debug::list<int, std::allocator<int> > > (mutable iterator);
state = singular;
references sequence with type 'std::__debug::list<int, std::allocator<int> >' # 0x0x7ffda4c57ff0
}
iterator "rhs" # 0x0x7ffda4c580c0 {
type = __gnu_debug::_Safe_iterator<std::_List_iterator<int>, std::__debug::list<int, std::allocator<int> > > (mutable iterator);
state = past-the-end;
references sequence with type 'std::__debug::list<int, std::allocator<int> >' # 0x0x7ffda4c57ff0
}
Aborted (core dumped)

The type of the parameters of the erase function of any std container (as you have listed in your question, i.e. whether it is from a vector, a list, a deque...) is always iterator of this container only.
This function uses the first given iterator to remove the element that this iterator points at from the container. If the erase function receives two iterators, then the element pointed to by the first iterator and all the elements between the two are erased, but the element pointed to by the second iterator is not. The point is that every iterator pointing at an erased element becomes invalid (and for some containers, such as std::vector, iterators to later elements are invalidated as well). Also:
Each iterator that was pointing at some element that has been erased from the container becomes invalid, but it does not become equal to the end of the container!
This means that an iterator that was pointing at some element that has been erased from the container cannot be compared to container.end().
This iterator is invalid, so it is not dereferenceable, i.e. you can use neither the * nor the -> operator; it is also not incrementable, i.e. you cannot use the ++ operator; and it is also not decrementable, i.e. you cannot use the -- operator.
It is also not comparable! I.e. you cannot even use the == or != operators.
Actually, you cannot use any operator that is declared and defined in the std iterator.
You cannot do anything with this iterator, just as with a null pointer.
In a debug build with Visual Studio's checked iterators, doing something with an invalid iterator immediately stops the program: an assertion dialog window appears, and no matter which options you choose or which buttons you click, the program cannot continue. You can only terminate the program and the process by clicking the Abort button.
The only things you can do with an invalid iterator are to assign it a new value, for example the begin of the container, or to just ignore it.
But before you decide what to do with an iterator, you first must know whether it is invalid, if you call the erase function of the container you are using.
I have written a function myself that checks and returns true if a given iterator is invalid. You can use the memcpy function to copy the raw bytes of any object into a buffer, and, as usual, use the memset function first to clear the new buffer:
#include <cstring>
#include <list>
using namespace std;

// Change the parameter type if your container is not list<int> (e.g. a vector or deque, or another element type).
bool IsNull(list<int>::iterator& i)
{
    unsigned char buffer[sizeof(i)];
    memset(buffer, 0, sizeof(i));
    memcpy(buffer, &i, sizeof(i));
    // In my build, iterators were 12 bytes long, and an invalid ("null") iterator had a zero first byte.
    return *buffer == 0;
}
I tested this function before posting it here and found that it works for me.
I hope this fully answers your question and helps you!

There is a way, but it is ugly... you can use the std::distance function:
#include <iterator>
using namespace std;
// Note: this only tells you whether your_iter is at end(); calling std::distance on an invalidated iterator is itself undefined behavior.
auto distance_to_iter = distance(container.begin(), your_iter);
auto distance_to_end = distance(container.begin(), container.end());
bool is_your_iter_still_valid = distance_to_iter != distance_to_end;

Use erase with post-increment:
if (something) l.erase(itd++);
That way itd already refers to the next element (or to end()) when the erase happens, so you can still test the iterator afterwards.

if (iterator != container.end()) {
    // iterator is dereferenceable!
}
If your iterator doesn't equal container.end() and is not dereferenceable, you're doing something wrong.
