How to create a past-the-end iterator? - c++

I have created a binary tree structure to store a bounding volume hierarchy. To make it easier (and safer) to use, I created two iterators to complement it: breadth-first and depth-first.
The breadth-first iterator is essentially a wrapper for the underlying QList. But I am stuck on the depth-first iterator (bidirectional only): I can handle the actual iteration around the tree, I just do not know how to create a past-the-end iterator.
I can't just use QList::end() because there is no guarantee that the lowest-level rightmost node is also the rightmost node of the whole tree. I'm reluctant to make a 'fake' BVH node that can be tested for, because it would involve a large code change (and probably overhead) to have the various node management mechanisms ignore the fake node, and it would disable a lot of the tree-building automation (for example, the parent of the fake node would have to be told it is a leaf). But if this is the only way, then it is the only way.

Having looked briefly at qlist.h, it appears that you won't be able to use the same end() for both iteration types. But that's OK--you can use a null pointer or a static dummy or other techniques to make an end() iterator for your second iteration method. I don't see why this would have to impact a huge amount of other code (most of which should just refer to end() without knowing its implementation details).

Can you not just use null or something like this as end? This is at least what I would expect e.g. for a linked list structure.
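A minimal sketch of the null-as-end idea for a pre-order (depth-first) bidirectional iterator. The Node layout, the parent pointer, and the names DepthFirstIterator, df_begin and df_end are assumptions for illustration, not the asker's actual BVH types. The iterator also keeps the root so that stepping back from end() can find the last node without a fake node in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical node layout; a real BVH node would also carry bounds etc.
struct Node {
    int value = 0;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Pre-order bidirectional iterator. Past-the-end is cur_ == nullptr,
// so the tree itself never stores a fake node.
class DepthFirstIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = Node;
    using difference_type = std::ptrdiff_t;
    using pointer = Node*;
    using reference = Node&;

    DepthFirstIterator(Node* root, Node* cur) : root_(root), cur_(cur) {}

    reference operator*() const { assert(cur_ != nullptr); return *cur_; }
    pointer operator->() const { assert(cur_ != nullptr); return cur_; }

    DepthFirstIterator& operator++() {
        assert(cur_ != nullptr);  // incrementing end() is not allowed
        if (cur_->left)  { cur_ = cur_->left;  return *this; }
        if (cur_->right) { cur_ = cur_->right; return *this; }
        // Leaf: climb until we are a left child whose parent has a right child.
        Node* n = cur_;
        while (n != root_ && (n == n->parent->right || !n->parent->right))
            n = n->parent;
        cur_ = (n != root_) ? n->parent->right : nullptr;
        return *this;
    }

    DepthFirstIterator& operator--() {
        assert(root_ != nullptr && cur_ != root_);  // cannot step before begin()
        if (!cur_) {
            cur_ = last_in_subtree(root_);  // stepping back from end()
        } else {
            Node* p = cur_->parent;
            // Predecessor is the parent, unless a left sibling subtree precedes us.
            cur_ = (p->left && p->left != cur_) ? last_in_subtree(p->left) : p;
        }
        return *this;
    }

    DepthFirstIterator operator++(int) { auto t = *this; ++*this; return t; }
    DepthFirstIterator operator--(int) { auto t = *this; --*this; return t; }

    friend bool operator==(const DepthFirstIterator& a, const DepthFirstIterator& b) {
        return a.cur_ == b.cur_;
    }
    friend bool operator!=(const DepthFirstIterator& a, const DepthFirstIterator& b) {
        return a.cur_ != b.cur_;
    }

private:
    // Last node visited by a pre-order walk of the subtree rooted at n.
    static Node* last_in_subtree(Node* n) {
        while (n->right || n->left) n = n->right ? n->right : n->left;
        return n;
    }
    Node* root_;
    Node* cur_;
};

inline DepthFirstIterator df_begin(Node* root) { return {root, root}; }
inline DepthFirstIterator df_end(Node* root) { return {root, nullptr}; }

inline std::vector<int> collect_forward(Node* root) {
    std::vector<int> out;
    for (auto it = df_begin(root); it != df_end(root); ++it) out.push_back(it->value);
    return out;
}

inline std::vector<int> collect_backward(Node* root) {
    std::vector<int> out;
    for (auto it = df_end(root); it != df_begin(root);) { --it; out.push_back(it->value); }
    return out;
}
```

Code that only compares against df_end() never sees how the end state is represented, which is the point made in the answers.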

Related

Designing a constant time begin() inorder iterator function to an AVL container

Good day,
If one wants to design a standard-compliant container based on a (AVL) tree, its iterator functions have to be constant time. As noted in that discussion, the look-up in a search tree tends to be logarithmic.
Suppose a traditional tree class setup where the private members are the root node and the size. The node is also designed conventionally (three pointers: left, right, parent), and the iterators are lazy and store node pointers. Then to provide begin() one has to go from the root to the far left, which takes O(log n) time in the worst case. So my questions are:
What do you think is the best way to satisfy the complexity requirement for tree's begin()?
I was thinking of keeping another private node pointer member that gets updated after every modification, but it seems to me there is room for a better solution.
For end(), can I just return a nullptr?
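A rough sketch of the cached-pointer idea from the question, assuming a hypothetical TreeSet class with a plain (unbalanced) insert. AVL rotations are left out; since a rotation preserves in-order sequence, it should not change which node is leftmost. Erase is not shown; it would need to move the cached pointer to the successor when the leftmost node is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical container; rebalancing and erase are omitted for brevity.
template <typename T>
class TreeSet {
    struct Node {
        T key;
        Node* parent = nullptr;
        std::unique_ptr<Node> left, right;
        explicit Node(const T& k) : key(k) {}
    };
    std::unique_ptr<Node> root_;
    Node* leftmost_ = nullptr;  // cached so that begin() is O(1)
    std::size_t size_ = 0;

public:
    class iterator {
        const Node* n_;
    public:
        explicit iterator(const Node* n) : n_(n) {}
        const T& operator*() const { assert(n_ != nullptr); return n_->key; }
        iterator& operator++() {  // in-order successor via parent pointers
            assert(n_ != nullptr);
            if (n_->right) {
                n_ = n_->right.get();
                while (n_->left) n_ = n_->left.get();
            } else {
                const Node* child = n_;
                n_ = n_->parent;
                while (n_ && child == n_->right.get()) { child = n_; n_ = n_->parent; }
            }
            return *this;
        }
        bool operator==(const iterator& o) const { return n_ == o.n_; }
        bool operator!=(const iterator& o) const { return n_ != o.n_; }
    };

    iterator begin() const { return iterator(leftmost_); }  // no descent needed
    iterator end() const { return iterator(nullptr); }
    std::size_t size() const { return size_; }

    bool insert(const T& key) {
        Node* parent = nullptr;
        std::unique_ptr<Node>* slot = &root_;
        bool went_right = false;
        while (*slot) {
            parent = slot->get();
            if (key < parent->key) {
                slot = &parent->left;
            } else if (parent->key < key) {
                slot = &parent->right;
                went_right = true;
            } else {
                return false;  // duplicate key
            }
        }
        *slot = std::make_unique<Node>(key);
        (*slot)->parent = parent;
        // Only a descent that never turned right produces a new minimum.
        if (!went_right) leftmost_ = slot->get();
        ++size_;
        return true;
    }
};
```

A null end() is enough for forward iteration here. Supporting --end() would need some other route to the last node, for example a sentinel node or a pointer back to the container.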

Check for end-of-list in boost::intrusive::list without container?

I'm getting started with Boost.Intrusive, specifically interested in the doubly-linked list (boost::intrusive::list).
This would be trivial to do in a "hand-rolled" linked list, but so far I can't find a Boost equivalent:
Given a node that belongs to a list, how do I check to see if it represents the end of the list, without needing the owning container.
In a hand-made list, this would be as simple as checking if the "next" pointer is NULL.
With boost::intrusive::list, there is the s_iterator_to function, which converts a plain node to an iterator. And you can check that against mylist.end(), which gives the desired result, but it requires a reference to the list container itself.
I also note that using operator++ on such an iterator simply produces a garbage value once it is moved past the end — no error or assert from Boost.
After some more research and thought, it seems that there is no way to do what I want with the standard boost::intrusive::list functionality.
The list provided is, in fact, a circular linked list, not a linear one. So, there is no "null pointer" at the end.
The implementation seems to follow a similar design to the Linux kernel's list.h. You always need a reference to the container object because that contains the "head" of the circular list, which is a special node containing no user data. This is also the node that represents end() during traversal.
As to why this design is chosen, I haven't found any hard evidence. Seemingly, the circular list design allows a simpler implementation, with fewer branches. See, for example, this old article, which says "The circular nature of the list makes inserting and removing nodes simple and branch free."
I am not fully convinced by that, since I think using "pointer-to-pointer" style handling can avoid the branches, too. But that's how it's done in boost::intrusive::list, regardless.
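To illustrate the circular design described above, here is a small hand-rolled sketch. The Link, Item and List types are hypothetical and are not Boost's actual classes. The sentinel head carries no data and doubles as end(), and linking or unlinking a node touches no null checks.

```cpp
#include <cassert>

// Hypothetical intrusive-style link; an unlinked node points at itself.
struct Link {
    Link* prev;
    Link* next;
    Link() : prev(this), next(this) {}
    Link(const Link&) = delete;
    Link& operator=(const Link&) = delete;
};

// Insert n immediately before pos. Branch free: every node, including the
// sentinel, always has valid prev and next pointers.
inline void link_before(Link* pos, Link* n) {
    assert(n->next == n && n->prev == n);  // n must not already be in a list
    n->prev = pos->prev;
    n->next = pos;
    pos->prev->next = n;
    pos->prev = n;
}

// Remove n from whatever list it is in, also branch free.
inline void unlink(Link* n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

struct Item : Link {
    int value;
    explicit Item(int v) : value(v) {}
};

struct List {
    Link head;  // sentinel: no user data, represents end()
    Link* begin() { return head.next; }
    Link* end() { return &head; }
    void push_back(Item& it) { link_before(&head, &it); }
    // Telling "end" apart from a real node needs the container's sentinel.
    bool is_end(const Link* n) const { return n == &head; }
};

inline int sum(List& l) {
    int total = 0;
    for (Link* p = l.begin(); p != l.end(); p = p->next)
        total += static_cast<Item*>(p)->value;
    return total;
}
```

The last element's next pointer is the sentinel rather than null, which is why a node alone cannot tell whether it is followed by the end.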

std::map without parent pointers?

libstdc++, as an example, implements std::map using a red-black binary tree with parent pointers in the nodes. This means that iterators can just be pointers to a node.
Is it possible for a standard library to implement std::map without storing parent pointers in the nodes? I think this would mean that iterators would need to contain a stack of parent pointers, and as such would need to dynamically allocate a logarithmic amount of memory. Would this violate standard performance constraints on iterators? Would not having parent pointers violate any other performance constraints on the rest of the interface?
What about the new node stuff/interface in C++17?
They may not do so. std::map guarantees that removing a key-value pair from it won't invalidate any iterators other than those to the pair being removed.
If iterators stored a stack of parents and a parent were removed, those iterators would be invalidated as well, and the guarantee would no longer hold.
Is it possible? Possibly :-) Is it a good idea? Almost certainly not. Most things are possible, if you throw more storage or speed at them :-)
In terms of just getting rid of the parent pointers, you could, for example, maintain within the map a monotonic value that is incremented each time the map structure is changed. In essence, it's a version identifier of the map structure. So, adding or deleting elements in the map increments this value, while merely changing the data within the map does not.
The iterator would then contain:
a pointer to the map itself (to get the current version);
the stack of pointers; and
the version matching the last time the stack above was created.
The idea would basically be to, before doing anything with the iterator, detect when the map version is different to the iterator one and, if it is, rebuild the stack and update the iterator version before carrying on with whatever operation you're trying to perform.
Now, while that makes it possible to iterate without parent pointers, it unfortunately violates some other requirements of iterators, such as being able to action them in constant time. Anything that has to rebuild a data structure, based on the data within the map, will violate that restriction.
In any case, there's no way anyone in their right mind would implement such a horrid scheme when it's far simpler to have parent pointers, but the intent here is simply to show that it's possible.
Hence my advice would be to just stick with the parent pointers. The use of such parent pointers makes the process of finding the next/previous element a rather simple one, based only on the current item in the iterator.
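As a sketch of how simple that step is, here are in-order successor and predecessor functions that use nothing but the current node. The Node type and function names are made up for the example and are not taken from any standard library implementation.

```cpp
#include <cassert>

// Hypothetical tree node with a parent pointer.
struct Node {
    int key;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Next node in key order, or nullptr if n is the last one.
inline Node* next_in_order(Node* n) {
    assert(n != nullptr);
    if (n->right) {  // leftmost node of the right subtree
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    Node* p = n->parent;  // otherwise climb until we arrive from a left child
    while (p && n == p->right) { n = p; p = p->parent; }
    return p;
}

// Previous node in key order, or nullptr if n is the first one.
inline Node* prev_in_order(Node* n) {
    assert(n != nullptr);
    if (n->left) {  // rightmost node of the left subtree
        n = n->left;
        while (n->right) n = n->right;
        return n;
    }
    Node* p = n->parent;  // otherwise climb until we arrive from a right child
    while (p && n == p->left) { n = p; p = p->parent; }
    return p;
}
```

No stack and no reference to the container is needed, so an iterator can be a single node pointer, and erasing some other node does not disturb it.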

Can I create an iterator by copying the structure to a list and returning iterator of the list?

If I have a somewhat complex structure (such as hash table with chaining) and I want to create a custom iterator for the structure, is it valid to copy the contents of the complex structure into some sort of simple structure (such as a list) and then return the implicit iterator over the simple structure?
I realize it would take extra memory, but are there any other reasons why I shouldn't just do that as opposed to creating my own iterator from scratch?
Ultimately, yes you can do this if you don't need to edit elements in the original collection via your iterator.
You identify the memory issue; are there other reasons you shouldn't do this? There's the time taken to create the list. You'd either need to recreate this list copy every time you want to iterate or you'd have to make sure you keep the list up-to-date if the original collection can change.
That cost is particularly unfortunate if you wanted to use your iterator to do something like find the first element that meets some rule. If the first element meets the rule but there are a large number of elements then you end up doing a lot of copying in order to eventually only iterate up to the first element.
You can, however, write your own iterator to do the same job as your nested loops. It's hard to give a decent code example without knowing the structure you're trying to iterate, but in general you're likely to implement it as a class that holds an iterator over the elements of the current subcollection. This is advanced until the current subcollection has been fully iterated, and then it moves on to the start of the next subcollection. So your iterator also holds an iterator over the collections, i.e. two iterators: one returns an element and one returns a (sub)collection.
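A possible shape for such a two-level iterator, assuming the hash table is laid out as a std::vector of std::list buckets holding ints. The Table alias, the TableIterator class and the table_begin/table_end helpers are invented for the sketch; the real structure may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chained hash table layout: a vector of buckets.
using Bucket = std::list<int>;
using Table = std::vector<Bucket>;

class TableIterator {
public:
    // Positions at the first element found at or after bucket b.
    TableIterator(const Table& t, std::size_t b) : table_(&t), bucket_(b) {
        if (bucket_ < table_->size()) {
            inner_ = (*table_)[bucket_].begin();
            skip_empty();
        }
    }

    int operator*() const { assert(bucket_ < table_->size()); return *inner_; }

    TableIterator& operator++() {
        assert(bucket_ < table_->size());  // incrementing end is not allowed
        ++inner_;
        skip_empty();
        return *this;
    }

    bool operator==(const TableIterator& o) const {
        if (bucket_ != o.bucket_) return false;
        // Past the end, inner_ is meaningless and must not be compared.
        return bucket_ == table_->size() || inner_ == o.inner_;
    }
    bool operator!=(const TableIterator& o) const { return !(*this == o); }

private:
    // Moves on to the start of the next non-empty bucket when the current
    // bucket is exhausted; stops with bucket_ == size() at the very end.
    void skip_empty() {
        while (inner_ == (*table_)[bucket_].end()) {
            if (++bucket_ == table_->size()) return;
            inner_ = (*table_)[bucket_].begin();
        }
    }

    const Table* table_;
    std::size_t bucket_;            // outer position: which subcollection
    Bucket::const_iterator inner_{};  // inner position: which element in it
};

inline TableIterator table_begin(const Table& t) { return TableIterator(t, 0); }
inline TableIterator table_end(const Table& t) { return TableIterator(t, t.size()); }
```

Because nothing is copied, a search that stops at the first matching element only pays for the elements it actually visits.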

Single linked lists & time complexity

I'm trying to write my own (as close to standard as possible) single linked list implementation. However I am wondering what time complexity people expect of such a list?
Especially for inserting I am wondering how I should implement it. I've read some locations around the internet, where some say inserting is O(1) while others say O(n) - all agree that a double linked list is O(1). However I think O(1) is the case for single linked lists too?
As long as you know the preceding node, you just let the preceding node point to the new node, and the new node points to where the preceding node originally pointed.
That said it makes me wonder how people expect insert to behave? Normally it inserts elements BEFORE the given iterator. However with a single-linked-list it is hard to do so (one would have to go through O(n) time to get the preceding element & then use above method). Is it common in such lists to make insert place items behind the current iterator? Or -probably better- is there another common function for this?
The complexity of the insertion depends on what you need to do. If you know preceding node (actually, a suitable handle to change the preceding node's "next pointer" is all you need), the complexity is O(1). If you need to find the location where to insert, the complexity is O(n).
With respect to the expectation, I would expect insert() to behave the same as for doubly linked lists, but I also realize that you can't achieve this: you either need a different time complexity (to find the predecessor node) or different iterator invalidation semantics (i.e., iterators to other nodes get invalidated). I think the C++ 2011 std::forward_list class template went for a different interface while retaining the guarantees on iterator validity.
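For reference, a short example of that different interface: std::forward_list offers insert_after() instead of insert(), plus before_begin() so that the front can be reached the same way. The function name demo_forward_list is just for this example.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

inline std::vector<int> demo_forward_list() {
    std::forward_list<int> fl{1, 2, 4};

    auto second = std::next(fl.begin());    // refers to the element 2
    fl.insert_after(second, 3);             // constant time: 1 2 3 4

    fl.insert_after(fl.before_begin(), 0);  // constant-time insert at the front

    assert(*second == 2);  // the existing iterator is still valid
    return std::vector<int>(fl.begin(), fl.end());
}
```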
To briefly explain why the iterator validity can be affected: an iterator doesn't have to know only about the current node. Instead, it could, for example, point to the predecessor's next pointer. When dereferencing the iterator, it would first dereference its pointer to the next pointer and then this pointer to get hold of the actual node. In return, it is possible to insert in front of the iterator because the iterator knows which next pointer to update. Unfortunately, this means that iterators may get invalidated, because the pointer they point to may have changed and they would reference a different node (when erasing nodes, the iterator may even become entirely invalid although the node it referenced is still there).
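A small sketch of that pointer-to-next-pointer idea, with hypothetical SNode and Cursor types. It shows both halves of the trade-off: inserting before the current position is constant time, but a second cursor at the same position silently starts referring to the new node.

```cpp
#include <cassert>

// Hypothetical singly linked node.
struct SNode {
    int value;
    SNode* next;
};

// Cursor that stores the address of the link leading to the current node
// (either the list head pointer or some predecessor's next pointer).
struct Cursor {
    SNode** link;
    SNode& operator*() const { assert(*link != nullptr); return **link; }
    bool at_end() const { return *link == nullptr; }
    void advance() { assert(*link != nullptr); link = &(*link)->next; }
};

// Inserts n before the node the cursor designates, in constant time.
// Any cursor sharing this link designates n afterwards.
inline void insert_before(Cursor c, SNode* n) {
    n->next = *c.link;
    *c.link = n;
}
```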