How to iterate over a non-linear container - c++

Consider a hierarchical tree structure, where an item may have sibling items (at the same level in the hierarchy) and may also have child items (one level down in the hierarchy).
Let's say the structure can be defined like:
// an item of a hierarchical data structure
struct Item {
int data; // keep it an int, rather than <T>, for simplicity
vector<Item> children;
};
I wanted to be able to use algorithms over this structure, like the algorithms for a std::map, std::vector, etc. So, I created a few algorithms, like:
template <class Function>
Function for_each_children_of_item( Item, Function f ); // deep (recursive) traversal
template <class Function>
Function for_each_direct_children_of_item( Item, Function f ); // shallow (1st level) traversal
template <class Function>
Function for_each_parent_of_item( Item, Function f ); // going up to the root item
One thing that troubled me is that there are 3 for_each() functions for the same structure. But they give a good description of how they iterate, so I decided to live with it.
Then, soon, the need for more algorithms emerged (like find_if, count_if, any_of, etc), which made me feel I'm not on the right track, design-wise.
One solution I can think of, that would reduce the workload, would be to simply write:
vector<Item> get_all_children_of_item( Item ); // recursive
vector<Item> get_all_direct_children_of_item( Item ); // 1st level items
vector<Item> get_all_parents_of_item( Item ); // up to the root item
and then I could use all the STL algorithms.
I am a bit wary of this solution, because it involves copying.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
Can anybody present a typical / idiomatic way to deal with such non-linear data structures?
Can/should iterators be created for such a structure? If so, how?

Use iterators.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
end() can be any designated special value of your iterator class, as long as your increment operator produces it when stepping past the last element. You can also (or instead) overload operator== and operator!= for your iterator.
If you want to be really robust, implement an iterator mode for each of the XPath axes.
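As a rough sketch of that idea (one possible design, not the only one), the iterator below walks an Item and all of its descendants in pre-order by keeping an explicit stack of pointers. The end() iterator is simply the one whose stack is empty. The names PreorderIterator, PreorderRange, preorder and count_even are made up for illustration, and the traversal includes the starting item itself, unlike for_each_children_of_item in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Same shape as the struct in the question.
struct Item {
    int data;
    std::vector<Item> children;
};

// Hypothetical pre-order (depth-first) iterator over an Item subtree.
class PreorderIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = Item;
    using difference_type = std::ptrdiff_t;
    using pointer = const Item*;
    using reference = const Item&;

    PreorderIterator() = default;  // empty stack == end()
    explicit PreorderIterator(const Item& root) { stack_.push_back(&root); }

    reference operator*() const { return *stack_.back(); }
    pointer operator->() const { return stack_.back(); }

    PreorderIterator& operator++() {
        assert(!stack_.empty() && "cannot increment end()");
        const Item* current = stack_.back();
        stack_.pop_back();
        // Push children in reverse so the first child is visited next.
        for (auto it = current->children.rbegin(); it != current->children.rend(); ++it) {
            stack_.push_back(&*it);
        }
        return *this;
    }
    PreorderIterator operator++(int) {
        PreorderIterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const PreorderIterator& a, const PreorderIterator& b) {
        return a.stack_ == b.stack_;
    }
    friend bool operator!=(const PreorderIterator& a, const PreorderIterator& b) {
        return !(a == b);
    }

private:
    std::vector<const Item*> stack_;  // items still to be visited, top is current
};

// Small range wrapper so range-for and STL algorithms can be used.
struct PreorderRange {
    const Item* root;
    PreorderIterator begin() const { return PreorderIterator(*root); }
    PreorderIterator end() const { return PreorderIterator(); }
};

inline PreorderRange preorder(const Item& root) { return PreorderRange{&root}; }

// Example of reusing a standard algorithm instead of writing count_if by hand.
inline std::ptrdiff_t count_even(const Item& root) {
    PreorderRange range = preorder(root);
    return std::count_if(range.begin(), range.end(),
                         [](const Item& item) { return item.data % 2 == 0; });
}
```

With this in place, find_if, any_of and the other non-modifying algorithms work on preorder(root).begin() and preorder(root).end() without copying the items. A shallow traversal needs no custom iterator at all, since children.begin() and children.end() already cover the direct children.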

Related

std::list, move item in list using iterators only

It seems to me given what I know about linked lists that this should be possible but I haven't found anywhere that has the answer so I'm asking here.
Given two iterators to items in the same list. I'd like to take the item pointed to by iterator "frm" and "insert" it into the list before the item pointed to by iterator "to".
It seems that all that is needed is to change the pointers on the items in the list pointing to "frm" (to remove "frm"), then changing the pointers on the item pointing at "to" so that it references "frm" then changing the pointers on "frm" node to point to "to".
I looked everywhere for this and couldn't find an answer.
NOTE that I cannot use splice, as I do not have access to the list, only the iterators to the items in the list.
template <typename T>
void move(typename std::list<T>::iterator frm, typename std::list<T>::iterator to) {
//remove the item from the list at frm
//insert the item at frm before the item at to
}
Iterators contain the minimal information required to point to a piece of data. What you are missing is that linked lists carry other bookkeeping as well, so the list class essentially looks something like the following:
template <typename Type>
class list {
int size; // for O(1) size()
Type* head;
Type* tail;
class Iterator {
Type* element;
// no back pointer to list<Type>*
};
...
};
To remove an element from the list you would need to update those data members as well. To do that, an iterator would have to contain a back pointer to the list itself, which is not required by the interface offered for most iterators. Notice also that the algorithms in the STL do not actually modify the bookkeeping of the containers they operate on; at most they swap elements and rearrange things.
I would encourage you to look into the <algorithm> header, as well as into facilities like std::back_inserter and std::move_iterator, to get an idea of how iterators are wrapped to actually modify the container they represent.
How this is done is implementation-defined, but the C++ standard provides iter_swap, though it doesn't do exactly this. It may be optimized to swap the pointers to the values held in the linked list, similar to what I have described, effectively reordering the items in the list without a full swap.
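For comparison, when the list itself is reachable, the move is a single call to splice, which relinks the node without copying or moving the element. The sketch below assumes a hypothetical helper named move_before that, unlike the function in the question, takes the list as an extra parameter.

```cpp
#include <iterator>
#include <list>

// Moves the element at frm so that it sits directly before the element at to.
// Both iterators must refer to lst. Only node links change, so frm stays valid.
template <typename T>
void move_before(std::list<T>& lst,
                 typename std::list<T>::iterator frm,
                 typename std::list<T>::iterator to) {
    lst.splice(to, lst, frm);
}
```

For example, with a list holding 1, 2, 3, 4, calling move_before(l, std::prev(l.end()), std::next(l.begin())) moves the 4 in front of the 2.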
iter_swap() versus swap() -- what's the difference?

Ordered Tree in C++

The C++ STL apparently is missing an ordered tree data structure. See here. Boost is also missing an ordered tree, but it does have an "un"ordered one, Property Tree where the data is ordered by insertion. I want the order to be irrespective of memory.
The boost page on Property Trees says that this is conceptually the boost::ptree structure.
struct ptree
{
data_type data; // data associated with the node
list< pair<key_type, ptree> > children; // ordered list of named children by insertion
};
I want to extend boost to keep track of order.
Is this the correct way?
class ordered_ptree : public boost::property_tree::ptree {
public:
ordered_ptree(int id) : _id{id}{};
protected:
int _id;
};
(From the comments in your question, I understand you want something like Python's OrderedDict but taking into account keys' relative order.)
Since none of the standard library's (or boost's) containers are exactly what you want, you might want to extend std::map (especially if you don't need all of the interface).
Say you start with
template<
typename Key,
typename Value,
class Compare=std::less<Key>,
class Alloc=std::allocator<std::pair<const Key, Value> > >
class ordered_map
{
// This needs to be filled.
};
Now inside, you can hold an insertion counter:
std::size_t m_ins_count;
which is initialized to 0 and incremented at each insert.
Internally, your new keys will be std::pairs of the original key and the insertion count. Standard properties of binary search trees imply that nodes with keys differing only by the second pair item (which is the insertion count), will be consecutive in an in-order walk, which means that
you retain order of different keys
you retain order of insertion within a key
the operations are logarithmic time
traversing same-key items is (amortized) linear time
So, internally you'd have something like
typedef
std::map<
std::pair<Key, std::size_t>,
Value,
lex_compare<Compare>,
std::allocator<std::pair<const std::pair<Key, std::size_t>, Value> > >
internal_map_t;
(where lex_compare<Compare> compares first by the given functor, then by the insertion index).
Now you can choose a (minimal) interface, and implement it, by translating keys in the "outer world" and pairs of keys + insertion indices in the "inner world" of the tree.
If you plan on providing an iterator interface as well, you might find the boost iterator library useful, as you simply want to modify std::map's iterators.
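A minimal sketch of how these pieces might fit together is shown below. It leaves out the allocator parameter and the iterator interface, and the member names insert, values_for and size are assumptions chosen for illustration, as is the exact shape of lex_compare.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Compares (key, insertion index) pairs: first by the user's functor,
// then by insertion index for equivalent keys.
template <typename Compare>
struct lex_compare {
    Compare cmp;
    template <typename Key>
    bool operator()(const std::pair<Key, std::size_t>& a,
                    const std::pair<Key, std::size_t>& b) const {
        if (cmp(a.first, b.first)) return true;
        if (cmp(b.first, a.first)) return false;
        return a.second < b.second;
    }
};

template <typename Key, typename Value, class Compare = std::less<Key> >
class ordered_map {
    typedef std::pair<Key, std::size_t> internal_key_t;
    typedef std::map<internal_key_t, Value, lex_compare<Compare> > internal_map_t;

public:
    ordered_map() : m_ins_count(0) {}

    // Translates the outer key into (key, insertion count) before inserting.
    void insert(const Key& key, const Value& value) {
        m_map.emplace(internal_key_t(key, m_ins_count++), value);
    }

    // Returns all values stored under key, in insertion order.
    std::vector<Value> values_for(const Key& key) const {
        std::vector<Value> out;
        typename internal_map_t::const_iterator it =
            m_map.lower_bound(internal_key_t(key, 0));
        // Entries with an equivalent key are consecutive in the in-order walk.
        for (; it != m_map.end() && !Compare()(key, it->first.first); ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    std::size_t size() const { return m_map.size(); }

private:
    internal_map_t m_map;
    std::size_t m_ins_count;  // incremented at each insert
};
```

Inserting (2, 10), (1, 20) and (2, 30) in that order keeps key 1 before key 2 in the walk, while values_for(2) yields 10 and then 30.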

C++: Scott Meyers "Effective STL": item 31: know your sorting options: help to understand

Good day!
In his "Effective STL" Scott Meyers wrote
A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in. As you can see, there are lots of options. (Item 31, second part)
Can someone explain this approach to me?
More text (to understand the context):
The algorithms sort, stable_sort, partial_sort, and nth_element require random access iterators, so they may be applied only to vectors, strings, deques, and arrays. It makes no sense to sort elements in standard associative containers, because such containers use their comparison functions to remain sorted at all times. The only container where we might like to use sort, stable_sort, partial_sort, or nth_element, but can't, is list, and list compensates somewhat by offering its sort member function. (Interestingly, list::sort performs a stable sort.) If you want to sort a list, then, you can, but if you want to use partial_sort, or nth_element on the objects in a list, you have to do it indirectly. One indirect approach is to copy the elements into a container with random access iterators, then apply the desired algorithm to that. Another is to create a container of list::iterators, use the algorithm on that container, then access the list elements via the iterators. A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in. As you can see, there are lots of options.
I'm not sure what the confusion is, but I suspect that it is what "splicing" refers to: std::list<T> has a splice() member function (well, actually several overloads) which transfers nodes between lists. That is, you create a std::vector<std::list<T>::const_iterator> and apply the sorting algorithm (e.g. std::partial_sort()) to this. Then you create a new std::list<T> and use the splice() member with the iterators from the sorted vector to put the nodes into their correct order without moving the objects themselves.
This would look something like this:
std::vector<std::list<T>::const_iterator> tmp;
for (auto it(list.begin()), end(list.end()); it != end; ++it) {
tmp.push_back(it);
}
some_sort_of(tmp);
std::list<T> result;
for (auto it(tmp.begin()), end(tmp.end()); it != end; ++it) {
result.splice(result.end(), list, *it); // *it is the list iterator stored in tmp
}
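A more complete sketch of this approach, assuming a hypothetical helper named partial_sort_list, could look like the following. It runs std::partial_sort on a vector of list iterators and then splices the nodes into place, so the elements themselves are never copied.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Rearranges lst so that its n smallest elements come first, in ascending
// order. The remaining elements keep their original relative order.
template <typename T>
void partial_sort_list(std::list<T>& lst, std::size_t n) {
    typedef typename std::list<T>::iterator ListIt;

    // Collect one iterator per node; the vector offers random access.
    std::vector<ListIt> its;
    its.reserve(lst.size());
    for (ListIt it = lst.begin(); it != lst.end(); ++it) {
        its.push_back(it);
    }

    n = std::min(n, its.size());
    const typename std::vector<ListIt>::iterator middle =
        its.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(its.begin(), middle, its.end(),
                      [](ListIt a, ListIt b) { return *a < *b; });

    // Splice the n smallest nodes into a new list, then append the rest.
    std::list<T> result;
    for (typename std::vector<ListIt>::iterator p = its.begin(); p != middle; ++p) {
        result.splice(result.end(), lst, *p);
    }
    result.splice(result.end(), lst);
    lst.swap(result);
}
```

Splicing into a separate result list avoids the corner cases that arise when a node is spliced in front of itself within the same list.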
Let's say you wanted to do a partial_sort on a list. You could store the iterators to the list in a set, by providing a comparison function that can sort using the iterators, like this:
struct iterator_less
{
bool operator() (std::list<int>::iterator lhs,
std::list<int>::iterator rhs) const
{
return (*lhs < *rhs);
}
};
typedef std::multiset<
std::list<int>::iterator, iterator_less
> iterator_set;
Then you could let the set perform the sort, but since it contains iterators into the list, you could use list::splice to splice them into a partially sorted list:
std::list<int> unsorted, partialSorted;
unsorted.push_back(11);
unsorted.push_back(2);
unsorted.push_back(2);
unsorted.push_back(99);
unsorted.push_back(2);
unsorted.push_back(4);
unsorted.push_back(5);
unsorted.push_back(7);
unsorted.push_back(34);
// First copy the iterators into the set
iterator_set itSet;
for( auto it = unsorted.begin(); it!=unsorted.end();++it)
{
itSet.insert(it);
}
// now if you want a partial_sort with the first 3 elements, iterate through the
// set grabbing the first item in the set and then removing it.
int count = 3;
while(count--)
{
iterator_set::iterator setTop = itSet.begin();
partialSorted.splice(
partialSorted.end(), // append, so the smallest elements stay in ascending order
unsorted,
*setTop);
itSet.erase(setTop);
}
partialSorted.splice(
partialSorted.end(),
unsorted,
unsorted.begin(),
unsorted.end());
An ordered container would be either std::set or std::map. If you're willing to make a comparator that takes iterators you would use std::set<std::list<mydata>::iterator,comparator>, otherwise you could use std::map<mydata,std::list<mydata>::iterator>. You go through your list from begin() to end() and insert the iterators into the set or map; now you can use it to access the items in the list in sorted order by iterating the set or map, because it's automatically sorted.
Ordered containers are std::set and std::multiset. std::set is typically implemented as a balanced binary search tree (BST). So what it says is that you should create a std::set of std::list iterators and then use the inherent BST structure to do the sorting. Here is a link on BST to get you started.
Edit Ah. Just noticed "ordered container of iterators". That would imply creating an index into another container.
Boost Multi Index has many examples of such things, where a single collection is indexed by several different ordering predicates and the indices are nothing more than collections of 'pointers' (usually iterators) into the base container.
"A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in"
One thing I think would match that description is when doing std::sort_heap of a list/vector which has had std::make_heap/push_heap/pop_heap operating on it.
make_heap : convert a sequence to a heap
sort_heap : sort a heap
push_heap : insert an element in a heap
pop_heap : remove the top element from a heap
Heaps are organizations of elements within sequences, which make it (relatively) efficient to keep the collection in a known ordering under insert/removal. The order is implicit (like a recursive defined binary tree stored in a contiguous array) and can be transformed into the corresponding properly sorted sequence by doing the (highly efficient) sort_heap algorithm on it.
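The four heap operations listed above can be sketched as follows. The standard heap algorithms require random-access iterators, so this example uses std::vector rather than std::list; the helper names heap_sorted, heap_insert and heap_extract_max are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the values in ascending order: build a max-heap, then sort it.
inline std::vector<int> heap_sorted(std::vector<int> values) {
    std::make_heap(values.begin(), values.end());
    std::sort_heap(values.begin(), values.end());
    return values;
}

// Inserts value into a vector that already satisfies the heap property.
inline void heap_insert(std::vector<int>& heap, int value) {
    heap.push_back(value);
    std::push_heap(heap.begin(), heap.end());
}

// Removes and returns the largest element of a non-empty heap.
inline int heap_extract_max(std::vector<int>& heap) {
    assert(!heap.empty());
    std::pop_heap(heap.begin(), heap.end());  // moves the largest to the back
    int top = heap.back();
    heap.pop_back();
    return top;
}
```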

Hybrid vector/list container?

I'm in need of a container that has the properties of both a vector and a list. I need fast random access to elements within the container, but I also need to be able to remove elements in the middle of the container without moving the other elements. I also need to be able to iterate over all elements in the container, and see at a glance (without iteration) how many elements are in the container.
After some thought, I've figured out how I could create such a container, using a vector as the base container, and wrapping the actual stored data within a struct that also contained fields to record whether the element was valid, and pointers to the next/previous valid element in the vector. Combined with some overloading and such, it sounds like it should be fairly transparent and fulfill my requirements.
But before I actually work on creating yet another container, I'm curious if anyone knows of an existing library that implements this very thing? I'd rather use something that works than spend time debugging a custom implementation. I've looked through the Boost library (which I'm already using), but haven't found this in there.
If the order does not matter, I would just use a hash table mapping integers to pointers. std::tr1::unordered_map<int, T *> (or std::unordered_map<int, unique_ptr<T>> if C++0x is OK).
The hash table's elements can move around which is why you need to use a pointer, but it will support very fast insertion / lookup / deletion. Iteration is fast too, but the elements will come out in an indeterminate order.
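A small sketch of the hash table variant is shown below, assuming C++11 and a hypothetical wrapper class named IdTable that hands out integer ids. Because each element lives behind a unique_ptr, a pointer obtained from find() stays valid when other elements are added or removed.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

template <typename T>
class IdTable {
public:
    IdTable() : next_id_(0) {}

    // Stores value and returns the id under which it can be looked up.
    int add(T value) {
        int id = next_id_++;
        items_.emplace(id, std::unique_ptr<T>(new T(std::move(value))));
        return id;
    }

    // Returns a pointer to the element, or nullptr if the id is not present.
    T* find(int id) const {
        typename Map::const_iterator it = items_.find(id);
        return it == items_.end() ? nullptr : it->second.get();
    }

    // Removes the element; other elements are not moved. Returns false if absent.
    bool remove(int id) { return items_.erase(id) > 0; }

    // Number of stored elements, available without iterating.
    std::size_t size() const { return items_.size(); }

private:
    typedef std::unordered_map<int, std::unique_ptr<T> > Map;
    Map items_;
    int next_id_;
};
```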
Alternatively, I think you can implement your own idea as a very simple combination of a std::vector and a std::list. Just maintain both a list<T> my_list and a vector<list<T>::iterator> my_vector. To add an object, push it onto the back of my_list and then push its iterator onto my_vector. (Set an iterator to my_list.end() and decrement it to get the iterator for the last element.) To lookup, look up in the vector and just dereference the iterator. To delete, remove from the list (which you can do by iterator) and set the location in the vector to my_list.end().
std::list guarantees the elements within will not move when you delete them.
[update]
I am feeling motivated. First pass at an implementation:
#include <vector>
#include <list>
template <typename T>
class NairouList {
public:
typedef std::list<T> list_t;
typedef typename list_t::iterator iterator;
typedef std::vector<iterator> vector_t;
NairouList() : my_size(0)
{ }
void push_back(const T &elt) {
my_list.push_back(elt);
iterator i = my_list.end();
--i;
my_vector.push_back(i);
++my_size;
}
T &operator[](typename vector_t::size_type n) {
if (my_vector[n] == my_list.end())
throw "Dave's not here, man";
return *(my_vector[n]);
}
void remove(typename vector_t::size_type n) {
my_list.erase(my_vector[n]);
my_vector[n] = my_list.end();
--my_size;
}
size_t size() const {
return my_size;
}
iterator begin() {
return my_list.begin();
}
iterator end() {
return my_list.end();
}
private:
list_t my_list;
vector_t my_vector;
size_t my_size;
};
It is missing some Quality of Implementation touches... Like, you probably want more error checking (what if I delete the same element twice?) and maybe some const versions of operator[], begin(), end(). But it's a start.
That said, for "a few thousand" elements a map will likely serve at least as well. A good rule of thumb is "Never optimize anything until your profiler tells you to".
Looks like you might be wanting a std::deque. Removing an element is not as efficient as with a std::list, but because deques are typically created by using non-contiguous memory "blocks" that are managed via an additional pointer array/vector internal to the container (each "block" would be an array of N elements), removal of an element inside of a deque does not cause the same re-shuffling operation that you would see with a vector.
Edit: On second thought, and after reviewing some of the comments, while I think a std::deque could work, I think a std::map or std::unordered_map will actually be better for you, since it will allow the array-syntax indexing you want, yet give you fast removal of elements as well.

Item in multiple lists

So I have some legacy code which I would love to use more modern techniques. But I fear that given the way that things are designed, it is a non-option. The core issue is that often a node is in more than one list at a time. Something like this:
struct T {
T *next_1;
T *prev_1;
T *next_2;
T *prev_2;
int value;
};
This allows the code to allocate a single object of type T and insert it into 2 doubly linked lists, nice and efficient.
Obviously I could just have 2 std::list<T*>'s and just insert the object into both...but there is one thing which would be way less efficient...removal.
Often the code needs to "destroy" an object of type T and this includes removing the element from all lists. This is nice because given a T* the code can remove that object from all lists it exists in. With something like a std::list I would need to search for the object to get an iterator, then remove that (I can't just pass around an iterator because it is in several lists).
Is there a nice c++-ish solution to this, or is the manually rolled way the best way? I have a feeling the manually rolled way is the answer, but I figured I'd ask.
As another possible solution, look at Boost Intrusive, which has an alternate list class with a lot of properties that may make it useful for your problem.
In this case, I think it'd look something like this:
#include <boost/intrusive/list.hpp>
using namespace boost::intrusive;
struct tag1; struct tag2;
typedef list_base_hook< tag<tag1> > base1;
typedef list_base_hook< tag<tag2> > base2;
class T: public base1, public base2
{
int value;
};
list<T, base_hook<base1> > list1;
list<T, base_hook<base2> > list2;
// constant time to get iterator of a T item:
auto where_in_list1 = list1.iterator_to(item);
auto where_in_list2 = list2.iterator_to(item);
// once you have iterators, you can remove in constant time, etc, etc.
Instead of managing your own next/previous pointers, you could indeed use a std::list. To solve the removal performance problem, you could store an iterator in the object itself (one member for each std::list the element can be stored in).
You can extend this to store a vector or array of iterators in the class (in case you don't know the number of lists the element is stored in).
I think the proper answer depends on how performance-critical this application is. Is it in an inner loop that could potentially cost the program a user-perceivable runtime difference?
There is a way to create this sort of functionality by creating your own classes derived from some of the STL containers, but it might not even be worth it to you. At the risk of sounding tiresome, I think this might be an example of premature optimization.
The question to answer is why this C struct exists in the first place. You can't re-implement the functionality in C++ until you know what that functionality is. Some questions to help you answer that are,
Why lists? Does the data need to be in sequence, i.e., in order? Does the order mean something? Does the application require ordered traversal?
Why two containers? Does membership in the container indicate some kind of property of the element?
Why a double-linked list specifically? Is O(1) insertion and deletion important? Is reverse-iteration important?
The answer to some or all of these may be, "no real reason, that's just how they implemented it". If so, you can replace that intrusive C-pointer mess with a non-intrusive C++ container solution, possibly containing shared_ptrs rather than ptrs.
What I'm getting at is, you may not need to re-implement anything. You may be able to discard the entire business, and store the values in proper C++ containers.
How's this?
struct T {
std::list<T*>::iterator entry1, entry2;
int value;
};
std::list<T*> list1, list2;
// init a T* item:
T* item = new T;
item->entry1 = list1.end();
item->entry2 = list2.end();
// add a T* item to list 1:
item->entry1 = list1.insert(<where>, item);
// remove a T* item from list1
if (item->entry1 != list1.end()) {
list1.erase(item->entry1); // erase by iterator is O(1)
item->entry1 = list1.end();
}
// code for list2 management is similar
You could make T a class and use constructors and member functions to do most of this for you. If you have a variable number of lists, you can use a vector of iterators, std::vector<std::list<T*>::iterator>, to track the item's position in each list.
Note that if you use push_back or push_front to add to the list, you need to do item->entry1 = list1.end(); item->entry1--; or item->entry1 = list1.begin(); respectively to get the iterator pointed in the right place.
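Wrapped up in a class, the stored-iterator idea might look like the sketch below. The names Node, TwoLists, make_node, push_back_1, push_back_2 and remove_everywhere are hypothetical. As in the snippet above, each list's end() serves as the "not in this list" marker, which relies on the lists never being swapped or moved.

```cpp
#include <list>

struct Node;
typedef std::list<Node*> NodeList;

// A node remembers its position in each list it may belong to.
struct Node {
    int value;
    NodeList::iterator entry1;
    NodeList::iterator entry2;
};

class TwoLists {
public:
    TwoLists() {}
    TwoLists(const TwoLists&) = delete;             // stored iterators would dangle
    TwoLists& operator=(const TwoLists&) = delete;

    // Creates a node that is in neither list yet.
    Node make_node(int value) { return Node{value, list1_.end(), list2_.end()}; }

    // Appends the node to a list unless it is already a member.
    void push_back_1(Node& n) {
        if (n.entry1 == list1_.end()) n.entry1 = list1_.insert(list1_.end(), &n);
    }
    void push_back_2(Node& n) {
        if (n.entry2 == list2_.end()) n.entry2 = list2_.insert(list2_.end(), &n);
    }

    // Unlinks the node from every list it is in, O(1) per list, no searching.
    void remove_everywhere(Node& n) {
        if (n.entry1 != list1_.end()) { list1_.erase(n.entry1); n.entry1 = list1_.end(); }
        if (n.entry2 != list2_.end()) { list2_.erase(n.entry2); n.entry2 = list2_.end(); }
    }

    const NodeList& first() const { return list1_; }
    const NodeList& second() const { return list2_; }

private:
    NodeList list1_, list2_;
};
```

The lists hold raw pointers and do not own the nodes, so the caller has to keep each Node alive, and at a stable address, for as long as it is linked.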
It sounds like you're talking about something that could be addressed by applying graph theory. As such the Boost Graph Library might offer some solutions.
list::remove is what you're after. It'll remove any and all objects in the list with the same value as what you passed into it.
So:
list<T> listOne, listTwo;
// Things get added to the lists.
T thingToRemove;
listOne.remove(thingToRemove);
listTwo.remove(thingToRemove);
I'd also suggest converting your list node into a class; that way C++ will take care of memory for you.
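class MyClass {
public:
int value;
// Any other values associated with T
// list::remove compares elements with operator==, so one must be defined
bool operator==(const MyClass& other) const { return value == other.value; }
};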
list<MyClass> listOne, listTwo; // can add and remove MyClass objects w/o worrying about destroying anything.
You might even encapsulate the two lists into their own class, with add/remove methods for them. Then you only have to call one method when you want to remove an object.
class TwoLists {
private:
list<MyClass> listOne, listTwo;
// ...
public:
void remove(const MyClass& thing) {
listOne.remove(thing);
listTwo.remove(thing);
}
};