Ukkonen's suffix tree algorithm, what is necessary? - c++

Yes I have read this: Ukkonen's suffix tree algorithm in plain English?
It is a great explanation of the algorithm but it is not so much the algorithm itself that is killing me but rather the data structure used to implement it.
I need the data structure to be as minimal and as fast as possible and I have seen many implementations using only Nodes, some with only edges, some with edges and nodes, etc. Then there are variations, a website I was reading claimed that a node need not have a pointer to its parent, and other places don't account for how children of a node are managed.
My idea is to have a Node structure with int start, and int * end (points to the current end or phase i). Each node will have a suffix_link pointer, a pointer to its parent, and a pointer to a vector containing its child nodes.
My question is, are these things sufficient and necessary to implement a suffix tree? Can I minimize it in any way? I haven't seen an implementation with children in vectors yet so I am skeptical as to my own thinking. Could someone explain what one would need to implement a suffix tree in this manner using only nodes?

Following may be helpful:
Ukkonen’s Suffix Tree Construction
Here we have
1. start, end to represent edge label
2. suffix link
3. an array for children

When i have to implement that algorithm the better explained document was the original Ukkonen paper and there's one newer with images.
Yes in this documents are all the inside to implement Ukkonen's Suffix Tree algorithm.

Related

Traverse through a DAG-like structure to produce another DAG-like structure in Clojure

I have a DAG-like structure that is essentially a deeply-nested map. The maps in this structure can have common values, so the overall structure is not a tree but a direct acyclic graph. I'll refer to this structure as a DAG for brevity.
The nodes in this graph are of different but finite number of categories. Each category can have its own structure/keywords/number-of-children. There is one unique node that is the source of this DAG, meaning from this node we can reach all nodes in the DAG.
The task is to traverse through the DAG from the source node, and convert each node to another one or more nodes in a new constructed graph. I'll give an example for illustration.
The graph in the upper half is the input one. The lower half is the one after transformation. For simplicity, the transformation is only done on node A where it is split into node 1 and A1. The children of node A are also reallocated.
What I have tried (or in mind):
Write a function to convert one object for different types. Inside this function, recursively call itself to convert each of its children. This method suffers from the problem that data are immutable. The nodes in the transformed graph cannot be changed randomly to add children. To overcome this, I need to wrap every node in a ref/atom/agent.
Do a topological sort on the original graph. Then convert the nodes in the reversed order, i.e., bottom-up. This method requires a extra traverse of the graph but at least the data need not to be mutable. Regarding the topological sort algorithm, I'm considering DFS-based method as stated in the wiki page, which does not require the knowledge of the full graph nor a node's parents.
My question is:
Is there any other approaches you might consider, possibly more elegant/efficient/idiomatic?
I'm more in favour of the second method, is there any flaws or potential problems?
Thanks!
EDIT: On a second thought, a topological sorting is not necessary. The transformation can be done in the post-order traversal already.
This looks like a perfect application of Zippers. They have all the capabilities you described as needed and can produce the edited 'new' DAG. There are also a number of libraries that ease the search and replace capability using predicate threads.
I've used zippers when working with OWL ontologies defined in nested vector or map trees.
Another option would be to take a look at Walkers although I've found these a bit more tedious to use.

Is there any MFC / STL class that represents Binary Tree [duplicate]

Why does the C++ STL not provide any "tree" containers, and what's the best thing to use instead?
I want to store a hierarchy of objects as a tree, rather than use a tree as a performance enhancement...
There are two reasons you could want to use a tree:
You want to mirror the problem using a tree-like structure:
For this we have boost graph library
Or you want a container that has tree like access characteristics
For this we have
std::map (and std::multimap)
std::set (and std::multiset)
Basically the characteristics of these two containers is such that they practically have to be implemented using trees (though this is not actually a requirement).
See also this question:
C tree Implementation
Probably for the same reason that there is no tree container in boost. There are many ways to implement such a container, and there is no good way to satisfy everyone who would use it.
Some issues to consider:
Are the number of children for a node fixed or variable?
How much overhead per node? - ie, do you need parent pointers, sibling pointers, etc.
What algorithms to provide? - different iterators, search algorithms, etc.
In the end, the problem ends up being that a tree container that would be useful enough to everyone, would be too heavyweight to satisfy most of the people using it. If you are looking for something powerful, Boost Graph Library is essentially a superset of what a tree library could be used for.
Here are some other generic tree implementations:
Kasper Peeters' tree.hh
Adobe's forest
core::tree
"I want to store a hierarchy of objects as a tree"
C++11 has come and gone and they still didn't see a need to provide a std::tree, although the idea did come up (see here). Maybe the reason they haven't added this is that it is trivially easy to build your own on top of the existing containers. For example...
template< typename T >
struct tree_node
{
T t;
std::vector<tree_node> children;
};
A simple traversal would use recursion...
template< typename T >
void tree_node<T>::walk_depth_first() const
{
cout<<t;
for ( auto & n: children ) n.walk_depth_first();
}
If you want to maintain a hierarchy and you want it to work with STL algorithms, then things may get complicated. You could build your own iterators and achieve some compatibility, however many of the algorithms simply don't make any sense for a hierarchy (anything that changes the order of a range, for example). Even defining a range within a hierarchy could be a messy business.
The STL's philosophy is that you choose a container based on guarantees and not based on how the container is implemented. For example, your choice of container may be based on a need for fast lookups. For all you care, the container may be implemented as a unidirectional list -- as long as searching is very fast you'd be happy. That's because you're not touching the internals anyhow, you're using iterators or member functions for the access. Your code is not bound to how the container is implemented but to how fast it is, or whether it has a fixed and defined ordering, or whether it is efficient on space, and so on.
If you are looking for a RB-tree implementation, then stl_tree.h might be appropriate for you too.
the std::map is based on a red black tree. You can also use other containers to help you implement your own types of trees.
The problem is that there is no one-size-fits-all solution. Moreover, there is not even a one-size-fits-all interface for a tree. That is, it is not even clear which methods such a tree data structure should provide and it is not even clear what a tree is.
This explains why there is no STL support on this: The STL is for data structures that most people need, where basically everyone agrees on what a sensible interface and an efficient implementation is. For trees, such a thing just doesn't exist.
The gory details
If want to understand further what the problem is, read on. Otherwise, the paragraph above already should be sufficent to answer your question.
I said that there is not even a common interface. You might disagree, since you have one application in mind, but if you think further about it, you will see that there are countless possible operations on trees. You can either have a data structure that enables most of them efficiently, but therefore is more complex overall and has overhead for that complexity, or you have more simple data structure that only allows basic operations but these as quick as possible.
If you want the complete story, check out my paper on the topic. There you will find possible interface, asymptotic complexities on different implementations, and a general description of the problem and also related work with more possible implementations.
What is a tree?
It already starts with what you consider to be a tree:
Rooted or unrooted: most programmers want rooted, most mathematicians want unrooted. (If you wonder what unrooted is: A - B - C is a tree where either A, B, or C could be the root. A rooted tree defines which one is. An unrooted doesn't)
Single root/connected or multi root/disconnected (tree or forest)
Is sibling order relevant? If no, then can the tree structure internally reorder children on updates? If so, iteration order among siblings is no longer defined. But for most trees, sibiling order is actually not meaningful, and allowing the data structure to reorder children on update is very beneficial for some updates.
Really just a tree, or also allow DAG edges (sounds weird, but many people who initially want a tree eventually want a DAG)
Labeled or unlabled? Do you need to store any data per node, or is it only the tree structure you're interested in (the latter can be stored very succinctly)
Query operations
After we have figured out what we define to be a tree, we should define query operations: Basic operations might be "navigate to children, navigate to parent", but there are way more possible operations, e.g.:
Navigate to next/prev sibling: Even most people would consider this a pretty basic operation, it is actually almost impossible if you only have a parent pointer or a children array. So this already shows you that you might need a totally different implementation based on what operations you need.
Navigate in pre/post order
Subtree size: the number of (transitive) descendants of the current node (possibly in O(1) or O(log n), i.e., don't just enumerate them all to count)
the height of the tree in the current node. That is, the longest path from this node to any leave node. Again, in less than O(n).
Given two nodes, find the least common ancestor of the node (with O(1) memory consumption)
How many nodes are between node A and node B in a pre-/post-order traversal? (less than O(n) runtime)
I emphasized that the interesting thing here is whether these methods can be performed better than O(n), because just enumerating the whole tree is always an option. Depending on your application, it might be absolutely crucial that some operations are faster than O(n), or you might not care at all. Again, you will need vastely different data structures depending on your needs here.
Update operations
Until now, I only talked about query opertions. But now to updates. Again, there are various ways in which a tree could be updated. Depending on which you need, you need a more or less sophisticated data structure:
leaf updates (easy): Delete or add a leaf node
inner node updates (harder): Move or delete move an inner node, making its children the children
of its parent
subtree updates (harder): Move or delete a subtree rooted in a node
To just give you some intuition: If you store a child array and your sibling order is important, even deleting a leaf can be O(n) as all siblings behind it have to be shifted in the child array of its parent. If you instead only have a parent pointer, leaf deletion is trivially O(1). If you don't care about sibiling order, it is also O(1) for the child array, as you can simply replace the gap with the last sibling in the array. This is just one example where different data structures will give you quite different update capabilities.
Moving a whole subtree is again trivially O(1) in case of a parent pointer, but can be O(n) if you have a data structure storing all nodes e.g., in pre-order.
Then, there are orthogonal considerations like which iterators stay valid if you perform updates. Some data structures need to invalidate all iterators in the whole tree, even if you insert a new leaf. Others only invalidate iterators in the part of the tree that is altered. Others keep all iterators (except the ones for deleted nodes) valid.
Space considerations
Tree structures can be very succinct. Roughly two bits per node are enough if you need to save on space (e.g., DFUDS or LOUDS, see this explanation to get the gist). But of course, naively, even a parent pointer is already 64 bits. Once you opt for a nicely-navigable structure, you might rather require 20 bytes per node.
With a lot of sophisication, one can also build a data structure that only takes some bits per entry, can be updated efficiently, and still enables all query operations asymptotically fast, but this is a beast of a structure that is highly complex. I once gave a practical course where I had grad students implement this paper. Some of them were able to implement it in 6 weeks (!), others failed. And while the structure has great asymptotics, its complexity makes it have quite some overhead for very simple operations.
Again, no one-size-fits-all.
Conclusion
I worked 5 years on finding the best data structure to represent a tree, and even though I came up with some and there is quite some related work, my conclusion was that there is not one. Depending on the use case, a highly sophsticated data struture will be outperformed by a simple parent pointer. Even defining the interface for a tree is hard. I tried defining one in my paper, but I have to acknowledge that there are various use cases where the interface I defined is too narrow or too large. So I doubt that this will ever end up in STL, as there are just too many tuning knobs.
In a way, std::map is a tree (it is required to have the same performance characteristics as a balanced binary tree) but it doesn't expose other tree functionality. The likely reasoning behind not including a real tree data structure was probably just a matter of not including everything in the stl. The stl can be looked as a framework to use in implementing your own algorithms and data structures.
In general, if there's a basic library functionality that you want, that's not in the stl, the fix is to look at BOOST.
Otherwise, there's a bunch of libraries out there, depending on the needs of your tree.
All STL container are externally represented as "sequences" with one iteration mechanism.
Trees don't follow this idiom.
I think there are several reasons why there are no STL trees. Primarily Trees are a form of recursive data structure which, like a container (list, vector, set), has very different fine structure which makes the correct choices tricky. They are also very easy to construct in basic form using the STL.
A finite rooted tree can be thought of as a container which has a value or payload, say an instance of a class A and, a possibly empty collection of rooted (sub) trees; trees with empty collection of subtrees are thought of as leaves.
template<class A>
struct unordered_tree : std::set<unordered_tree>, A
{};
template<class A>
struct b_tree : std::vector<b_tree>, A
{};
template<class A>
struct planar_tree : std::list<planar_tree>, A
{};
One has to think a little about iterator design etc. and which product and co-product operations one allows to define and be efficient between trees - and the original STL has to be well written - so that the empty set, vector or list container is really empty of any payload in the default case.
Trees play an essential role in many mathematical structures (see the classical papers of Butcher, Grossman and Larsen; also the papers of Connes and Kriemer for examples of they can be joined, and how they are used to enumerate). It is not correct to think their role is simply to facilitate certain other operations. Rather they facilitate those tasks because of their fundamental role as a data structure.
However, in addition to trees there are also "co-trees"; the trees above all have the property that if you delete the root you delete everything.
Consider iterators on the tree, probably they would be realised as a simple stack of iterators, to a node, and to its parent, ... up to the root.
template<class TREE>
struct node_iterator : std::stack<TREE::iterator>{
operator*() {return *back();}
...};
However, you can have as many as you like; collectively they form a "tree" but where all the arrows flow in the direction toward the root, this co-tree can be iterated through iterators towards the trivial iterator and root; however it cannot be navigated across or down (the other iterators are not known to it) nor can the ensemble of iterators be deleted except by keeping track of all the instances.
Trees are incredibly useful, they have a lot of structure, this makes it a serious challenge to get the definitively correct approach. In my view this is why they are not implemented in the STL. Moreover, in the past, I have seen people get religious and find the idea of a type of container containing instances of its own type challenging - but they have to face it - that is what a tree type represents - it is a node containing a possibly empty collection of (smaller) trees. The current language permits it without challenge providing the default constructor for container<B> does not allocate space on the heap (or anywhere else) for an B, etc.
I for one would be pleased if this did, in a good form, find its way into the standard.
Because the STL is not an "everything" library. It contains, essentially, the minimum structures needed to build things.
This one looks promising and seems to be what you're looking for:
http://tree.phi-sci.com/
IMO, an omission. But I think there is good reason not to include a Tree structure in the STL. There is a lot of logic in maintaining a tree, which is best written as member functions into the base TreeNode object. When TreeNode is wrapped up in an STL header, it just gets messier.
For example:
template <typename T>
struct TreeNode
{
T* DATA ; // data of type T to be stored at this TreeNode
vector< TreeNode<T>* > children ;
// insertion logic for if an insert is asked of me.
// may append to children, or may pass off to one of the child nodes
void insert( T* newData ) ;
} ;
template <typename T>
struct Tree
{
TreeNode<T>* root;
// TREE LEVEL functions
void clear() { delete root ; root=0; }
void insert( T* data ) { if(root)root->insert(data); }
} ;
Reading through the answers here the common named reasons are that one cannot iterate through the tree or that the tree does not assume the similar interface to other STL containers and one could not use STL algorithms with such tree structure.
Having that in mind I tried to design my own tree data structure which will provide STL-like interface and will be usable with existing STL algorthims as much as possible.
My idea was that the tree must be based on the existing STL containers and that it must not hide the container, so that it will be accessible to use with STL algorithms.
The other important feature the tree must provide is the traversing iterators.
Here is what I was able to come up with: https://github.com/cppfw/utki/blob/master/src/utki/tree.hpp
And here are the tests: https://github.com/cppfw/utki/blob/master/tests/unit/src/tree.cpp
All STL containers can be used with iterators. You can't have an iterator an a tree, because you don't have ''one right'' way do go through the tree.

MinMax Heap implementation without an array

I found lots of MinMax Heap implementations, that were storing data in an array. It is realy easy to implement, that is way I am looking for something different. I want to create a MinMax Heap using only elements of the Heap with pointers to left child and right child (and afcourse a key to compare). So the Heap have only pointer to the root object (min level), and a root object have a pointer to his children (max level) and so on. I know how to insert a new object (finding a proper path by using binary represenation of int depending on Heap size), but I don't know how to implement the rest (push up (down) the element, find parent or grandparent).
Thx for help
A priority queue using a heap ordered binary tree can be implemented using a triply linked list structure instead of an array. you will need three links per node:two to traverse down and one to traverse up.
The heapq module source code shows to implement the steps for pushing up and down. To switch from an array implementation to a pointer implementation, replace the arr[2*n+1] computation with node.left and arr[2*n+2] with node.right. For parent references such as arr[(n-1)>>1], every node will need a pointer to its parent, node.parent.
Alternatively, you can adopt a functional style which makes this all very easy to implement. I found the code for treaps implemented in Lisp to be an inspiration for how to do this.
I have solved this problem as part of an assignment long back. You can find it here
I have multiple implementations in Java and C++ implementing MinHeap with and without arrays. See my Java implementations for the solution. And yes it is very much possible to implement Heap without arrays. You just have to remember where to insert the next node and how to heapify and reverse heapify.
Edit1: I also tried to look up any existing solutions for min heap without arrays but couldn't find any. So, I am posting it here so it could be helpful for anyone who wishes to know the approach.
Yes, you can implement it without relying on an array.
I personally relied on a binary counter...
Here is my implementation(https://github.com/mohamedadnane8/HeapsUsingPointers) in c.
Note that this is still a very fast implementation with log(n).
1 => binary "1"
2=> "10" 3=> "11"
4=> "100" 5= "101" 6="110" 7="111"
In this program i tried to use the sequence of numbers to insert and delete as u can see above the tree can be easily represented as binary strings of numbers.
The first '1' in the binary string is to start.
After that the sequence of 0 and 1 determines where to go '1' means go to the left and '0' go to the right.
Also, note that this implementation relies on a very small array of characters or integers that make the calculation of the binary numbers faster but u can rely on bin() function to convert ur counter to a binary number(I implemented the array just to practice a bit my problem-solving skills).
Sorry if I couldn't explain it very well, I lack a bit in my communication skills.
It is hard implement binary heap without array. Because you should keep all the parent while inserting you pass and then do operation push up and down. like that [parent_1, parent_2 ... parant_k] and then if parent_(k+1) < parant_k pushUp and rearrange their right child and left child

Efficient Huffman tree search while remembering path taken

As a follow up question related to my question regarding efficient way of storing huffman tree's I was wondering what would be the fastest and most efficient way of searching a binary tree (based on the Huffman coding output) and storing the path taken to a particular node.
This is what I currently have:
Add root node to queue
while queue is not empty, pop item off queue
check if it is what we are looking
yes:
Follow a head pointer back to the root node, while on each node we visit checking whether it is the left or right and making a note of it.
break out of the search
enqueue left, and right node
Since this is a Huffman tree, all of the entries that I am looking for will exist. The above is a breadth first search, which is considered the best for Huffman trees since items that are in the source more often are higher up in the tree to get better compression, however I can't figure out a good way to keep track of how we got to a particular node without backtracking using the head pointer I put in the node.
In this case, I am also getting all of the right/left paths in reverse order, for example, if we follow the head to the root, and we find out that from the root it is right, left, left, we get left, left, right. or 001 in binary, when what I am looking for is to get 100 in an efficient way.
Storing the path from root to the node as a separate value inside the node was also suggested, however this would break down if we ever had a tree that was larger than however many bits the variable we created for that purpose could hold, and at that point storing the data would also take up huge amounts of memory.
Create a dictionary of value -> bit-string, that would give you the fastest lookup.
If the values are a known size, you can probably get by with just an array of bit-strings and look up the values by their index.
If you're decoding Huffman-encoded data one bit at a time, your performance will be poor. As much as you'd like to avoid using lookup tables, that's the only way to go if you care about performance. The way Huffman codes are created, they are left-to-right unique and lend themselves perfectly to a fast table lookup.

Why does the C++ STL not provide any "tree" containers?

Why does the C++ STL not provide any "tree" containers, and what's the best thing to use instead?
I want to store a hierarchy of objects as a tree, rather than use a tree as a performance enhancement...
There are two reasons you could want to use a tree:
You want to mirror the problem using a tree-like structure:
For this we have boost graph library
Or you want a container that has tree like access characteristics
For this we have
std::map (and std::multimap)
std::set (and std::multiset)
Basically the characteristics of these two containers is such that they practically have to be implemented using trees (though this is not actually a requirement).
See also this question:
C tree Implementation
Probably for the same reason that there is no tree container in boost. There are many ways to implement such a container, and there is no good way to satisfy everyone who would use it.
Some issues to consider:
Are the number of children for a node fixed or variable?
How much overhead per node? - ie, do you need parent pointers, sibling pointers, etc.
What algorithms to provide? - different iterators, search algorithms, etc.
In the end, the problem ends up being that a tree container that would be useful enough to everyone, would be too heavyweight to satisfy most of the people using it. If you are looking for something powerful, Boost Graph Library is essentially a superset of what a tree library could be used for.
Here are some other generic tree implementations:
Kasper Peeters' tree.hh
Adobe's forest
core::tree
"I want to store a hierarchy of objects as a tree"
C++11 has come and gone and they still didn't see a need to provide a std::tree, although the idea did come up (see here). Maybe the reason they haven't added this is that it is trivially easy to build your own on top of the existing containers. For example...
template< typename T >
struct tree_node
{
T t;
std::vector<tree_node> children;
};
A simple traversal would use recursion...
template< typename T >
void tree_node<T>::walk_depth_first() const
{
cout<<t;
for ( auto & n: children ) n.walk_depth_first();
}
If you want to maintain a hierarchy and you want it to work with STL algorithms, then things may get complicated. You could build your own iterators and achieve some compatibility, however many of the algorithms simply don't make any sense for a hierarchy (anything that changes the order of a range, for example). Even defining a range within a hierarchy could be a messy business.
The STL's philosophy is that you choose a container based on guarantees and not based on how the container is implemented. For example, your choice of container may be based on a need for fast lookups. For all you care, the container may be implemented as a unidirectional list -- as long as searching is very fast you'd be happy. That's because you're not touching the internals anyhow, you're using iterators or member functions for the access. Your code is not bound to how the container is implemented but to how fast it is, or whether it has a fixed and defined ordering, or whether it is efficient on space, and so on.
If you are looking for a RB-tree implementation, then stl_tree.h might be appropriate for you too.
the std::map is based on a red black tree. You can also use other containers to help you implement your own types of trees.
The problem is that there is no one-size-fits-all solution. Moreover, there is not even a one-size-fits-all interface for a tree. That is, it is not even clear which methods such a tree data structure should provide and it is not even clear what a tree is.
This explains why there is no STL support on this: The STL is for data structures that most people need, where basically everyone agrees on what a sensible interface and an efficient implementation is. For trees, such a thing just doesn't exist.
The gory details
If want to understand further what the problem is, read on. Otherwise, the paragraph above already should be sufficent to answer your question.
I said that there is not even a common interface. You might disagree, since you have one application in mind, but if you think further about it, you will see that there are countless possible operations on trees. You can either have a data structure that enables most of them efficiently, but therefore is more complex overall and has overhead for that complexity, or you have more simple data structure that only allows basic operations but these as quick as possible.
If you want the complete story, check out my paper on the topic. There you will find possible interface, asymptotic complexities on different implementations, and a general description of the problem and also related work with more possible implementations.
What is a tree?
It already starts with what you consider to be a tree:
Rooted or unrooted: most programmers want rooted, most mathematicians want unrooted. (If you wonder what unrooted is: A - B - C is a tree where either A, B, or C could be the root. A rooted tree defines which one is. An unrooted doesn't)
Single root/connected or multi root/disconnected (tree or forest)
Is sibling order relevant? If no, then can the tree structure internally reorder children on updates? If so, iteration order among siblings is no longer defined. But for most trees, sibiling order is actually not meaningful, and allowing the data structure to reorder children on update is very beneficial for some updates.
Really just a tree, or also allow DAG edges (sounds weird, but many people who initially want a tree eventually want a DAG)
Labeled or unlabled? Do you need to store any data per node, or is it only the tree structure you're interested in (the latter can be stored very succinctly)
Query operations
After we have figured out what we define to be a tree, we should define query operations: Basic operations might be "navigate to children, navigate to parent", but there are way more possible operations, e.g.:
Navigate to next/prev sibling: Even most people would consider this a pretty basic operation, it is actually almost impossible if you only have a parent pointer or a children array. So this already shows you that you might need a totally different implementation based on what operations you need.
Navigate in pre/post order
Subtree size: the number of (transitive) descendants of the current node (possibly in O(1) or O(log n), i.e., don't just enumerate them all to count)
the height of the tree in the current node. That is, the longest path from this node to any leave node. Again, in less than O(n).
Given two nodes, find the least common ancestor of the node (with O(1) memory consumption)
How many nodes are between node A and node B in a pre-/post-order traversal? (less than O(n) runtime)
I emphasized that the interesting thing here is whether these methods can be performed better than O(n), because just enumerating the whole tree is always an option. Depending on your application, it might be absolutely crucial that some operations are faster than O(n), or you might not care at all. Again, you will need vastely different data structures depending on your needs here.
Update operations
Until now, I only talked about query opertions. But now to updates. Again, there are various ways in which a tree could be updated. Depending on which you need, you need a more or less sophisticated data structure:
leaf updates (easy): Delete or add a leaf node
inner node updates (harder): Move or delete move an inner node, making its children the children
of its parent
subtree updates (harder): Move or delete a subtree rooted in a node
To just give you some intuition: If you store a child array and your sibling order is important, even deleting a leaf can be O(n) as all siblings behind it have to be shifted in the child array of its parent. If you instead only have a parent pointer, leaf deletion is trivially O(1). If you don't care about sibiling order, it is also O(1) for the child array, as you can simply replace the gap with the last sibling in the array. This is just one example where different data structures will give you quite different update capabilities.
Moving a whole subtree is again trivially O(1) in case of a parent pointer, but can be O(n) if you have a data structure storing all nodes e.g., in pre-order.
Then, there are orthogonal considerations like which iterators stay valid if you perform updates. Some data structures need to invalidate all iterators in the whole tree, even if you insert a new leaf. Others only invalidate iterators in the part of the tree that is altered. Others keep all iterators (except the ones for deleted nodes) valid.
Space considerations
Tree structures can be very succinct. Roughly two bits per node are enough if you need to save on space (e.g., DFUDS or LOUDS, see this explanation to get the gist). But of course, naively, even a parent pointer is already 64 bits. Once you opt for a nicely-navigable structure, you might rather require 20 bytes per node.
With a lot of sophisication, one can also build a data structure that only takes some bits per entry, can be updated efficiently, and still enables all query operations asymptotically fast, but this is a beast of a structure that is highly complex. I once gave a practical course where I had grad students implement this paper. Some of them were able to implement it in 6 weeks (!), others failed. And while the structure has great asymptotics, its complexity makes it have quite some overhead for very simple operations.
Again, no one-size-fits-all.
Conclusion
I worked 5 years on finding the best data structure to represent a tree, and even though I came up with some and there is quite some related work, my conclusion was that there is not one. Depending on the use case, a highly sophsticated data struture will be outperformed by a simple parent pointer. Even defining the interface for a tree is hard. I tried defining one in my paper, but I have to acknowledge that there are various use cases where the interface I defined is too narrow or too large. So I doubt that this will ever end up in STL, as there are just too many tuning knobs.
In a way, std::map is a tree (it is required to have the same performance characteristics as a balanced binary tree) but it doesn't expose other tree functionality. The likely reasoning behind not including a real tree data structure was probably just a matter of not including everything in the stl. The stl can be looked as a framework to use in implementing your own algorithms and data structures.
In general, if there's a basic library functionality that you want, that's not in the stl, the fix is to look at BOOST.
Otherwise, there's a bunch of libraries out there, depending on the needs of your tree.
All STL container are externally represented as "sequences" with one iteration mechanism.
Trees don't follow this idiom.
I think there are several reasons why there are no STL trees. Primarily Trees are a form of recursive data structure which, like a container (list, vector, set), has very different fine structure which makes the correct choices tricky. They are also very easy to construct in basic form using the STL.
A finite rooted tree can be thought of as a container which has a value or payload, say an instance of a class A and, a possibly empty collection of rooted (sub) trees; trees with empty collection of subtrees are thought of as leaves.
template<class A>
struct unordered_tree : std::set<unordered_tree>, A
{};
template<class A>
struct b_tree : std::vector<b_tree>, A
{};
template<class A>
struct planar_tree : std::list<planar_tree>, A
{};
One has to think a little about iterator design etc. and which product and co-product operations one allows to define and be efficient between trees - and the original STL has to be well written - so that the empty set, vector or list container is really empty of any payload in the default case.
Trees play an essential role in many mathematical structures (see the classical papers of Butcher, Grossman and Larsen; also the papers of Connes and Kriemer for examples of they can be joined, and how they are used to enumerate). It is not correct to think their role is simply to facilitate certain other operations. Rather they facilitate those tasks because of their fundamental role as a data structure.
However, in addition to trees there are also "co-trees"; the trees above all have the property that if you delete the root you delete everything.
Consider iterators on the tree, probably they would be realised as a simple stack of iterators, to a node, and to its parent, ... up to the root.
template<class TREE>
struct node_iterator : std::stack<TREE::iterator>{
operator*() {return *back();}
...};
However, you can have as many as you like; collectively they form a "tree" but where all the arrows flow in the direction toward the root, this co-tree can be iterated through iterators towards the trivial iterator and root; however it cannot be navigated across or down (the other iterators are not known to it) nor can the ensemble of iterators be deleted except by keeping track of all the instances.
Trees are incredibly useful, they have a lot of structure, this makes it a serious challenge to get the definitively correct approach. In my view this is why they are not implemented in the STL. Moreover, in the past, I have seen people get religious and find the idea of a type of container containing instances of its own type challenging - but they have to face it - that is what a tree type represents - it is a node containing a possibly empty collection of (smaller) trees. The current language permits it without challenge providing the default constructor for container<B> does not allocate space on the heap (or anywhere else) for an B, etc.
I for one would be pleased if this did, in a good form, find its way into the standard.
Because the STL is not an "everything" library. It contains, essentially, the minimum structures needed to build things.
This one looks promising and seems to be what you're looking for:
http://tree.phi-sci.com/
IMO, an omission. But I think there is good reason not to include a Tree structure in the STL. There is a lot of logic in maintaining a tree, which is best written as member functions into the base TreeNode object. When TreeNode is wrapped up in an STL header, it just gets messier.
For example:
template <typename T>
struct TreeNode
{
T* DATA ; // data of type T to be stored at this TreeNode
vector< TreeNode<T>* > children ;
// insertion logic for if an insert is asked of me.
// may append to children, or may pass off to one of the child nodes
void insert( T* newData ) ;
} ;
template <typename T>
struct Tree
{
TreeNode<T>* root;
// TREE LEVEL functions
void clear() { delete root ; root=0; }
void insert( T* data ) { if(root)root->insert(data); }
} ;
Reading through the answers here the common named reasons are that one cannot iterate through the tree or that the tree does not assume the similar interface to other STL containers and one could not use STL algorithms with such tree structure.
Having that in mind I tried to design my own tree data structure which will provide STL-like interface and will be usable with existing STL algorthims as much as possible.
My idea was that the tree must be based on the existing STL containers and that it must not hide the container, so that it will be accessible to use with STL algorithms.
The other important feature the tree must provide is the traversing iterators.
Here is what I was able to come up with: https://github.com/cppfw/utki/blob/master/src/utki/tree.hpp
And here are the tests: https://github.com/cppfw/utki/blob/master/tests/unit/src/tree.cpp
All STL containers can be used with iterators. You can't have an iterator an a tree, because you don't have ''one right'' way do go through the tree.