Fastest leaf tree search - c++

Say I have a tree implementation like this (simplified):
#include <string>

class Node
{
public:
    std::string name;
    int attr_1;
    double attr_2;
    unsigned int nChildren;  // number of child nodes
    Node* Children;          // pointer to the array of child nodes
};
If I need to get a specific Node by its attribute or name, do I need to loop through every single child node from the root to find it? Or is there a faster search algorithm, or a faster/better tree implementation? Say I need to find a node by its class and id attributes, as when applying a CSS rule.

With your current design, I assume the only possible way to find a Node with a specific name, id, class or any other data is to traverse the tree and look at every node. The time complexity would be O(nNodes).
You might be interested in binary search trees, which let you perform search operations in O(log(nNodes)), which is much faster. However, they require some additional effort to stay valid when you add or remove nodes. It is also important to keep the tree balanced, which is the main requirement for the O(log(nNodes)) bound.
Edit 1
I am familiar with CSS syntax. It is quite complicated to implement a tree that fulfils all CSS requirements. Indeed, a binary search tree cannot represent a DOM tree. A DOM tree should be represented by Nodes that hold references to their children and possibly to their parent. A binary search tree may store references to these Nodes and successfully serve search queries, by id for example. But if any node is removed or added, or its id changes, the binary search tree must be updated accordingly.
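As a rough sketch of that idea, the tree can own the nodes while a separate ordered index maps ids to nodes. Everything here is an assumption made for illustration: `DomNode`, `Document` and their member functions are hypothetical names, ids are assumed to be unique, and `std::map` (an ordered container with logarithmic lookup, commonly implemented as a red-black tree) stands in for a hand-written balanced search tree.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DOM-like node: owns its children and knows its parent.
struct DomNode
{
    std::string id;
    DomNode* parent = nullptr;
    std::vector<std::unique_ptr<DomNode>> children;
};

// Hypothetical wrapper that keeps an id -> node index in sync with the tree.
// Assumes ids are unique across the whole document.
class Document
{
public:
    Document() : root_(new DomNode)
    {
        root_->id = "root";
        index_[root_->id] = root_.get();
    }

    DomNode* root() { return root_.get(); }

    // Adding a node updates both the tree and the index.
    DomNode* addChild(DomNode* parent, const std::string& id)
    {
        std::unique_ptr<DomNode> child(new DomNode);
        child->id = id;
        child->parent = parent;
        DomNode* raw = child.get();
        parent->children.push_back(std::move(child));
        index_[id] = raw;
        return raw;
    }

    // Logarithmic lookup instead of a full tree traversal.
    DomNode* findById(const std::string& id) const
    {
        auto it = index_.find(id);
        return it == index_.end() ? nullptr : it->second;
    }

    // Changing an id must re-key the index entry.
    void setId(DomNode* node, const std::string& newId)
    {
        index_.erase(node->id);
        node->id = newId;
        index_[newId] = node;
    }

    // Removes a non-root node and its whole subtree from tree and index.
    void remove(DomNode* node)
    {
        unindex(node);
        auto& siblings = node->parent->children;
        for (auto it = siblings.begin(); it != siblings.end(); ++it) {
            if (it->get() == node) {
                siblings.erase(it);  // destroys the subtree
                break;
            }
        }
    }

private:
    void unindex(const DomNode* node)
    {
        index_.erase(node->id);
        for (const auto& child : node->children) unindex(child.get());
    }

    std::unique_ptr<DomNode> root_;
    std::map<std::string, DomNode*> index_;
};
```

The cost of the faster lookup is that every mutation has to go through the wrapper, otherwise the index goes stale.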

If there are no rules defining where a node can / can't be, you have to scan all nodes till you find the match.
There's no magical guessing in algorithms.
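A minimal sketch of such a full scan over the Node class from the question, assuming `Children` points to an array of `nChildren` nodes. The helper name `findByName` is hypothetical, and the default member initializers are added only so the example is safe to construct.

```cpp
#include <string>

// Same layout as the Node class in the question.
class Node
{
public:
    std::string name;
    int attr_1 = 0;
    double attr_2 = 0.0;
    unsigned int nChildren = 0;
    Node* Children = nullptr;  // assumed: array of nChildren nodes
};

// Depth-first scan: returns the first node whose name matches, or nullptr
// if nothing in the subtree matches. Every node is visited at most once,
// so the cost is O(nNodes).
const Node* findByName(const Node* node, const std::string& name)
{
    if (node == nullptr) return nullptr;
    if (node->name == name) return node;
    for (unsigned int i = 0; i < node->nChildren; ++i) {
        if (const Node* hit = findByName(&node->Children[i], name)) {
            return hit;
        }
    }
    return nullptr;
}
```

The same shape works for any other predicate (id, class, attribute value); only the comparison changes.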

Related

Is there a way to access non leaf nodes in a C++ Boost rtree

Sorry in advance, this is a very specific question and I cannot provide any piece of code as this is for my job, thus confidential.
I am using the Boost R-trees, and an algorithm that I need to implement requires access to the non-leaf nodes of the tree. With the Boost rtree library, I can only access leaf nodes in an easy way. I noticed that there is a function to print all the nodes including the non-leaf nodes (which means they exist and are computed), with their position, their level in the tree, etc., but I cannot access them in the same way as the leaf nodes.
For now, the best solution that I have is to implement a visitor for the tree and overload the operator () to gather the nodes (this is what the print method does to access the nodes).
My question is, does anybody know an easier way to access the non-leaf nodes? This one does not seem to be efficient, and I'm losing time each time I want to access a non-leaf node. Moreover, I need to replicate the structure of the tree without the points, and I cannot do that if I cannot access the non-leaf nodes.
Thank you in advance !
I don't know exactly what you would like to do, so this will be a general answer.
To access the tree nodes for the first time, you have to traverse the tree structure. The Boost.Geometry rtree uses the visitor pattern for that. You could do it manually, but internally Boost.Variant is used to represent the nodes, so you'll end up with a variant visitor instead. At this point you have a few options, depending on what you are going to do with the nodes. Are you going to modify the r-tree? Will the rtree be moved in memory? Will the addresses of nodes change? How many nodes are you going to access? Do you want to store some kind of reference to a node and traverse the tree structure from that point? Do you want to traverse the structure downward or upward?
One option as you noticed is to traverse the tree structure each time. This is a good approach if the tree structure can change. The obvious drawback is that you have to check all child nodes at each node using some condition (whatever you do in order to pick the node of interest).
If the tree structure does not change but the tree is copied to a different place in memory, you can represent the node as a path from the root to the node of interest, stored as a list of child-node indexes. E.g. a list {1, 2, 3} means: from the root node take child node 1, at the next level pick child node 2, and your node is child node 3 at the level after that. In this case you still have to traverse the tree, but you don't have to check the conditions again.
If the tree does not change and the nodes stay in the same place in memory, you can simply use pointers or references.
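The index-path option can be sketched on a generic tree. This does not use Boost.Geometry's internal node types; `PathNode` and `followPath` are hypothetical names chosen for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical generic tree node (not the rtree's internal node type).
struct PathNode
{
    int value = 0;
    std::vector<PathNode> children;
};

// Follows a list of child indexes starting at the root.
// Returns nullptr if some index does not exist at its level.
const PathNode* followPath(const PathNode& root,
                           const std::vector<std::size_t>& path)
{
    const PathNode* node = &root;
    for (std::size_t index : path) {
        if (index >= node->children.size()) return nullptr;
        node = &node->children[index];
    }
    return node;
}
```

Because the path stores positions rather than addresses, the same path resolves to the corresponding node in a copy of the tree, as long as the structure itself is unchanged.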

Ukkonen's suffix tree algorithm, what is necessary?

Yes I have read this: Ukkonen's suffix tree algorithm in plain English?
It is a great explanation of the algorithm but it is not so much the algorithm itself that is killing me but rather the data structure used to implement it.
I need the data structure to be as minimal and as fast as possible, and I have seen many implementations using only nodes, some with only edges, some with edges and nodes, etc. Then there are variations: a website I was reading claimed that a node need not have a pointer to its parent, and other places don't account for how the children of a node are managed.
My idea is to have a Node structure with int start, and int * end (points to the current end or phase i). Each node will have a suffix_link pointer, a pointer to its parent, and a pointer to a vector containing its child nodes.
My question is, are these things sufficient and necessary to implement a suffix tree? Can I minimize it in any way? I haven't seen an implementation with children in vectors yet so I am skeptical as to my own thinking. Could someone explain what one would need to implement a suffix tree in this manner using only nodes?
The following may be helpful:
Ukkonen’s Suffix Tree Construction
Here we have
1. start, end to represent edge label
2. suffix link
3. an array for children
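A node layout built from just those three items might look like the sketch below. It is only an illustration: `SuffixNode` and `edgeLength` are hypothetical names, the alphabet is assumed to be the 26 lowercase letters, and ownership of the nodes is left to whatever builds the tree. No parent pointer is stored, in line with the list above.

```cpp
#include <array>

// Sketch of a node-only layout: the label of the edge leading INTO a node
// is stored in the node itself as the index range [start, *end].
struct SuffixNode
{
    int start = -1;        // index of the first character of the edge label
    int* end = nullptr;    // pointer to the index of the last character;
                           // leaves can all point at one shared "current end"
    SuffixNode* suffixLink = nullptr;
    std::array<SuffixNode*, 26> children{};  // one slot per letter, all null

    // Length of the incoming edge label; the root has no incoming edge.
    int edgeLength() const
    {
        return end != nullptr ? *end - start + 1 : 0;
    }
};
```

Sharing one `end` variable between all leaves is what lets every leaf edge grow at once when that variable is incremented. A fixed array gives constant-time child lookup at the cost of memory per node; a map or a vector of children trades that the other way.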
When I had to implement that algorithm, the best-explained document was the original Ukkonen paper, and there is a newer one with images.
Yes, these documents contain everything you need to implement Ukkonen's suffix tree algorithm.

Serialization of extended Tree Structure

I have a tree-like structure, which is constructed by
#include <vector>

struct TreeNode
{
    std::vector<TreeNode*> p_PrevLevelNodes;
    std::vector<TreeNode*> p_NextLevelNodes;
};
and there is some root node stored. In contrast to a classical tree, a Node might have multiple root nodes. All of these root nodes are present in the "classical" tree, but there are, so to speak, additional links.
To come to my question: I have to communicate this structure between different instances by both MPI and TCP. Hence, I need some kind of serialization, but I don't really know where to start.
Any hints?
What is your tree? It is a pointer pRoot to one such node (TreeNode). Starting from pRoot, you can build the list of its upper nodes (usually empty for pRoot) and the list of its lower nodes. So you can build a list of visited nodes and save additional information about each of them. For each node in that list you repeat the same operations. You end up with one big list of nodes with additional information, and it is easy to serialize this list instead of your tree.
(Actually it is not necessary to build an intermediate data structure in order to serialize, but I suggest keeping this structure in mind to simplify the implementation of your algorithm.)
Something similar was implemented here: http://basicalgos.blogspot.ru/2012/04/44-serialize-and-de-serialize-tree.html (that tree is much simpler, but I think you can repeat the logic for your tree).
Also it might be useful for you to read http://eli.thegreenplace.net/2011/09/29/an-interesting-tree-serialization-algorithm-from-dwarf/
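One way to sketch the "list of visited nodes" idea is to number the nodes and replace pointers by those numbers. The code below makes several assumptions: `serialize` and `deserialize` are hypothetical helpers, the text format is invented for the example, every node is reachable from the root through p_NextLevelNodes, and p_PrevLevelNodes is the exact mirror of p_NextLevelNodes so it can be rebuilt rather than transmitted.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

struct TreeNode
{
    std::vector<TreeNode*> p_PrevLevelNodes;
    std::vector<TreeNode*> p_NextLevelNodes;
};

// Numbers the nodes in breadth-first order (root = 0), then writes one line
// per node: the count of next-level links followed by their node numbers.
std::string serialize(TreeNode* root)
{
    std::vector<TreeNode*> order;
    std::unordered_map<TreeNode*, std::size_t> id;
    order.push_back(root);
    id[root] = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        for (TreeNode* next : order[i]->p_NextLevelNodes) {
            // emplace succeeds only the first time a node is seen, so a
            // node with several previous-level nodes is numbered once.
            if (id.emplace(next, order.size()).second) order.push_back(next);
        }
    }
    std::ostringstream out;
    out << order.size() << '\n';
    for (TreeNode* node : order) {
        out << node->p_NextLevelNodes.size();
        for (TreeNode* next : node->p_NextLevelNodes) out << ' ' << id[next];
        out << '\n';
    }
    return out.str();
}

// Rebuilds the structure; element 0 of the result is the root.
std::vector<std::unique_ptr<TreeNode>> deserialize(const std::string& text)
{
    std::istringstream in(text);
    std::size_t count = 0;
    in >> count;
    std::vector<std::unique_ptr<TreeNode>> nodes;
    for (std::size_t i = 0; i < count; ++i) nodes.emplace_back(new TreeNode);
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t links = 0;
        in >> links;
        for (std::size_t k = 0; k < links; ++k) {
            std::size_t child = 0;
            in >> child;
            nodes[i]->p_NextLevelNodes.push_back(nodes[child].get());
            nodes[child]->p_PrevLevelNodes.push_back(nodes[i].get());
        }
    }
    return nodes;
}
```

The resulting string (or an equivalent flat array of integers) can be sent as one buffer over MPI or TCP. Any per-node payload would need to be written alongside each line.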

What does "tree" refer to in breadth and depth based searches?

I need some working snippets of C++ code for breadth-first and depth-first searches. Also, in the links below, when the term tree is used, does it refer to a binary tree, or more specifically a red-black tree? Or is it a more abstract tree of sorts? Does anyone have a link to working code for these searches, along with code for constructing the tree?
Tree seems to refer to some sort of construct within the "graph"? I believe this is some sort of math I have not taken yet.
breadth or depth first search 1
breadth or depth first search 2
The tree in question is the thing they're searching. It's kinda hard to understand search algorithms without knowing what it is they are searching through.
A tree is a type of graph. A graph is a set of nodes (which presumably represent some data) with connections between certain nodes. A tree is a graph where the connections between nodes form a hierarchy. Every node except the root has exactly one "parent" that points to it, and each node points to zero or more child nodes. The nodes cannot form cycles; a parent cannot point to a child who points back to that parent.
Basically, like branches on a tree.
The term "tree" refers to any data structure that can be abstractly looked at as a tree.
A "tree" is a data structure in which there are parent nodes and child nodes, and each child has a single parent, with a single "root" node not having a parent.
If a node can have multiple parents, the structure is no longer a tree but a more general "graph".
A tree is a special case of a directed acyclic graph (basically a bunch of 'nodes' with arrows ('edges') pointing at each other, such that there cannot be a loop of arrows) in which the following two conditions hold:
1. No node has more than one incoming edge.
2. There exists a single distinguished node (the 'root') from which all other nodes are reachable.
The nodes reachable via an outgoing edge from some node N are often called N's children.
Breadth-first and depth-first search apply to generic trees (indeed, they apply to all DAGs, and to graphs in general if visited nodes are tracked). However there are some more specific types:
Binary trees are trees in which no node has more than two outgoing edges; outgoing edges are labelled, usually as 'left' and 'right'.
Binary search trees are binary trees in which each node has a key; further, the key in some node N is greater than every key in the subtree on its left edge (if any) and less than every key in the subtree on its right edge (if any). This allows for very fast searching for a specific key.
Red-black trees are a specific kind of search tree in which a moderately complex algorithm keeps the tree approximately balanced, so that the longest path from the root to a leaf is at most about twice the shortest.
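Since the question asks for working code, here is a small sketch of both searches on a generic tree (any number of children per node). The `Node` type and the function names are made up for the example; both functions simply record the order in which values are visited.

```cpp
#include <queue>
#include <vector>

// Generic tree node: a value plus zero or more children.
struct Node
{
    int value;
    std::vector<Node> children;
};

// Depth-first (pre-order): visit a node, then fully explore each child
// before moving on to the next sibling.
void depthFirst(const Node& node, std::vector<int>& out)
{
    out.push_back(node.value);
    for (const Node& child : node.children) depthFirst(child, out);
}

// Breadth-first: visit the tree level by level using a FIFO queue.
std::vector<int> breadthFirst(const Node& root)
{
    std::vector<int> out;
    std::queue<const Node*> pending;
    pending.push(&root);
    while (!pending.empty()) {
        const Node* node = pending.front();
        pending.pop();
        out.push_back(node->value);
        for (const Node& child : node->children) pending.push(&child);
    }
    return out;
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, depth-first visits 1, 2, 4, 5, 3 and breadth-first visits 1, 2, 3, 4, 5. To turn either traversal into a search, compare each visited value with the target and stop at the first match.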

Efficient Huffman tree search while remembering path taken

As a follow-up to my question about an efficient way of storing Huffman trees, I was wondering what would be the fastest and most efficient way of searching a binary tree (based on the Huffman coding output) and storing the path taken to a particular node.
This is what I currently have:
Add root node to queue
while queue is not empty:
    pop item off queue
    check if it is what we are looking for
    yes:
        follow the head pointer back to the root node; at each node we visit, check whether we came from the left or the right child and make a note of it
        break out of the search
    no:
        enqueue left and right nodes
Since this is a Huffman tree, all of the entries that I am looking for will exist. The above is a breadth first search, which is considered the best for Huffman trees since items that are in the source more often are higher up in the tree to get better compression, however I can't figure out a good way to keep track of how we got to a particular node without backtracking using the head pointer I put in the node.
In this case, I am also getting all of the right/left paths in reverse order, for example, if we follow the head to the root, and we find out that from the root it is right, left, left, we get left, left, right. or 001 in binary, when what I am looking for is to get 100 in an efficient way.
Storing the path from root to the node as a separate value inside the node was also suggested, however this would break down if we ever had a tree that was larger than however many bits the variable we created for that purpose could hold, and at that point storing the data would also take up huge amounts of memory.
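For the reversed-order problem described above, one simple option is to collect the bits while walking up and reverse them once at the end. This is only a sketch: `HuffNode` and `pathFromRoot` are hypothetical names, `head` is assumed to be the parent pointer, and left is encoded as '0' and right as '1' to match the 100 example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical Huffman tree node with a pointer back to its parent.
struct HuffNode
{
    HuffNode* head = nullptr;   // parent ("head" in the question)
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;
};

// Walks from the node up to the root, noting at each step whether the node
// was a left ('0') or right ('1') child, then reverses the collected bits
// so they read from root to node.
std::string pathFromRoot(const HuffNode* node)
{
    std::string bits;
    while (node->head != nullptr) {
        bits.push_back(node->head->left == node ? '0' : '1');
        node = node->head;
    }
    std::reverse(bits.begin(), bits.end());
    return bits;
}
```

The reversal is linear in the depth of the node, so it adds little on top of the walk itself, and the string grows as needed instead of being limited to the width of a fixed integer.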
Create a dictionary of value -> bit-string, that would give you the fastest lookup.
If the values are a known size, you can probably get by with just an array of bit-strings and look up the values by their index.
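Such a dictionary can be filled in one depth-first pass that carries the current path along, so no parent pointer or per-node search is needed. The sketch below uses assumed names (`HuffNode`, `buildCodes`), treats symbols as `char`, and stores each code as a string of '0' and '1' characters for readability rather than as packed bits.

```cpp
#include <map>
#include <string>

// Hypothetical Huffman tree node; only leaves carry a symbol.
struct HuffNode
{
    char symbol = '\0';
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Visits every node once. The prefix holds the path from the root to the
// current node ('0' = left, '1' = right), already in root-to-leaf order.
void buildCodes(const HuffNode* node, std::string& prefix,
                std::map<char, std::string>& codes)
{
    if (node == nullptr) return;
    if (node->isLeaf()) {
        codes[node->symbol] = prefix;
        return;
    }
    prefix.push_back('0');
    buildCodes(node->left, prefix, codes);
    prefix.back() = '1';
    buildCodes(node->right, prefix, codes);
    prefix.pop_back();  // restore the prefix for the caller
}
```

After the single pass, encoding a symbol is a map lookup. With a known symbol size (for example bytes), the map could be replaced by an array indexed by symbol value.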
If you're decoding Huffman-encoded data one bit at a time, your performance will be poor. As much as you'd like to avoid using lookup tables, that's the only way to go if you care about performance. The way Huffman codes are created, they are left-to-right unique (no code is a prefix of another) and lend themselves perfectly to a fast table lookup.
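A table lookup of this kind can be sketched as follows, under stated assumptions: codes are given as strings of '0' and '1', they are prefix-free and non-empty, and none is longer than `maxLen` bits. `TableEntry`, `buildTable` and `decode` are hypothetical names. The table has one entry per possible `maxLen`-bit window, and every window that starts with a given code maps to that code's symbol and length.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct TableEntry
{
    char symbol = '\0';
    int length = 0;  // 0 marks a window that matches no code
};

// Fills a table of 2^maxLen entries. A code of length len owns all
// 2^(maxLen - len) windows that begin with its bits.
std::vector<TableEntry> buildTable(const std::map<char, std::string>& codes,
                                   int maxLen)
{
    std::vector<TableEntry> table(std::size_t(1) << maxLen);
    for (const auto& kv : codes) {
        const std::string& code = kv.second;
        const int len = static_cast<int>(code.size());
        std::size_t base = 0;
        for (char bit : code) base = (base << 1) | (bit == '1' ? 1u : 0u);
        base <<= (maxLen - len);
        const std::size_t span = std::size_t(1) << (maxLen - len);
        for (std::size_t i = 0; i < span; ++i) {
            table[base + i].symbol = kv.first;
            table[base + i].length = len;
        }
    }
    return table;
}

// Decodes by peeking at maxLen bits (zero-padded at the end), looking up
// the symbol, and advancing by that symbol's code length.
std::string decode(const std::string& bits,
                   const std::vector<TableEntry>& table, int maxLen)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < bits.size()) {
        std::size_t window = 0;
        for (int i = 0; i < maxLen; ++i) {
            window <<= 1;
            if (pos + i < bits.size() && bits[pos + i] == '1') window |= 1;
        }
        const TableEntry& entry = table[window];
        if (entry.length == 0 || pos + entry.length > bits.size()) break;
        out.push_back(entry.symbol);
        pos += entry.length;
    }
    return out;
}
```

The window is assembled character by character here only to keep the example readable. A decoder written for speed would keep the pending bits in an integer buffer, so each symbol costs a shift, a mask and one table read.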