I have a tree-like structure built from the following node type:
struct TreeNode
{
    std::vector<TreeNode*> p_PrevLevelNodes;
    std::vector<TreeNode*> p_NextLevelNodes;
};
A root node is stored separately. In contrast to a classical tree, a node may have multiple parent nodes (its p_PrevLevelNodes). All of these parents are themselves part of the "classical" tree; there are, so to speak, just additional upward links.
My question: I have to transfer this structure between different instances via both MPI and TCP. Hence I need some kind of serialization, but I don't really know where to start.
Any hints?
What is your tree? Your tree is a pointer pRoot to one such node (TreeNode). Starting from pRoot, you can build the list of upper nodes (usually empty for pRoot) and the list of lower nodes. This gives you a list of visited nodes, and you can store additional information about each of them. Then you repeat the same operations for every node in that list. In the end you have one big list of nodes with additional information, and it is easy to serialize this list instead of your tree.
(Actually, it is not necessary to build an intermediate data structure in order to serialize, but I suggest keeping this structure in mind because it simplifies the implementation of your algorithm.)
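A minimal C++ sketch of that idea, assuming the TreeNode struct from the question and a plain-text wire format; the format and the helper names collectNodes, serialize and deserialize are made up for illustration. Nodes are numbered breadth-first from the root, and each node is written as the list of its children's numbers. A node with several parents gets a single number, so the extra links survive the round trip, and p_PrevLevelNodes is rebuilt from the child links.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

struct TreeNode
{
    std::vector<TreeNode*> p_PrevLevelNodes;
    std::vector<TreeNode*> p_NextLevelNodes;
};

// Give every node reachable from root a dense index (breadth-first, root = 0).
// A node with several parents is indexed only once, on its first visit.
std::vector<TreeNode*> collectNodes(TreeNode* root,
                                    std::unordered_map<TreeNode*, std::size_t>& index)
{
    assert(root != nullptr);
    std::vector<TreeNode*> order{root};
    index[root] = 0;
    for (std::size_t i = 0; i < order.size(); ++i)
        for (TreeNode* child : order[i]->p_NextLevelNodes)
            if (index.emplace(child, order.size()).second)  // true only if newly seen
                order.push_back(child);
    return order;
}

// Hypothetical text format: "<nodeCount>\n", then per node "<childCount> <childIndex>...\n".
// Parent links are not written; they are implied by the child links.
std::string serialize(TreeNode* root)
{
    std::unordered_map<TreeNode*, std::size_t> index;
    const std::vector<TreeNode*> order = collectNodes(root, index);
    std::ostringstream out;
    out << order.size() << '\n';
    for (TreeNode* node : order)
    {
        out << node->p_NextLevelNodes.size();
        for (TreeNode* child : node->p_NextLevelNodes)
            out << ' ' << index.at(child);
        out << '\n';
    }
    return out.str();
}

// Rebuilds the graph; the returned vector owns the nodes and element 0 is the root.
std::vector<std::unique_ptr<TreeNode>> deserialize(const std::string& data)
{
    std::istringstream in(data);
    std::size_t count = 0;
    in >> count;
    std::vector<std::unique_ptr<TreeNode>> nodes;
    for (std::size_t i = 0; i < count; ++i)
        nodes.push_back(std::make_unique<TreeNode>());
    for (std::size_t i = 0; i < count; ++i)
    {
        std::size_t childCount = 0;
        in >> childCount;
        for (std::size_t k = 0; k < childCount; ++k)
        {
            std::size_t c = 0;
            in >> c;
            assert(c < count);
            nodes[i]->p_NextLevelNodes.push_back(nodes[c].get());
            nodes[c]->p_PrevLevelNodes.push_back(nodes[i].get());  // restore parent link
        }
    }
    return nodes;
}
```

The resulting std::string is just bytes, so it can be sent with MPI (e.g. as MPI_CHAR) or over a TCP socket. If the nodes carry data, it would go on each node's line. The order of p_PrevLevelNodes after deserialization follows the order in which links are read, which may differ from the original.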
Something similar was implemented here: http://basicalgos.blogspot.ru/2012/04/44-serialize-and-de-serialize-tree.html (the tree there is much simpler, but I think you can apply the same logic to your tree).
Also it might be useful for you to read http://eli.thegreenplace.net/2011/09/29/an-interesting-tree-serialization-algorithm-from-dwarf/
Sorry in advance: this is a very specific question, and I cannot provide any code because this is for my job and therefore confidential.
I am using Boost R-trees, and an algorithm that I need to implement requires access to the non-leaf nodes of the tree. With the Boost rtree library, I can only access leaf nodes easily. I noticed that there is a function to print all the nodes, including the non-leaf nodes (which means they exist and are computed), with their position, their level in the tree, etc., but I cannot access them the same way as the leaf nodes.
For now, the best solution I have is to implement a visitor for the tree and overload operator() to gather the nodes (this is what the print method does to access them).
My question is: does anybody know an easier way to access the non-leaf nodes? This approach does not seem efficient, and I lose time each time I want to access a non-leaf node. Moreover, I need to replicate the structure of the tree without the points, and I cannot do that without access to the non-leaf nodes.
Thank you in advance!
I don't know exactly what you would like to do, so this will be a general answer.
In order to access the tree nodes for the first time, you have to traverse the tree structure. In the Boost.Geometry rtree, the visitor pattern is used for that. You could do it manually, but internally Boost.Variant is used to represent the nodes, so you would end up writing a variant visitor instead. At this point you have a few options, depending on what you are going to do with the nodes. Are you going to modify the rtree? Will the rtree be moved in memory? Will the addresses of nodes change? How many nodes are you going to access? Do you want to store some kind of reference to a node and traverse the tree structure from that point? Do you want to traverse the structure downward or upward?
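To illustrate the variant-visitor pattern mentioned above, here is a sketch with std::variant and std::visit on hypothetical Leaf/Internal node types. These are NOT Boost.Geometry's internal node types (their names and layout are implementation details); the visitor simply records every internal (non-leaf) node in depth-first pre-order.

```cpp
#include <cassert>
#include <memory>
#include <variant>
#include <vector>

// Hypothetical node types: a node is either a leaf or an internal node.
struct Node;

struct Leaf {
    std::vector<int> values;  // stand-in for the stored points/boxes
};

struct Internal {
    std::vector<std::unique_ptr<Node>> children;
};

struct Node {
    std::variant<Leaf, Internal> content;
};

// Visitor that collects pointers to all internal nodes (depth-first, pre-order).
struct InternalCollector {
    std::vector<const Internal*>& out;

    void operator()(const Leaf&) const {}  // leaves are skipped
    void operator()(const Internal& node) const {
        out.push_back(&node);
        for (const auto& child : node.children) {
            assert(child != nullptr);
            std::visit(*this, child->content);  // recurse through the variant
        }
    }
};

std::vector<const Internal*> collectInternalNodes(const Node& root) {
    std::vector<const Internal*> out;
    std::visit(InternalCollector{out}, root.content);
    return out;
}
```

The collected pointers stay valid only while the tree is not modified or moved, which is exactly the trade-off the options below discuss.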
One option, as you noticed, is to traverse the tree structure each time. This is a good approach if the tree structure can change. The obvious drawback is that at each node you have to check all child nodes against some condition (whatever you use to pick the node of interest).
If the tree structure does not change but the tree is copied to a different place in memory, you can represent a node as the path from the root to the node of interest, i.e. as a list of child-node indexes. E.g. the list {1, 2, 3} means: take child node 1 of the root node, then at the next level pick child node 2, and your node is child node 3 at the level after that. In this case you still have to traverse the tree, but you don't have to check the conditions again.
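A minimal sketch of the path-of-indexes idea on a hypothetical Node type (not the rtree's own node). The stored path remains usable after the tree is copied, as long as the tree's shape is unchanged; resolvePath is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal node: only the ordered child list matters for paths.
struct Node {
    int value = 0;
    std::vector<Node> children;
};

// Follow child indexes from the root, e.g. {1, 2, 3}.
// Returns nullptr if an index is out of range (the tree changed shape).
const Node* resolvePath(const Node& root, const std::vector<std::size_t>& path) {
    const Node* current = &root;
    for (std::size_t index : path) {
        if (index >= current->children.size()) return nullptr;
        current = &current->children[index];
    }
    return current;  // an empty path refers to the root itself
}
```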
If the tree does not change and the nodes stay in the same place in memory, you can simply use pointers or references.
Say I have a tree implementation like this (simplified):
class Node
{
public:
    std::string name;
    int attr_1;
    double attr_2;
    unsigned int nChildren;
    Node* Children;
};
If I need to get a specific Node by its attribute or name, do I need to loop through every single child node, starting from the root, to find it? Or is there a faster search algorithm, or a faster/better tree implementation? Say I need to find a node by its class and id attributes, as when applying a CSS rule.
With your current design, I assume the only way to find a Node with a specific name, id, class or other data is to traverse the tree, looking at every node. The time complexity is O(nNodes).
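For reference, that O(nNodes) scan over the question's layout (a Node whose nChildren children are stored contiguously at Children) might look like the depth-first sketch below; findByName is a hypothetical helper name, and default member initializers were added so the sketch is safe to construct.

```cpp
#include <cassert>
#include <string>

// Layout from the question: children stored as a contiguous array of nChildren nodes.
struct Node {
    std::string name;
    int attr_1 = 0;
    double attr_2 = 0.0;
    unsigned int nChildren = 0;
    Node* Children = nullptr;
};

// Depth-first scan: in the worst case every node is inspected, hence O(nNodes).
const Node* findByName(const Node& node, const std::string& name) {
    assert(node.nChildren == 0 || node.Children != nullptr);
    if (node.name == name) return &node;
    for (unsigned int i = 0; i < node.nChildren; ++i) {
        if (const Node* hit = findByName(node.Children[i], name)) return hit;
    }
    return nullptr;
}
```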
You might be interested in binary search trees, which allow search operations in O(log(nNodes)), which is much faster. However, they require some additional effort to stay valid when you add or remove a node. It is also important to keep the tree balanced, which is the main requirement for O(log(nNodes)) time.
Edit 1
I am familiar with CSS syntax. It is quite complicated to implement a tree that fulfils all CSS requirements. Indeed, a binary search tree cannot represent a DOM tree here. A DOM tree should be represented by Nodes, references to their children and possibly to their parent. A binary search tree may store references to these Nodes and successfully serve search queries, for example by id. But whenever a node is added or removed, or an id changes, the binary search tree has to be updated accordingly.
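A sketch of that combination, under assumptions: the DOM-like tree holds the structure, and a separate std::map (commonly implemented as a red-black tree, with O(log n) lookup) indexes nodes by id. DomNode, Document and their methods are hypothetical names, and ids are assumed to be unique. Every operation that adds, removes or renames a node also updates the index.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical DOM-like node: the tree shape lives here, not in the index.
struct DomNode {
    std::string id;
    DomNode* parent = nullptr;
    std::vector<std::unique_ptr<DomNode>> children;
};

// Owns the tree and keeps an id -> node index in sync with it.
class Document {
public:
    Document() : root_(std::make_unique<DomNode>()) {}

    DomNode* root() { return root_.get(); }

    DomNode* addChild(DomNode* parent, const std::string& id) {
        assert(parent != nullptr);
        auto child = std::make_unique<DomNode>();
        child->id = id;
        child->parent = parent;
        DomNode* raw = child.get();
        parent->children.push_back(std::move(child));
        if (!id.empty()) byId_[id] = raw;  // index follows every insertion
        return raw;
    }

    void setId(DomNode* node, const std::string& id) {
        if (!node->id.empty()) byId_.erase(node->id);  // drop the old key
        node->id = id;
        if (!id.empty()) byId_[id] = node;
    }

    void remove(DomNode* node) {
        assert(node != nullptr && node->parent != nullptr);  // the root stays
        unindex(node);  // forget the whole subtree before destroying it
        auto& siblings = node->parent->children;
        for (auto it = siblings.begin(); it != siblings.end(); ++it) {
            if (it->get() == node) {
                siblings.erase(it);  // destroys node and its subtree
                return;
            }
        }
    }

    DomNode* findById(const std::string& id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? nullptr : it->second;
    }

private:
    void unindex(DomNode* node) {
        byId_.erase(node->id);
        for (auto& child : node->children) unindex(child.get());
    }

    std::unique_ptr<DomNode> root_;
    std::map<std::string, DomNode*> byId_;
};
```

Class selectors would need a second index (e.g. a map from class name to a set of nodes) maintained the same way.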
If there are no rules defining where a node can or can't be, you have to scan all nodes until you find a match.
There's no magical guessing in algorithms.
I have a DAG-like structure that is essentially a deeply nested map. The maps in this structure can share values, so the overall structure is not a tree but a directed acyclic graph. I'll refer to this structure as a DAG for brevity.
The nodes in this graph belong to a finite number of different categories. Each category can have its own structure/keywords/number of children. There is one unique source node of this DAG, meaning every node in the DAG can be reached from it.
The task is to traverse the DAG from the source node and convert each node into one or more nodes of a newly constructed graph. I'll give an example for illustration.
The graph in the upper half is the input; the lower half is the graph after the transformation. For simplicity, the transformation is only applied to node A, which is split into node 1 and A1. The children of node A are also reallocated.
What I have tried (or have in mind):
Write a function that converts one object, handling each node type. Inside this function, recursively call it to convert each of the children. This method suffers from the fact that the data are immutable: the nodes in the transformed graph cannot simply be changed later to add children. To overcome this, I would need to wrap every node in a ref/atom/agent.
Do a topological sort of the original graph, then convert the nodes in reverse order, i.e. bottom-up. This method requires an extra traversal of the graph, but at least the data need not be mutable. For the topological sort, I'm considering the DFS-based method described on the Wikipedia page, which requires knowledge of neither the full graph nor a node's parents.
My question is:
Are there any other approaches you would consider, possibly more elegant/efficient/idiomatic?
I'm more in favour of the second method; are there any flaws or potential problems?
Thanks!
EDIT: On second thought, a topological sort is not necessary. The transformation can already be done in a post-order traversal.
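The mention of ref/atom/agent suggests Clojure; the post-order idea from the EDIT is sketched here in C++ instead, with immutable (const) nodes. Each node is built only after all its children have been converted, so nothing has to be mutated afterwards, and a memo table keyed by the original node makes a shared sub-DAG get converted once and stay shared. The rule "split A into 1 with child A1" is a hypothetical stand-in for the transformation in the question; Node, make and transform are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Immutable node: never modified after construction, children shared freely.
struct Node {
    std::string label;
    std::vector<std::shared_ptr<const Node>> children;
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr make(std::string label, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{std::move(label), std::move(children)});
}

// Post-order: convert all children first, then build the new node from them.
// memo maps each original node to its converted node.
NodePtr transform(const NodePtr& node, std::map<const Node*, NodePtr>& memo) {
    assert(node != nullptr);
    if (auto it = memo.find(node.get()); it != memo.end()) return it->second;

    std::vector<NodePtr> newChildren;
    for (const NodePtr& child : node->children) newChildren.push_back(transform(child, memo));

    NodePtr result;
    if (node->label == "A") {
        // Hypothetical rule: "A" becomes "1", whose only child "A1" takes A's children.
        result = make("1", {make("A1", std::move(newChildren))});
    } else {
        result = make(node->label, std::move(newChildren));
    }
    memo.emplace(node.get(), result);
    return result;
}
```

The recursion order is the same one a DFS-based topological sort relies on, so no separate sorting pass is needed.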
This looks like a perfect application of zippers. They provide all the capabilities you describe and can produce the edited 'new' DAG. There are also a number of libraries that ease search and replace using predicate threads.
I've used zippers when working with OWL ontologies defined in nested vector or map trees.
Another option would be to take a look at Walkers although I've found these a bit more tedious to use.
Yes, I have read this: "Ukkonen's suffix tree algorithm in plain English?"
It is a great explanation of the algorithm, but it is not so much the algorithm itself that is killing me as the data structure used to implement it.
I need the data structure to be as minimal and as fast as possible, and I have seen many implementations: some using only nodes, some only edges, some both edges and nodes, etc. Then there are variations: one website I read claimed that a node need not have a pointer to its parent, and other sources don't explain how the children of a node are managed.
My idea is to have a Node structure with an int start and an int* end (pointing to the current end, i.e. phase i). Each node will have a suffix_link pointer, a pointer to its parent, and a pointer to a vector containing its child nodes.
My question is: are these things sufficient and necessary to implement a suffix tree? Can I minimize it in any way? I haven't yet seen an implementation that stores children in vectors, so I am skeptical of my own thinking. Could someone explain what one would need to implement a suffix tree in this manner using only nodes?
The following may be helpful:
Ukkonen’s Suffix Tree Construction
Here each node has:
1. start and end, representing the edge label
2. a suffix link
3. an array of children
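A sketch of a node layout with exactly those three parts, plus the shared end pointer from the question: all leaves point at one end counter, so extending every leaf in a phase is a single increment. The 256-entry array (a byte alphabet) and the field names are assumptions. The array gives O(1) child lookup at the cost of memory (2 KB of pointers per node on a 64-bit machine); a std::map or small sorted vector keyed by the edge's first character is the usual lower-memory alternative. Like the list above, it has no parent pointer, and node ownership (e.g. a node pool) is left out.

```cpp
#include <array>
#include <cassert>

// The edge *into* this node is labelled text[start .. *end] (inclusive).
struct SuffixTreeNode {
    static constexpr int kAlphabet = 256;  // assumption: byte alphabet

    int start = -1;
    int* end = nullptr;                    // leaves share one global end counter
    SuffixTreeNode* suffixLink = nullptr;
    std::array<SuffixTreeNode*, kAlphabet> children{};  // indexed by first char of edge
    int suffixIndex = -1;                  // for leaves: position where the suffix starts

    int edgeLength() const {
        assert(end != nullptr);
        return *end - start + 1;
    }
};
```

Because a leaf stores a pointer rather than a value, incrementing the shared end once per phase lengthens every leaf edge at the same time.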
When I had to implement that algorithm, the best-explained document was the original Ukkonen paper, and there is a newer one with images.
Yes, these documents contain all the insight needed to implement Ukkonen's suffix tree algorithm.
I have a system where I need to represent something similar to a Path; a path just provides a route to reach a particular node. Multiple Paths can lead to the same node.
I am currently representing a Path as a vector of Nodes. I need operations like replaceSubpath, containsNode, containsSubPath, appendNode, getRootNode and getLeafNode (very similar to string operations). All of these operations can be done on a vector, but performance for a large path can be poor.
I am looking at using boost::graph but have no experience with it. I would like to know whether boost::graph would be a correct/good data structure for these and similar operations.
Any advice on using some other data structure would be helpful too. I am aware that I can optimize my vector solution by keeping a (multi)map from node to iterator, etc.
Essentially, the class adjacency_list<> from Boost.Graph (with the default vecS vertex storage) is a vector of vertices. A vertex descriptor is an integer index into this vector.
Typically, a tree or a path (a path is a special case of a tree, right?) is represented as a predecessor map (going backward from leaves to root, or from target to source). With integer vertex descriptors, such a predecessor map is simply a vector<int>. I do not think you can represent a path or a tree more compactly.
Of course, such a vector of predecessors can be used with string algorithms, especially those from Boost.String_Algo: http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo.html
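A small sketch of both points, assuming integer vertex ids and -1 as the "no predecessor" marker (an assumption, not a Boost.Graph convention): pathTo walks a predecessor vector backwards and reverses the result, and containsSubPath uses std::search from the standard library as a stand-in for the Boost.String_Algo routines mentioned above. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// pred[v] is the vertex before v (-1 for the root/source).
// Assumes pred encodes a tree, i.e. following it never loops.
std::vector<int> pathTo(const std::vector<int>& pred, int target) {
    std::vector<int> path;
    for (int v = target; v != -1; v = pred[v]) {
        assert(v >= 0 && v < static_cast<int>(pred.size()));
        path.push_back(v);
    }
    std::reverse(path.begin(), path.end());  // source -> target order
    return path;
}

// Paths are sequences, so string-style algorithms apply directly.
bool containsSubPath(const std::vector<int>& path, const std::vector<int>& sub) {
    return std::search(path.begin(), path.end(), sub.begin(), sub.end()) != path.end();
}
```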
From what you describe, it sounds like you are generating and editing paths in a graph, perhaps for optimizing routes, etc.
I don't think that one data structure will give you what you want. I would keep the graph structure separate from the paths you are generating.
replaceSubpath: To me this suggests a doubly linked list implementation. Once you have the start and end of the subpath, just splice the new path in place of it.
containsNode: Consider adding a map or set for fast containment checks.
containsSubPath: This could be tough, depending on your other concerns and speed requirements. If this is a very important operation, consider a suffix tree to test subpaths quickly. Keep in mind that this works best if the path doesn't change much, since constructing a suffix tree is O(N).
appendNode: A linked list makes this easy.
getRootNode: Hold a pointer to the current root node.
getLeafNode: Hold a pointer to the current leaf node.
I would build a custom data structure that addresses these concerns based on your goals. Finding subpaths and replacing them quickly may be competing performance goals: usually, more search optimization means more construction overhead, which makes the structure less dynamic.
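A custom structure along these lines, sketched under assumptions (integer node ids, hypothetical class and method names taken from the question): a std::list gives O(1) splicing for replaceSubpath and O(1) appendNode, a count map gives average O(1) containsNode even when a node appears more than once, and root/leaf are the list's front and back. containsSubPath is left out, since it would need one of the heavier structures discussed above.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>

class Path {
public:
    using Node = int;  // assumption: nodes are identified by an integer id

    void appendNode(Node n) {
        nodes_.push_back(n);
        ++count_[n];
    }

    bool containsNode(Node n) const { return count_.count(n) != 0; }

    Node getRootNode() const { assert(!nodes_.empty()); return nodes_.front(); }
    Node getLeafNode() const { assert(!nodes_.empty()); return nodes_.back(); }

    // Replace the nodes in [first, last) with `replacement`; the splice itself is O(1).
    void replaceSubpath(std::list<Node>::iterator first, std::list<Node>::iterator last,
                        std::list<Node> replacement) {
        for (auto it = first; it != last; ++it) forget(*it);
        for (Node n : replacement) ++count_[n];
        nodes_.erase(first, last);         // `last` stays valid after the erase
        nodes_.splice(last, replacement);  // insert the new nodes before `last`
    }

    std::list<Node>::iterator begin() { return nodes_.begin(); }
    std::list<Node>::iterator end() { return nodes_.end(); }

private:
    void forget(Node n) {
        if (--count_[n] == 0) count_.erase(n);
    }

    std::list<Node> nodes_;
    std::unordered_map<Node, int> count_;  // how many times each node occurs on the path
};
```

Locating the subpath to replace is still a linear walk here; keeping a map from node to list iterator, as the question suggests, would remove that cost.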
Take a look at how other code that you admire manages paths. For example, you might look at several implementations of Dijkstra's algorithm and choose the one that looks best, most convenient, or simply most to your taste.
IMHO it is not a good idea to model a "path" as an object; rather, think of it as a property of the nodes in a graph.
In general, I would consider 'marking' the nodes that are on the path. For example, the class holding the properties of the nodes might have a flag that is true if the node is on the path, and an attribute with the index of the next node on the path.
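A minimal sketch of that marking approach, assuming nodes are addressed by their index in a vector of per-node properties; NodeProps, markPath and pathLength are hypothetical names. containsNode becomes a flag check, and the path is walked via nextOnPath. Because each node stores a single successor, this representation holds one simple path (no repeated nodes) at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-node properties: the path is stored *in* the graph's nodes.
struct NodeProps {
    bool onPath = false;
    int nextOnPath = -1;  // index of the following node on the path, -1 at the end
};

// Mark the given sequence of node indices as the current path.
void markPath(std::vector<NodeProps>& nodes, const std::vector<int>& path) {
    for (auto& n : nodes) n = NodeProps{};  // clear any previous path
    for (std::size_t i = 0; i < path.size(); ++i) {
        NodeProps& p = nodes.at(path[i]);
        p.onPath = true;
        p.nextOnPath = (i + 1 < path.size()) ? path[i + 1] : -1;
    }
}

// Walk the marked path from `start` and count its nodes.
int pathLength(const std::vector<NodeProps>& nodes, int start) {
    int length = 0;
    for (int v = start; v != -1; v = nodes.at(v).nextOnPath) {
        assert(nodes.at(v).onPath);
        ++length;
    }
    return length;
}
```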