I have a tree of an unknown structure. First, I want to find a node containing a string of text, "Something". Then, after identifying the string's location in the tree, I want to update a different node relative to the string's location. The data is a deeply nested map with several branches of lists.
Is that possible with zippers?
I've studied this approach to editing trees: http://www.exampler.com/blog/2010/09/01/editing-trees-in-clojure-with-clojurezip/. Problem is, I don't beforehand know the location of the string.
Yes! This is exactly the kind of task zippers were designed for.
Repeatedly call zip/next until you find the node you are looking for.
Then call zip/path to find out where you are relative to the root.
Then call zip/up, zip/down, zip/left, etc. to get to the node you want to modify.
Update the node (for example with zip/edit or zip/replace).
Call zip/root to get your new map containing these changes.
Sorry in advance, this is a very specific question and I cannot provide any code, as this is for my job and thus confidential.
I am using the Boost R-trees, and an algorithm that I need to implement requires access to the non-leaf nodes of the tree. With the Boost rtree library, I can only access leaf nodes in an easy way. I noticed that there is a function that prints all the nodes, including the non-leaf nodes (which means they exist and are computed), with their position, their level in the tree, etc., but I cannot access them the same way as the leaf nodes.
For now, the best solution that I have is to implement a visitor for the tree and overload the operator () to gather the nodes (this is what the print method does to access the nodes).
My question is: does anybody know an easier way to access the non-leaf nodes? This one does not seem to be efficient, and I'm losing time each time I want to access a non-leaf node. Moreover, I need to replicate the structure of the tree without the points, and I cannot do that if I cannot access the non-leaf nodes.
Thank you in advance!
I don't know exactly what you would like to do, so this will be a general answer.
In order to access the tree nodes for the first time, you have to traverse the tree structure. In the Boost.Geometry rtree, the visitor pattern is used for that. You could do it manually, but internally Boost.Variant is used to represent the nodes, so you'll end up with a variant visitor instead. At this point you have a few options, depending on what you are going to do with the nodes. Are you going to modify the r-tree? Will the rtree be moved in memory? Will the addresses of nodes change? How many nodes are you going to access? Do you want to store some kind of reference to a node and traverse the tree structure from that point? Do you want to traverse the structure downward or upward?
One option, as you noticed, is to traverse the tree structure each time. This is a good approach if the tree structure can change. The obvious drawback is that you have to check all child nodes at each node using some condition (whatever you do in order to pick the node of interest).
If the tree structure does not change but the tree is copied to a different place in memory, you can represent the node as a path from the root to the node of interest, stored as a list of indexes of child nodes. E.g. a list {1, 2, 3} means: traverse the tree using child node 1 of the root node, then at the next level pick child node 2, then your node will be child node 3 at the next level. In this case you still have to traverse the tree, but you don't have to check the conditions again.
If the tree does not change and the nodes stay in the same place in memory, you can simply use pointers or references.
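The path-of-child-indexes idea can be sketched as follows. This is only an illustration under assumptions: `Node` and `follow_path` are hypothetical names for a generic tree, not Boost.Geometry's internal node types, which would have to be reached through a visitor as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tree node (NOT Boost.Geometry's node type).
struct Node {
    int id = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// Follow a stored path of child indexes from the root.
// Returns nullptr if the path no longer matches the tree structure.
const Node* follow_path(const Node& root, const std::vector<std::size_t>& path) {
    const Node* current = &root;
    for (std::size_t index : path) {
        if (index >= current->children.size()) {
            return nullptr;  // the structure changed since the path was recorded
        }
        assert(current->children[index] && "child slots are assumed to be non-null");
        current = current->children[index].get();
    }
    return current;
}
```

Because the path holds no addresses, it stays valid when the tree is copied elsewhere in memory, as long as the structure itself is unchanged.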
I have a wxTreeListCtrl with the columns name, path and size. I have constructed the tree. Can I retrieve the node that has path="some path" from the tree? Is there any function for this?
No, there is no built-in function for this. You can do it yourself by iterating over the entire tree, of course, but this is not exactly very efficient.
What I'd do instead would be to have a separate map<path,item> in your program and construct the tree from this map -- and, if necessary, keep it updated when items are added to/deleted from the tree. For the latter, keeping a pointer to the item stored in the map as "item data" in wxTreeCtrl could be useful.
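A minimal sketch of such a side index, assuming paths are unique. `PathIndex` and `ItemHandle` are hypothetical names: `ItemHandle` stands in for whatever item id type the tree control uses, and no wxWidgets calls are shown.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical item handle; a real program would store the control's item id.
using ItemHandle = int;

class PathIndex {
public:
    // Call when an item is added to the tree.
    // Returns false if the path is already registered.
    bool add(const std::string& path, ItemHandle item) {
        return items_.emplace(path, item).second;
    }

    // Call when an item is deleted from the tree.
    void remove(const std::string& path) { items_.erase(path); }

    // Look up an item by path; returns nullptr when the path is unknown.
    const ItemHandle* find(const std::string& path) const {
        auto it = items_.find(path);
        return it == items_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, ItemHandle> items_;
};
```

The lookup then costs one hash probe instead of a walk over the whole tree, at the price of keeping the map in sync with the control.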
I have a DAG-like structure that is essentially a deeply-nested map. The maps in this structure can have common values, so the overall structure is not a tree but a directed acyclic graph. I'll refer to this structure as a DAG for brevity.
The nodes in this graph belong to a finite number of different categories. Each category can have its own structure/keywords/number-of-children. There is one unique node that is the source of this DAG, meaning that from this node we can reach all nodes in the DAG.
The task is to traverse through the DAG from the source node, and convert each node to another one or more nodes in a new constructed graph. I'll give an example for illustration.
The graph in the upper half is the input one. The lower half is the one after transformation. For simplicity, the transformation is only done on node A where it is split into node 1 and A1. The children of node A are also reallocated.
What I have tried (or have in mind):
Write a function that converts one node, dispatching on its type. Inside this function, recursively call it to convert each of the node's children. This method suffers from the fact that the data is immutable: nodes in the transformed graph cannot be changed afterwards to add children. To overcome this, I would need to wrap every node in a ref/atom/agent.
Do a topological sort on the original graph, then convert the nodes in reverse order, i.e., bottom-up. This method requires an extra traversal of the graph, but at least the data need not be mutable. For the topological sort, I'm considering the DFS-based method described on the wiki page, which requires neither knowledge of the full graph nor of a node's parents.
My question is:
Are there any other approaches you might consider, possibly more elegant/efficient/idiomatic?
I'm more in favour of the second method; are there any flaws or potential problems?
Thanks!
EDIT: On second thought, a topological sort is not necessary. The transformation can already be done in a post-order traversal.
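To illustrate the post-order idea independently of Clojure, here is a small C++ sketch. It is an assumption-laden simplification: `InNode`, `OutNode`, `Memo` and `convert` are hypothetical names, and each input node maps to exactly one output node (a prime is appended to its name), whereas the real transformation may split a node into several. The memo table ensures that a node shared by several parents is converted once and stays shared in the result.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct InNode {
    std::string name;
    std::vector<std::shared_ptr<InNode>> children;
};

// Output nodes are immutable once built (children are pointers to const).
struct OutNode {
    std::string name;
    std::vector<std::shared_ptr<const OutNode>> children;
};

using Memo = std::unordered_map<const InNode*, std::shared_ptr<const OutNode>>;

// Post-order conversion: children are converted before their parent,
// so no output node ever has to be modified after construction.
std::shared_ptr<const OutNode> convert(const std::shared_ptr<InNode>& node, Memo& memo) {
    assert(node && "convert expects a non-null node");
    auto found = memo.find(node.get());
    if (found != memo.end()) {
        return found->second;  // already converted via another parent
    }
    std::vector<std::shared_ptr<const OutNode>> children;
    for (const auto& child : node->children) {
        children.push_back(convert(child, memo));
    }
    // Hypothetical per-node transformation: rename the node.
    std::shared_ptr<const OutNode> result =
        std::make_shared<OutNode>(OutNode{node->name + "'", std::move(children)});
    memo.emplace(node.get(), result);
    return result;
}
```

The recursion depth equals the depth of the graph, so a very deep structure might need an explicit stack instead.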
This looks like a perfect application of Zippers. They have all the capabilities you described as needed and can produce the edited 'new' DAG. There are also a number of libraries that ease the search and replace capability using predicate threads.
I've used zippers when working with OWL ontologies defined in nested vector or map trees.
Another option would be to take a look at Walkers although I've found these a bit more tedious to use.
I have a tree-like structure, which is constructed by
#include <vector>

struct TreeNode
{
    std::vector<TreeNode*> p_PrevLevelNodes; // links to the previous level
    std::vector<TreeNode*> p_NextLevelNodes; // links to the next level
};
and there is some root node stored. In contrast to a classical tree, a node might have multiple parent nodes. All of these parents are present in the "classical" tree, but there are, so to speak, additional links.
To come to my question: I have to communicate this structure between different instances over both MPI and TCP. Hence, I need some kind of serialization, but I don't really know where to start.
Any hints?
What is your tree? Your tree is a pointer pRoot to one such node (TreeNode). Since you have pRoot, you can build the list of its upper nodes (normally empty for pRoot) and the list of its lower nodes. So you can build a list of visited nodes and save additional information about each of them. For each node in this list you can repeat all these operations. In the end you will have one big list of nodes with additional information, and it is easy to serialize this list instead of your tree.
(Actually it is not necessary to build an intermediate data structure in order to serialize, but I suggest keeping this structure in mind to simplify the implementation of your algorithm.)
Something similar was implemented here: http://basicalgos.blogspot.ru/2012/04/44-serialize-and-de-serialize-tree.html (the tree there is much simpler, but I think you can repeat this logic for your tree).
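A possible sketch of that flattening step, under two assumptions: every node is reachable from the root through p_NextLevelNodes, and p_PrevLevelNodes is simply the reverse of those links, so it can be rebuilt on the receiving side. `NodeRecord`, `flatten` and `rebuild` are hypothetical names, and any per-node payload is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct TreeNode {
    std::vector<TreeNode*> p_PrevLevelNodes;
    std::vector<TreeNode*> p_NextLevelNodes;
};

// One flat record per node: its next-level links as indexes into the list.
struct NodeRecord {
    std::vector<std::size_t> next_level;
};

// Visit every reachable node once, number it, and replace pointers by numbers.
std::vector<NodeRecord> flatten(TreeNode* root) {
    assert(root != nullptr);
    std::unordered_map<const TreeNode*, std::size_t> ids;
    std::vector<TreeNode*> order{root};
    ids.emplace(root, 0);
    for (std::size_t i = 0; i < order.size(); ++i) {  // order grows as we go
        // Copy the links first: push_back below may reallocate `order`.
        const std::vector<TreeNode*> links = order[i]->p_NextLevelNodes;
        for (TreeNode* child : links) {
            if (ids.emplace(child, order.size()).second) {
                order.push_back(child);  // first visit: assign the next id
            }
        }
    }
    std::vector<NodeRecord> records(order.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        for (TreeNode* child : order[i]->p_NextLevelNodes) {
            records[i].next_level.push_back(ids.at(child));
        }
    }
    return records;
}

// Receiving side: recreate the nodes and both link directions.
std::vector<TreeNode> rebuild(const std::vector<NodeRecord>& records) {
    std::vector<TreeNode> nodes(records.size());
    for (std::size_t i = 0; i < records.size(); ++i) {
        for (std::size_t child : records[i].next_level) {
            nodes[i].p_NextLevelNodes.push_back(&nodes[child]);
            nodes[child].p_PrevLevelNodes.push_back(&nodes[i]);
        }
    }
    return nodes;
}
```

The record list contains only integers, so it can be packed into a contiguous buffer for MPI or TCP with any encoding you like; index 0 is the root.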
Also it might be useful for you to read http://eli.thegreenplace.net/2011/09/29/an-interesting-tree-serialization-algorithm-from-dwarf/
All right, this question requires a bit of reading on your side. I'll try to keep this short and simple.
I have a tree (not a binary tree, just a tree) with data associated to each node (binary data, I don't know what they are AND I don't know how long they are)
Each node of the tree also has an index which isn't related to how it appears in the tree, to make it short it could be like that:
The index number represents the order the user WANTS the tree to be navigated and cannot be duplicated.
I need to store this structure in a file on the disk.
My problem is: how to design a flexible disk storing format that can make loading and working on the tree as easy as possible.
In fact the user should be allowed to
Create a child block to an element (and this should be easy enough, it's sufficient to add data to the file paying attention to avoiding duplicated indices)
Delete a child (I should prompt the user "do you want to delete all this node's children as well? or should I add its children to its parent?"). The tricky part about this is that deleting a node could also free up an index, and I can't let the user use that index again when adding another node (or the order he set could be messed up), I need to update the entire tree!
Swap an index with another one
I'm using C++ and Qt, and so far I have thought of structures with a lot of fields, like this one:
struct dataToBeStoredInTheFile
{
    long data_size;           // length of the data block in bytes
    unsigned char *data;      // ... the data here
    int index;                // user-assigned index
    int number_of_children;
    int *children_indices;    // ... array of integers
};
This has the advantage of identifying each node by its index, but it is very slow when swapping indices between two nodes, or when deleting a node and updating every other node's index, because you have to traverse all the nodes and all their "children_indices" arrays.
Would using something like a "hash" to identify each node be more flexible?
Should I use two indices, one for the position in the tree and one for the user's index? If you have any better idea for storing the data, you're welcome.
I would suggest using something like boost.serialization; then you don't have to worry about the actual format when saving to disk, and can concentrate on an efficient in-memory solution.
Edit: Re-reading your question I see you are using Qt; in that case it should have its own serialization framework that you can use.
If it doesn't have to be a SINGLE file, you could use the file/directory structure to represent your tree, where each node corresponds to a single file (w/ a directory for each interior node). Maybe not the most efficient, but incredibly easy to do.
Again, if you have some flexibility on the number of files (but not as much as above), you could have one file for the tree structure (so that each node is a fixed size, simplifying its manipulation) and a separate one for storing node contents. To speed up working with the "content file", you could treat it the way a garbage collecting system would: just keep adding new/updated nodes on the end, marking old nodes as no longer in use, and periodically clearing things out.
Better yet, follow @JoachimPileborg's advice :)
I don't think you should use the user-specified index to identify the nodes, as that's not directly related to the way you're storing the tree, and you don't have an efficient way of accessing the nodes by index. You should either keep two indices for each node - the user-specified one, and another one that's implementation dependent; or maintain an array mapping the user-specified index to one you're using for the implementation.
Also, it might be better if you use a different structure to store the tree. For each node, store the following:
the index of the parent
the index of the leftmost son
the index of the left brother
the index of the right brother
This way adding a node and swapping two nodes could be done with some simple pointer manipulations (I don't mean explicit pointers - the indices are somewhat like pointers anyway). Deleting a node would still probably be slow as you have to visit all the children.
As a bonus, if you use this structure, every node has a fixed size (unlike with the linked list you're proposing). This means that you can access a node directly by seeking in the file.
You should also maintain the smallest index the user can use for new nodes - so, for example, even if the largest index was 5 and it was deleted, you still keep 6 as the next free index so 5 cannot be reused.
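The layout suggested above might look roughly like this. It is a sketch under assumptions, not a finished file format: all names are hypothetical, `first_child`, `prev_sibling` and `next_sibling` correspond to the leftmost son, left brother and right brother, -1 marks a missing link, the node data itself is assumed to live elsewhere, and the records are kept in a vector where the vector position plays the role of the implementation-dependent index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::int32_t kNone = -1;  // "no such node"

// Fixed-size record: slot numbers act as pointers, user_index is separate.
struct NodeRecord {
    std::int32_t user_index;
    std::int32_t parent;
    std::int32_t first_child;
    std::int32_t prev_sibling;
    std::int32_t next_sibling;
};
static_assert(sizeof(NodeRecord) == 5 * sizeof(std::int32_t), "unexpected padding");

// Byte position of a slot in the file, after a header of header_size bytes.
constexpr std::int64_t record_offset(std::int64_t header_size, std::int32_t slot) {
    return header_size + slot * static_cast<std::int64_t>(sizeof(NodeRecord));
}

// Hands out user indices; a freed index is never reused.
struct IndexAllocator {
    std::int32_t next = 0;
    std::int32_t allocate() { return next++; }
};

NodeRecord& slot_at(std::vector<NodeRecord>& nodes, std::int32_t slot) {
    assert(slot >= 0 && static_cast<std::size_t>(slot) < nodes.size());
    return nodes[static_cast<std::size_t>(slot)];
}

// Link `child` as the new leftmost child of `parent`: a few slot updates.
void add_child(std::vector<NodeRecord>& nodes, std::int32_t parent, std::int32_t child) {
    NodeRecord& c = slot_at(nodes, child);
    NodeRecord& p = slot_at(nodes, parent);
    c.parent = parent;
    c.prev_sibling = kNone;
    c.next_sibling = p.first_child;
    if (p.first_child != kNone) {
        slot_at(nodes, p.first_child).prev_sibling = child;
    }
    p.first_child = child;
}

// Swapping user indices touches only the two records involved.
void swap_user_indices(std::vector<NodeRecord>& nodes, std::int32_t a, std::int32_t b) {
    std::swap(slot_at(nodes, a).user_index, slot_at(nodes, b).user_index);
}
```

Because the tree links refer to slots rather than user indices, swapping two user indices never requires rewriting any other node.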