Fastest way to traverse arbitrary-depth tree for deletion? - c++

For my own exercises I'm writing an XML-parser. To fill the tree I use a normal std::stack and push the current node on top after making it a child of the last top-node (should be depth-first?). So I now do the same for deletion of the nodes, and I want to know if there's a faster way.
Current code for deletion:
struct XmlNode{
    // ignore the rest of the node implementation for now
    std::vector<XmlNode*> children_;
};
XmlNode* root_ = new XmlNode;
// fill root_ with child nodes...
// and then those nodes with child nodes and so forth...
std::stack<XmlNode*> nodes_;
nodes_.push(root_);
while(!nodes_.empty()){
    XmlNode* node = nodes_.top();
    if(node->children_.size() > 0){
        nodes_.push(node->children_.back());
        node->children_.pop_back();
    }else{
        delete nodes_.top();
        nodes_.pop();
    }
}
Works totally fine but it kinda looks slow. So is there any faster / better / more common way to do this?

Don't go out of your way to do iteratively what can be easily done recursively, unless you can prove that the recursive version is either insufficient (e.g. stack overflows) or slower (which won't happen unless you start overflowing your stack, forcing the OS to either expand it or crash you).
In other words, in general, use iteration for linear structures, and recursion for tree structures.
Compared to recursion, an iterative method was around 3 times slower on my machine. If you can be sure that your XML depth won't exceed a few hundred nestings (which I've never seen inside real-world XML documents), then recursion won't be a problem.
To iterate is human; to recurse, divine. :)
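For comparison, here is a minimal sketch of what the recursive version could look like. The XmlNode layout is taken from the question; the destroy function and the live_nodes counter are illustrative names only, and the counter exists purely so the effect can be checked.

```cpp
#include <vector>

int live_nodes = 0;  // instrumentation only, not part of the technique

struct XmlNode {
    std::vector<XmlNode*> children_;
    XmlNode() { ++live_nodes; }
    ~XmlNode() { --live_nodes; }
};

// Frees `node` and everything below it in post-order:
// children first, then the node itself.
// Recursion depth equals the depth of the tree, not the node count.
void destroy(XmlNode* node) {
    if (node == nullptr) return;
    for (XmlNode* child : node->children_) {
        destroy(child);
    }
    delete node;
}
```

Calling destroy(root_) replaces the whole explicit-stack loop from the question. The call stack plays the role of std::stack, so the only limit is the nesting depth of the document.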

Related

How to quickly deallocate an entire subtree?

I'm implementing an Alpha-Beta pruning (MiniMax) algorithm for a board game. I have a Board class with GetAvailableMoves(), PlayMove(Move x) and UndoMove() which all modify the game position inside the Board class. To implement Alpha-Beta I need a tree structure to keep track of the alpha and beta values of every position.
Since I'll want to calculate the best move multiple times in a game, I want to reuse the part of the tree that I already calculated and throw away the rest. If I implement the tree as doubly-linked nodes, then I'll have to call delete on every node in all subtrees that I want to delete. This can be very expensive.
How can I implement a tree that allows me to quickly cut and destroy an entire branch, possibly in O(1) time?
You can implement a datapool for your data. To get an allocated node you get it from the pool, and when you delete you put it back in the pool. Then you only allocate new memory when your pool is empty, and you only delete at the end of your runtime.
This will still need to move all the out-of-date objects back to the pool when you cut a subtree, so it is not O(1). But it avoids the delete calls.
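A rough sketch of such a pool, assuming a simple Node with alpha/beta fields and raw child pointers. NodePool, acquire and release_subtree are made-up names for illustration, not an existing library.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int alpha = 0;
    int beta = 0;
    std::vector<Node*> children;  // non-owning; the pool owns every node
};

class NodePool {
public:
    // Hands out a recycled node if one is available, otherwise allocates.
    Node* acquire() {
        if (free_.empty()) {
            storage_.push_back(std::unique_ptr<Node>(new Node));
            return storage_.back().get();
        }
        Node* n = free_.back();
        free_.pop_back();
        return n;
    }

    // Returns a whole subtree to the pool. This is still linear in the
    // size of the subtree, but no delete is called.
    void release_subtree(Node* root) {
        if (root == nullptr) return;
        std::vector<Node*> pending{root};
        while (!pending.empty()) {
            Node* n = pending.back();
            pending.pop_back();
            for (Node* c : n->children) pending.push_back(c);
            n->children.clear();
            n->alpha = n->beta = 0;
            free_.push_back(n);
        }
    }

    std::size_t allocated() const { return storage_.size(); }
    std::size_t available() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<Node>> storage_;  // freed when the pool dies
    std::vector<Node*> free_;                     // nodes ready for reuse
};
```

All memory is released once, when the pool itself is destroyed at the end of the run.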
Alternatively, if you have plenty of memory available and you just want a O(1) way to delete your subtrees, you can move the root of your subtree to a ToDeleteLater queue. And at the end of your run or whenever your program has free time, it can clear out this queue by running through trees deleting.
Unless the subtrees are huge, deleting honestly should not take too long. So instead of using raw pointers that you need to delete by hand, you can implement destructors that take care of the deleting (RAII), so you don't have to write a complicated deleter.
Note: It is usually recommended that you post some code or more details so users can give more specific advice.

Transforming recursive DFS-based topological sort into a non-recursive algorithm (without losing cycle detection)

Here is a pseudocode for topological sort from Wikipedia:
L ← Empty list that will contain the sorted nodes
while there are unmarked nodes do
    select an unmarked node n
    visit(n)
function visit(node n)
    if n has a temporary mark then stop (not a DAG)
    if n is not marked (i.e. has not been visited yet) then
        mark n temporarily
        for each node m with an edge from n to m do
            visit(m)
        mark n permanently
        unmark n temporarily
        add n to head of L
I want to write it non-recursively without losing cycle detection.
The problem is that I don't know how to do that, even though I have already thought of many approaches. Basically, the problem is to do a DFS while remembering the "current path" (which corresponds to "temporarily marking" certain nodes in the pseudocode above). The traditional approach with a stack gives me nothing, because when I use a stack (and put the neighbors of every node on it) I am putting nodes there that I will only see "in the undetermined future", whereas I only want to keep track of the nodes "on my current path". I see it as walking through a maze with a thread that I leave behind me: when I reach a dead end, I turn back and wind the thread up again, and at any point in time I want to remember the nodes "with thread lying on them" and the nodes on which the thread has been at least once. Any tips that would point me in the right direction? Should I think of using 2 stacks instead of 1, or maybe some other data structure?
Or maybe this algorithm is OK and I should leave it in its recursive form. I'm only worrying about exceeding the "recursion depth" for sufficiently large graphs.
Obviously, you'd use a stack, but you wouldn't push all adjacent nodes onto it: that would yield a DFS with the wrong space complexity (it would be quadratic in the number of nodes assuming non-parallel edges, otherwise potentially worse). Instead, you'd store the current node together with a state indicating the next node to be visited. You'd always work off the stack's top, i.e., something like this:
std::stack<std::pair<node, iterator> > stack;
stack.push(std::make_pair(root, root.begin()));
while (!stack.empty()) {
    std::pair<node, iterator>& top = stack.top();
    if (top.second == top.first.begin()) {
        mark(top.first);
        // do whatever needs to be done upon first visit
    }
    // skip neighbors that have already been visited
    while (top.second != top.first.end() && is_marked(*top.second)) {
        ++top.second;
    }
    if (top.second != top.first.end()) {
        node next = *top.second;
        ++top.second;
        stack.push(std::make_pair(next, next.begin()));
    }
    else {
        stack.pop();
    }
}
This code assumes that nodes have a begin() and end() yielding suitable iterators to iterate over adjacent nodes. Something along those lines, possibly with an indirection via edges, will certainly exist. It also assumes that there are functions available to access a node's mark. A more realistic implementation would probably use something like a BGL property map. Whether a std::stack<T> can be used to represent the stack depends on whether the nodes currently on the stack need to be accessed: std::stack doesn't provide such access. However, it is trivial to create a suitable stack implementation based on any of the STL sequence containers.
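To show how the same idea keeps the cycle detection from the question's pseudocode, here is a self-contained sketch. It assumes the graph is given as an adjacency list indexed by node number, which is not how the snippet above models nodes; Mark and topological_sort are illustrative names. The nodes on the explicit stack are exactly the temporarily marked ones, i.e. the "current path".

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

enum class Mark { None, Temporary, Permanent };

// adj[n] lists every node m with an edge n -> m.
// Returns true and fills `order` if the graph is a DAG,
// returns false as soon as a cycle is found.
bool topological_sort(const std::vector<std::vector<int>>& adj,
                      std::vector<int>& order) {
    const std::size_t n = adj.size();
    std::vector<Mark> mark(n, Mark::None);
    // Each entry: (node, index of the next neighbor to look at).
    std::vector<std::pair<int, std::size_t>> stack;
    order.clear();

    for (std::size_t start = 0; start < n; ++start) {
        if (mark[start] != Mark::None) continue;
        mark[start] = Mark::Temporary;
        stack.emplace_back(static_cast<int>(start), std::size_t{0});

        while (!stack.empty()) {
            const int node = stack.back().first;
            std::size_t& next = stack.back().second;
            if (next < adj[node].size()) {
                const int m = adj[node][next++];
                if (mark[m] == Mark::Temporary) return false;  // back edge
                if (mark[m] == Mark::None) {
                    mark[m] = Mark::Temporary;
                    stack.emplace_back(m, std::size_t{0});
                }
            } else {
                // All neighbors done: same point as "mark n permanently".
                mark[node] = Mark::Permanent;
                order.push_back(node);
                stack.pop_back();
            }
        }
    }
    std::reverse(order.begin(), order.end());  // "add n to head of L"
    return true;
}
```

The stack never holds more than one entry per node, so its size is bounded by the number of nodes.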

what should be the structure of binary search tree node

I am trying to write a C++ program for a binary search tree which will contain the following functionality (this is actually part of my college assignment):
A) CREATE Binary search tree.
B) Inorder, preorder, postorder traversals. ( non-recursive )
C) Search the Val in tree.
D) Breadth first traversal.
E) Depth first traversal
F) Count leaf nodes, non-leaf nodes.
G) Count no. of levels
My doubts are:
1. Usually a tree node has the following structure:
class node{
private:
    node *lChild;
    int info;
    node *rChild;
};
So if I want to perform a depth-first or breadth-first traversal, can I change the node structure and add one more pointer pointing to the parent, so that I can easily move backward in the hierarchy?
class node{
private:
    node *parent; // pointer to parent node
    node *lChild;
    int info;
    node *rChild;
};
Is this considered normal practice or bad form when programming a binary tree? And if it is not considered a good way of programming a tree, is there any other way, or do I have to use the method given in books of using a stack (for depth-first) and a queue (for breadth-first) to store nodes (visited or non-visited accordingly)?
2. This is the first time I am learning data structures, so it would be a great help if someone could explain in simple words what the difference is between recursive and non-recursive traversal, with a binary tree in consideration.
i change the node structure and add one more pointer pointing to the parent [...] is this considered as normal practice or bad form of programming a binary tree ?
It is not a normal practice (but not quite "bad form"). Each node is a collection of data and two pointers. If you add a third pointer to each node, you will have increased the overhead of each node by 50% (two pointers to three pointers per node) which for a large binary tree will be quite a lot.
This is first time i am learning data structures so it will be a great help if someone can explain in simple words that what is the difference between recursive and non-recursive traversal
A recursive implementation is a function that only applies on a node, then calls itself for the subsequent nodes. This makes use of the application call-stack to process the nodes of the tree.
A non-recursive implementation uses a local stack to push non-processed nodes; then it loops as long as there is data on the stack and processes each entry.
Here's an example for printing to console, that shows difference between recursive and non-recursive ( the example is incomplete, as this is homework :] ):
void recursive_print(node* n) {
    std::cout << n->info << "\n";
    if(n->lChild)
        recursive_print(n->lChild); // recursive call
    // n->rChild is processed the same
}
void non_recursive_print(node* n) {
    std::stack<node*> stack;
    stack.push(n);
    while(!stack.empty()) { // behaves (more or less) the same as
                            // the call-stack in the recursive call
        node* x = stack.top();
        stack.pop();
        std::cout << x->info << "\n";
        if(x->lChild)
            stack.push(x->lChild); // non-recursive: push to the stack
        // x->rChild is processed the same way
    }
}
// client code:
node *root; // initialized elsewhere
if(root) {
    recursive_print(root);
    non_recursive_print(root);
}
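The breadth-first case from the question follows the same pattern with a queue in place of the stack. A small sketch, assuming a plain struct version of the node with public members; breadth_first is an illustrative name, and it collects the values instead of printing them so the order is easy to inspect.

```cpp
#include <queue>
#include <vector>

struct node {
    int info;
    node* lChild;
    node* rChild;
};

// Visits the tree level by level and returns the values in visiting order.
std::vector<int> breadth_first(node* root) {
    std::vector<int> visited;
    if (root == nullptr) return visited;
    std::queue<node*> pending;  // FIFO: older (shallower) nodes come out first
    pending.push(root);
    while (!pending.empty()) {
        node* x = pending.front();
        pending.pop();
        visited.push_back(x->info);
        if (x->lChild) pending.push(x->lChild);
        if (x->rChild) pending.push(x->rChild);
    }
    return visited;
}
```

No parent pointer is needed here either: the queue remembers which nodes are still waiting to be processed.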
You don't need a pointer to the parent node. Think about the cases when you would use it. The only way you can reach a node is through its parent, so you have already visited the parent.
Do you know what recursive means?
There's nothing to stop you adding a parent pointer if you want to. However, it's not usually necessary, and slightly increases the size and complexity.
The normal approach for traversing a tree is some kind of recursive function. You first call the function and pass in the root node of the tree. The function then calls itself, passing the child pointers (one at a time). This happens recursively all the way down the tree until there are no child nodes left.
The function does whatever processing you want on its own node after the recursive calls have returned. That means you're basically traversing down the tree with each call (making your call stack progressively deeper), and then doing the processing on the way back up as each function returns.
The function should never try to go back up the tree the same way it came down (i.e. passing in a parent pointer), otherwise you'll end up with an infinite recursion.
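As a small illustration of "process on the way back up", here is a sketch that counts the levels of a tree. It assumes a struct version of the node with public members, and count_levels is just an example name.

```cpp
#include <algorithm>

struct node {
    int info;
    node* lChild;
    node* rChild;
};

// Post-order: both subtrees are measured first, and the result for this
// node is computed only after the recursive calls have returned.
int count_levels(const node* n) {
    if (n == nullptr) return 0;  // an empty subtree has no levels
    const int left = count_levels(n->lChild);
    const int right = count_levels(n->rChild);
    return 1 + std::max(left, right);
}
```

The function only ever moves downwards; returning from a call is what brings it back to the parent.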
Typically you only need a parent pointer if you need to support iteration.
Imagine that you have found a leaf node and then want to find the next node (lowest key greater than current key), for example:
mytree::iterator it1=mytree_local.find(7);
if (it1 != mytree_local.end())
{
    mytree::iterator it2=it1.next(); // it1 is a leaf node and next() needs to go up
}
Since here you are starting at the bottom and not the top, you need to go up.
But your assignment only requires operations that start at the root node, so you shouldn't have an up pointer. Follow the other answers for approaches that avoid the need to go up.
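For completeness, this is roughly what such a next() has to do internally. The sketch assumes a node that does carry a parent pointer; next_in_order is a made-up free function, not part of any real tree class.

```cpp
struct node {
    int info;
    node* parent;
    node* lChild;
    node* rChild;
};

// Returns the node with the lowest key greater than n's key,
// or nullptr if n holds the largest key in the tree.
node* next_in_order(node* n) {
    if (n == nullptr) return nullptr;
    if (n->rChild) {
        // The successor is the leftmost node of the right subtree.
        n = n->rChild;
        while (n->lChild) n = n->lChild;
        return n;
    }
    // Otherwise climb until we leave a left subtree; this is the step
    // that is impossible without a parent pointer.
    node* p = n->parent;
    while (p != nullptr && n == p->rChild) {
        n = p;
        p = p->parent;
    }
    return p;
}
```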
I would suggest you look into the Visitor pattern - for its flavor, not specifically for its structure (it's very complex).
Essentially, it is a design pattern that disconnects traversal of a tree in such a way that you have only one set of code that does tree traversal, and you use that set of code to execute various functionality on each node. The traversal code is generally not part of the Node class.
Specifically, it will allow you to not have to write the traversal code more than once. For example, utnapistim's answer will force you to write traversal code for every piece of functionality you need; that example covers printing, so an outputXML() function would require another copy of the traversal code. Eventually, your Node class becomes a huge ungainly beast.
With Visitor, you would have your Tree and Node classes, a separate Traversal class, and numerous functional classes, such as PrintNode, NodeToXML, and possibly DeleteNode, to use with the Traversal class.
As for adding a Parent pointer, that would only be useful if you intended to park on a given node between calls to the Tree - i.e. you were going to do a relative search beginning on a pre-selected arbitrary node. This would probably mean that you had better not do any multi-threaded work with said tree. The Parent pointer will also be difficult to update as a red/black tree can easily insert a new node between the current node and its "parent".
I would suggest a BinaryTree class, with a method that instantiates a single Visitor class, and the visitor class accepts an implementation of a Traversal interface, which would be one of either Breadth, Width or Binary. Basically, when the Visitor is ready to move to the next node, it calls the Traversal interface implementation to get it (the next node).
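To give the flavor without the full class structure, here is a deliberately simplified sketch in which the "visitor" is just a callable passed to a single traversal function. This is not the classic double-dispatch Visitor pattern, and traverse, count_leaves and collect_pre_order are illustrative names.

```cpp
#include <functional>
#include <vector>

struct node {
    int info;
    node* lChild;
    node* rChild;
};

// The only place that knows how to walk the tree (pre-order here).
void traverse(const node* n, const std::function<void(const node&)>& visit) {
    if (n == nullptr) return;
    visit(*n);
    traverse(n->lChild, visit);
    traverse(n->rChild, visit);
}

// Two independent pieces of functionality reusing the same traversal.
int count_leaves(const node* root) {
    int leaves = 0;
    traverse(root, [&leaves](const node& n) {
        if (n.lChild == nullptr && n.rChild == nullptr) ++leaves;
    });
    return leaves;
}

std::vector<int> collect_pre_order(const node* root) {
    std::vector<int> values;
    traverse(root, [&values](const node& n) { values.push_back(n.info); });
    return values;
}
```

Adding printing or XML output means writing another small callable, not another copy of the traversal.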

Fastest way to count/ access DOMNode children using Xerces C++

I'm trying to figure out the fastest way to count the number of child elements of a Xerces C++ DOMNode object, as I'm trying to optimise the performance of a Windows application which uses the Xerces 2.6 DOMParser.
It seems most of the time is spent counting and accessing children. Our application needs to iterate every single node in the document to attach data to it using DOMNode::setUserData() and we were initially using DOMNode::getChildNodes(), DOMNodeList::getLength() and DOMNodeList::item(int index) to count and access children, but these are comparatively expensive operations.
A large performance improvement was observed when we used a different idiom of calling DOMNode::getFirstChild() to get the first child node and invoking DOMNode::getNextSibling() to either access a child at a specific index or count the number of siblings of the first child element to get a total child node count.
However, getNextSibling() remains a bottleneck in our parsing step, so I'm wondering is there an even faster way to traverse and access child elements using Xerces.
Yes. Soon after I posted, I added code to store and manage the child count for each node, and this has made a big difference. The same nodes were being visited repeatedly and the child count was being recalculated every time. This is quite an expensive operation as Xerces essentially rebuilds the DOM structure for that node to guarantee its liveness. We have our own object which encapsulates a Xerces DOMNode along with the extra info that we need, and we use DOMNode::setUserData to associate our object with the relevant DOMNode; that now seems to be the last remaining bottleneck.
The problem with DOMNodeList is that it is really quite a simple list, so operations like length and item(i) cost O(n), as can be seen in the code, for example here for length:
XMLSize_t DOMNodeListImpl::getLength() const{
    XMLSize_t count = 0;
    if (fNode) {
        DOMNode *node = fNode->fFirstChild;
        while(node != 0){
            ++count;
            node = castToChildImpl(node)->nextSibling;
        }
    }
    return count;
}
Thus, DOMNodeList should not be used unless one expects the DOM tree to change while iterating: accessing an item costs O(n), which makes a full iteration an O(n^2) operation, a disaster waiting to happen (i.e. once an XML file is big enough).
Using DOMNode::getFirstChild() and DOMNode::getNextSibling() is a good enough solution for an iteration:
DOMNode *child = docNode->getFirstChild();
while (child != nullptr) {
    // do something with the node
    ...
    child = child->getNextSibling();
}
This runs, as expected, in O(n).
One could also use DOMNodeIterator, but in order to create it the right DOMDocument is needed, which is not always at hand when an iteration is needed.
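The difference between the two idioms can be made visible with a toy model. MockNode below is a hypothetical stand-in for a sibling-linked DOM node and is NOT the Xerces API; steps is an instrumentation counter that counts how many links are followed.

```cpp
#include <cstddef>

std::size_t steps = 0;  // instrumentation: number of links followed

// Hypothetical stand-in for a DOM node; not a Xerces class.
struct MockNode {
    MockNode* first_child = nullptr;
    MockNode* next_sibling = nullptr;
};

// Index-based access: has to walk from the first child every time,
// so one call costs O(index).
MockNode* item(const MockNode& parent, std::size_t index) {
    MockNode* c = parent.first_child;
    ++steps;
    for (std::size_t k = 0; k < index && c != nullptr; ++k) {
        c = c->next_sibling;
        ++steps;
    }
    return c;
}

// Sibling-based iteration: every child is reached exactly once.
// Returns the number of children visited.
template <typename F>
std::size_t for_each_child(const MockNode& parent, F visit) {
    std::size_t count = 0;
    for (MockNode* c = parent.first_child; c != nullptr; c = c->next_sibling) {
        ++steps;
        visit(*c);
        ++count;
    }
    return count;
}
```

With 4 children, looping over item(0) to item(3) follows 1 + 2 + 3 + 4 = 10 links, while for_each_child follows 4. The gap grows quadratically with the number of children.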

STL Implementation of reheapify

In a graph algorithm, I need to find the node with the smallest value.
In a step of the algorithm the value of this node or its neighbors can be decreased, and a few of its neighbors can be removed depending on their value.
Also, I don't want to search the whole graph for this node each time (although it is not so big (<1000 nodes)).
Therefore I looked at the STL library and found the heap structure which almost does what I want. I can insert and delete nodes very fast, but is there a method to update the heap fast when I only changed the value of one node without resorting the whole heap? I feel it would be a huge bottleneck in the program.
First the conceptual part:
If you use the heap insertion method with the element that decreased its value as the starting point for insertion, instead of starting at the back of the collection, everything just works.
I haven't done that in C++ yet, but std::push_heap looks fine for that purpose.