I'm writing a program in C++ that uses genetic techniques to optimize an expression tree.
I'm trying to write a class Tree which has as a data member Node root. The node constructor generates a random tree of nodes with +,-,*,/ as nodes and the integers as leaves.
I've been working on this for a while, and I'm not yet clear on the best structure. Because I need to access any node in the tree in order to mutate or crossbreed the tree, I need to keep a dictionary of the Nodes. An array would do, but it seems that vector is the recommended container.
vector<Node> dict;
So the Tree class would contain a vector dict with all the nodes of the tree (or pointers to same), the root node of the tree, and a variable to hold a fitness measure for the tree.
class Tree
{
public:
typedef vector<Node>dict;
dict v;
Node *root;
float fitness;
Tree(void);
~Tree();
};
class Node
{
public:
char *cargo;
Node *parent;
Node *left;
Node *right;
bool entry;
dict v;
Node(bool entry, int a_depth, dict v, Node *pparent = 0);
};
Tree::Tree()
{
Node root(true, tree_depth, v);
};
There seems to be no good place to put typedef vector<Node>dict;, because if it goes in the definition of Tree, it doesn't know about Node, and will give an error saying so. I haven't been able to find a place to typedef it.
But I'm not even sure if a vector is the best container. The Nodes just need to be indexed sequentially. The container would need to grow, as there could be 200 to 500 Nodes.
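One arrangement that sidesteps the ordering problem, sketched under the assumption that the dictionary stores pointers rather than Node objects: a forward declaration of Node is enough to name vector<Node*>, so the typedef can sit above both classes. The fixed (1 + 2) shape built in the Tree constructor is only a stand-in for the random generation, and passing the dictionary by reference (instead of by value, as in the code above) is what lets every node register itself in the same container.

```cpp
#include <cassert>
#include <vector>

class Node;                          // forward declaration: enough for Node*
typedef std::vector<Node*> dict;     // visible to both Node and Tree

class Node {
public:
    char  cargo;
    Node* parent;
    Node* left;
    Node* right;
    // Registers itself in the shared dictionary so the tree can index it later.
    Node(char c, dict& d, Node* p = nullptr)
        : cargo(c), parent(p), left(nullptr), right(nullptr) { d.push_back(this); }
};

class Tree {
public:
    dict  v;        // every node of the tree, in creation order
    Node* root;
    float fitness;
    Tree() : root(nullptr), fitness(0.0f) {
        // Hypothetical fixed shape (1 + 2) standing in for random generation.
        root        = new Node('+', v);
        root->left  = new Node('1', v, root);
        root->right = new Node('2', v, root);
    }
    ~Tree() { for (Node* n : v) delete n; }   // the dictionary owns the nodes
    Tree(const Tree&) = delete;               // copying would double-delete
    Tree& operator=(const Tree&) = delete;
};
```

With this layout, picking a random node is just an index into v, at the cost of keeping v in sync whenever mutation or crossover adds or removes nodes.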
I think a standard Binary Tree should do... here is an example of a (binary) expression tree node:
const int NUMBER = 0, // Values representing two kinds of nodes.
OPERATOR = 1;
struct ExpNode { // A node in an expression tree.
int kind; // Which type of node is this?
// (Value is NUMBER or OPERATOR.)
double number; // The value in a node of type NUMBER.
char op; // The operator in a node of type OPERATOR.
ExpNode *left; // Pointers to subtrees,
ExpNode *right; // in a node of type OPERATOR.
ExpNode( double val ) {
// Constructor for making a node of type NUMBER.
kind = NUMBER;
number = val;
op = 0;
left = right = 0; // A NUMBER node has no subtrees.
}
ExpNode( char op, ExpNode *left, ExpNode *right ) {
// Constructor for making a node of type OPERATOR.
kind = OPERATOR;
this->op = op;
this->left = left;
this->right = right;
}
}; // end ExpNode
So when you're doing crossover or mutation and you want to select a random node you just do the following:
Count the number of nodes in the tree (you only need to do this once, in the constructor).
Select a random index from 0 to the size of the tree minus one.
Visit each node and subtract 1 from the random index until you reach zero.
Return the node when the index is 0.
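The steps above can be sketched roughly as follows. This is a minimal version assuming a pre-order walk and a trimmed-down ExpNode; countNodes, nodeAt and randomNode are names made up for this illustration, and std::rand() is used only for brevity.

```cpp
#include <cassert>
#include <cstdlib>

// Trimmed-down ExpNode: leaves hold a number, inner nodes hold an operator.
struct ExpNode {
    double   number;
    char     op;
    ExpNode* left;
    ExpNode* right;
    explicit ExpNode(double val) : number(val), op(0), left(nullptr), right(nullptr) {}
    ExpNode(char o, ExpNode* l, ExpNode* r) : number(0), op(o), left(l), right(r) {}
};

// Step 1: count the nodes of the subtree rooted at n.
int countNodes(const ExpNode* n) {
    return n ? 1 + countNodes(n->left) + countNodes(n->right) : 0;
}

// Steps 3 and 4: pre-order walk that decrements index at every visited node
// and returns the node reached when index is 0 (nullptr if index is too large).
ExpNode* nodeAt(ExpNode* n, int& index) {
    if (n == nullptr) return nullptr;
    if (index == 0) return n;
    --index;
    ExpNode* hit = nodeAt(n->left, index);
    return hit ? hit : nodeAt(n->right, index);
}

// Step 2 plus the walk: pick an index in [0, size - 1] and fetch that node.
ExpNode* randomNode(ExpNode* root) {
    int size = countNodes(root);
    if (size == 0) return nullptr;
    int index = std::rand() % size;
    return nodeAt(root, index);
}
```

Note that the valid indices run from 0 to size - 1, and that the cached size has to be recomputed after any crossover or mutation that changes the tree.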
In this case you don't need to know anything about the parent of the node. So mating/mutation should look like this:
select nodeX
select nodeY
if( Rand(0,1) == 1 )
nodeY->left = nodeX;
else
nodeY->right = nodeX;
And that should be it...
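One caveat with the pseudocode: assigning nodeX directly makes the same subtree reachable from two places and drops the branch that used to hang under nodeY. A hedged sketch that copies the donor subtree and frees the replaced branch might look like this; clone, destroy and graft are hypothetical helper names, and nodeY is assumed to be an operator node.

```cpp
#include <cassert>

struct ExpNode {
    double   number;
    char     op;
    ExpNode* left;
    ExpNode* right;
    explicit ExpNode(double val) : number(val), op(0), left(nullptr), right(nullptr) {}
    ExpNode(char o, ExpNode* l, ExpNode* r) : number(0), op(o), left(l), right(r) {}
};

// Deep copy of the subtree rooted at n.
ExpNode* clone(const ExpNode* n) {
    if (n == nullptr) return nullptr;
    ExpNode* c = new ExpNode(n->op, clone(n->left), clone(n->right));
    c->number = n->number;
    return c;
}

// Frees the subtree rooted at n.
void destroy(ExpNode* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Replaces one child of nodeY with a copy of the subtree rooted at nodeX.
// Assumes nodeY is an operator node (attaching children to a number is invalid).
void graft(ExpNode* nodeY, const ExpNode* nodeX, bool toLeft) {
    ExpNode*& slot = toLeft ? nodeY->left : nodeY->right;
    ExpNode* copy = clone(nodeX);   // copy first, in case nodeX lies inside slot
    destroy(slot);
    slot = copy;
}
```

The random choice from the pseudocode then becomes graft(nodeY, nodeX, Rand(0,1) == 1), and both parent trees stay independently owned.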
I don't think the Node or the Tree are the first classes to write.
I'd start with Expression. In your case you need at least a BinaryExpression, as well as an expression with no subnodes (constants or variables). Each binary expression should contain auto_ptr<Expression> lhs and auto_ptr<Expression> rhs (in C++11 and later, std::unique_ptr<Expression>, since auto_ptr was deprecated in C++11 and removed in C++17).
You could then easily write a function to enumerate through the expression tree's members. If performance turns out to be relevant, you can cache the list of expressions in the tree, and invalidate it manually when you change the expression. Anything more advanced is likely to be slower and more error prone.
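A minimal sketch of that layout, assuming C++11 so that std::unique_ptr can play the owning-pointer role; the Constant class, the evaluate function and the enumerate signature are illustrative choices rather than anything prescribed above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Expression {
    virtual ~Expression() {}
    virtual double evaluate() const = 0;
    // Appends this expression and all of its sub-expressions, pre-order.
    virtual void enumerate(std::vector<Expression*>& out) { out.push_back(this); }
};

// An expression with no subnodes.
struct Constant : Expression {
    double value;
    explicit Constant(double v) : value(v) {}
    double evaluate() const override { return value; }
};

// An operator node that owns both operands.
struct BinaryExpression : Expression {
    char op;
    std::unique_ptr<Expression> lhs;
    std::unique_ptr<Expression> rhs;
    BinaryExpression(char o, std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double evaluate() const override {
        const double a = lhs->evaluate();
        const double b = rhs->evaluate();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;   // anything else is treated as '/'
        }
    }
    void enumerate(std::vector<Expression*>& out) override {
        out.push_back(this);
        lhs->enumerate(out);
        rhs->enumerate(out);
    }
};
```

The vector filled by enumerate is exactly the kind of list that could be cached and invalidated by hand whenever the expression is edited.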
I don't see why an expression needs to know its parent expression. It only makes life harder when you start editing expressions.
You may implement a list over nodes. Then, each node will have two additional pointers inside:
class Node{
...
Node* sequentialPrevious;
Node* sequentialNext;
...
};
And so will the tree:
class Tree{
...
Node* sequentialFirst;
Node* sequentialLast;
...
};
Then you will be able to move bidirectionally over the nodes just by jumping to sequentialFirst or sequentialLast and then iterating via sequentialNext or sequentialPrevious. Of course, the Node constructor and destructor must be properly implemented to keep those pointers up to date.
I tried searching for this problem on stackoverflow but couldn't find it, so pardon me if it already existed.
So, what I wish to do is to create a function that traverses a tree and returns a pointer to the Node with the highest value. The Tree would be unordered and asymmetric, and will not have a fixed depth. Each node has a pointer to its Parent node, a list containing its Child nodes, and an integer named 'value'. And the tree would have a pointer to its root node, like this:
struct Node
{
private:
Node* parent;
list<Node> childs;
int value;
public:
// Getters, setters and constructors
};
struct Tree
{
private:
Node* root;
public:
// Getters, setters and constructors
};
And, as I stated before, I wish to make a function that traverses the entire tree, aka every single Node in the entire tree regardless of the depth, and returns a pointer to the node with the highest value. I assume it'll require recursion, but I can't figure out a way to do this.
Pardon me if my question seems dumb / stupid, but I really need help
You can use a recursive method that returns a pointer to the node with the maximal value among the current node and its child nodes:
struct Node
{
...
Node* getMaxNode()
{
Node* maxNode = this;
for (auto& child : this->childs) {
Node* childsMaxNode = child.getMaxNode();
if (childsMaxNode->getValue() > maxNode->getValue())
maxNode = childsMaxNode;
}
return maxNode;
}
};
If the current node doesn't have child nodes, it will return a pointer to itself. So, in struct Tree you can implement something like this:
struct Tree
{
Node* getMax()
{
return this->root->getMaxNode();
}
};
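For reference, here is a self-contained variant of the same recursion that can be compiled and run as-is. It assumes public members and a hypothetical addChild helper purely to keep the example short; the question's getters and setters would work just as well.

```cpp
#include <cassert>
#include <list>

struct Node {
    int             value;
    Node*           parent;
    std::list<Node> childs;   // std::list keeps element addresses stable

    explicit Node(int v, Node* p = nullptr) : value(v), parent(p) {}

    // Hypothetical helper: appends a child and returns a pointer to it.
    Node* addChild(int v) {
        childs.emplace_back(v, this);
        return &childs.back();
    }

    // Returns the node with the largest value in this subtree.
    // On ties, the node visited first (pre-order) wins.
    Node* getMaxNode() {
        Node* maxNode = this;
        for (Node& child : childs) {
            Node* childsMaxNode = child.getMaxNode();
            if (childsMaxNode->value > maxNode->value)
                maxNode = childsMaxNode;
        }
        return maxNode;
    }
};
```

The recursion depth equals the depth of the tree, so for very deep trees an explicit stack may be a safer choice.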
I have a binary search tree class and I want to write a function for deleting a specific node, but I don't know how.
The basic class is:
class Node {
friend class Tree;
private:
long long rep;
Node *left, *right;
string data;
public:
Node( string d )
: data( d ), left( NULL ), right( NULL ), rep( 1 ) {}
};
class Tree {
private:
Node *root;
public:
void delete_node( Node *cur , string s );
void delete_node_helper( string s );
};
There're 3 parts in deletion of a node from the binary search tree:
Find a node to delete.
Delete a node (free memory, etc).
Merge children of the deleted node.
In your particular code example, I'd say that looking for the node should be the responsibility of void delete_node_helper(string s);, deleting a node should be the responsibility of void delete_node(Node *cur, string s);, and merging the children should be the responsibility of the newly created function.
Given that the algorithms for the first two parts are pretty straightforward, let me explain in detail only the third one.
To merge two BSTs (of which we know which one is left and which one is right) we should decide who will be whose child and perform the recursive merging if necessary. The code looks like this:
Node* merge(Node* left, Node* right) {
if (left == nullptr) {
return right;
}
if (right == nullptr) {
return left;
}
if (rand() & 1) { // <- choose parent
left->right = merge(left->right, right);
return left;
}
right->left = merge(left, right->left);
return right;
}
On the marked line we actually make a decision on which node will be whose parent. In this particular example, the result is random, but any other strategy may be implemented. For example, you could store heights (or sizes) in all nodes of your tree and make the smaller tree root child of a larger tree root.
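To show how the three parts fit together, here is a rough sketch using free functions on a simplified Node (no rep counter, no Tree wrapper). The names erase, insert and inorder are made up for this example; merge is the function from above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

struct Node {
    std::string data;
    Node*       left;
    Node*       right;
    explicit Node(const std::string& d) : data(d), left(nullptr), right(nullptr) {}
};

// Part 3: merge two subtrees where every key in left < every key in right.
Node* merge(Node* left, Node* right) {
    if (left == nullptr) return right;
    if (right == nullptr) return left;
    if (std::rand() & 1) {
        left->right = merge(left->right, right);
        return left;
    }
    right->left = merge(left, right->left);
    return right;
}

// Parts 1 and 2: find the node holding s, free it, and return the new
// root of this subtree (unchanged if s is not present).
Node* erase(Node* cur, const std::string& s) {
    if (cur == nullptr) return nullptr;
    if (s < cur->data) {
        cur->left = erase(cur->left, s);
    } else if (cur->data < s) {
        cur->right = erase(cur->right, s);
    } else {
        Node* merged = merge(cur->left, cur->right);
        delete cur;
        return merged;
    }
    return cur;
}

// Plain BST insertion (duplicates ignored), only here to build a tree.
Node* insert(Node* cur, const std::string& s) {
    if (cur == nullptr) return new Node(s);
    if (s < cur->data) cur->left = insert(cur->left, s);
    else if (cur->data < s) cur->right = insert(cur->right, s);
    return cur;
}

// Concatenates the keys in sorted order, for checking the result.
void inorder(const Node* n, std::string& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out += n->data;
    inorder(n->right, out);
}
```

Whichever branch the random choice takes, the in-order sequence of the remaining keys stays sorted; only the shape of the tree differs.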
Delete a special node from a BST tree
I just tried the deletion code in above link and it worked nice.
What is the correct way based on the theory to create a Node for a Binary Tree?
For example:
struct Node
{
int data;
Node *left;
Node *right;
};
The problem I'm currently facing is that I have 2 different answers from several sources (books,website,online lectures.. etc).
From "Introduction to Algorithms", 3rd edition, pp. 286-287: "In addition to a key and satellite data, each node contains attributes left, right, and p that point to the nodes corresponding to its left child, its right child, and its parent, respectively."
Which means something like this:
struct Node
{
int data;
Node *parent;
Node *left;
Node *right;
};
On the other hand, I found several links which DO NOT follow this design such as:
http://algs4.cs.princeton.edu/32bst/
http://math.hws.edu/eck/cs225/s03/binary_trees/
http://www.cprogramming.com/tutorial/lesson18.html
These implementations DO NOT keep a link to the parent and from some online lectures it is said that Trees do NOT traverse backwards (aka. can't see the parent) which counters the notion from the book!
In Red-Black trees, for instance, you NEED to see the grandparent and uncle of a node to determine whether to re-colour and/or rotate to rebalance the tree.
In AVL trees you don't since the focus is on the height of sub-trees.
Quadtrees and octrees are similar in that you don't need the parent.
Questions:
Can someone please answer me this and with valid sources explain which is the CORRECT way to design a node for a Binary Tree or for Any Tree (B-Trees,..etc)?
Also what is the rule with Traversing Backwards? I know of Pre-order, In-order, Post-order, Breadth-First, Depth-First(Pre-order) and other AI Heuristic algorithms for traversals.
Is it true that you are NOT allowed to move backwards in a tree ie from child to parent? If so, then why does the book suggest a link to parent node?
The fundamental Binary Tree (foundation) requires child pointers:
struct binary_tree_node
{
binary_tree_node * left_child;
binary_tree_node * right_child;
};
There are many modifications that can be made to the foundation that help facilitate searching or storage.
These can include (but are not limited to):
parent pointer
array of child pointers
"color" indicator
specialized leaf nodes -- no child links
The amenities depend on the usage of the data structure. For example, an array of child nodes may help speed up I/O access, where reading a "page" node is as efficient as reading a single node (See B-Tree). The "color" indicator may help with the decision for balancing. The specialized "leaf" nodes reduce the amount of memory occupied by the tree.
As for traversal, a tree can be traversed in any method. There are no rules preventing a traversal from child to parent. Some traversals may include sibling to sibling.
Some books or websites may pick nits about a traditional or fundamental "binary tree" data structure. I find that restrictions get in the way.
There isn't any canonical definition.
In general, imperative languages (e.g., C++) tend to favor the with-parent approach. It simplifies the implementation of efficient rebalancing, and, as Thomas Matthews pointed out, facilitates constant-space iterators.
Functional languages (e.g., Haskell), tend to use the no-parent approach (see Purely Functional Data Structures). Since no modifications are possible, all rebalancing is done by recopying along the search path anyway, so no back pointer is needed. Being strongly recursion oriented, the design of a constant space iterator is also not much of a concern there.
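As an illustration of the constant-space iterator point, here is a small sketch of in-order stepping that relies on the parent link instead of a stack or recursion. The function names leftmost, successor and insert are made up for this example, and the tree is a plain unbalanced BST.

```cpp
#include <cassert>

struct Node {
    int   data;
    Node* parent;
    Node* left;
    Node* right;
};

// Leftmost node of the subtree rooted at n (the in-order first element).
Node* leftmost(Node* n) {
    while (n != nullptr && n->left != nullptr) n = n->left;
    return n;
}

// In-order successor of n using only O(1) extra space:
// either the leftmost node of the right subtree, or the first
// ancestor that is reached from its left child.
Node* successor(Node* n) {
    if (n->right != nullptr) return leftmost(n->right);
    Node* p = n->parent;
    while (p != nullptr && n == p->right) {
        n = p;
        p = p->parent;
    }
    return p;   // nullptr once the last node has been passed
}

// Unbalanced BST insertion that keeps the parent links up to date.
Node* insert(Node* root, int v) {
    Node* fresh = new Node{v, nullptr, nullptr, nullptr};
    if (root == nullptr) return fresh;
    Node* cur = root;
    for (;;) {
        Node*& child = (v < cur->data) ? cur->left : cur->right;
        if (child == nullptr) {
            child = fresh;
            fresh->parent = cur;
            return root;
        }
        cur = child;
    }
}
```

Without the parent pointer, the same walk needs either recursion or an explicit stack whose size grows with the height of the tree.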
There is no hard and fast rule that there must be a link back to the parent in your tree data structure. Having a link back to the parent is analogous to a doubly linked list. Not having a link back to the parent is just a linked list. With a back link, obviously you gain more flexibility, but at the expense of (relatively) more complicated implementation. Many problems can be solved with a linked list while some others require a doubly linked list.
It depends on your task
Strictly speaking, a binary search tree is a concept, and there are no strict or standard rules for designing the data structure. But to understand the basic functionality (e.g. insert, delete, find), people use a very basic data structure like:
struct Node
{
int data;
Node *left;
Node *right;
};
But your task may call for a different design. For example, if at some point you need to find the parent of a given tree node in a single operation, you might design the node struct like this:
struct Node
{
int data;
Node *parent;
Node *left;
Node *right;
};
Some other, more complex implementations may need to store a list of siblings too, which would look like this:
struct Node
{
int data;
Node *parent;
Node *left;
Node *right;
list<Node> *siblings;
};
So, there is no strict standard
struct tree_node
{
tree_node* left_child;
tree_node* right_child;
int data; // here you can use whatever type or data you want. Even generic type
};
The following node definition (in Java) is for a balanced binary tree rather than a BST.
// Copyright (C) NNcNannara 2017
public class Node
{
public Node Left;
public Node Right;
public Node Parent;
public State Balance;
public Node()
{
Left = this;
Right = this;
Parent = null;
Balance = State.Header;
}
public Node(Node p)
{
Left = null;
Right = null;
Parent = p;
Balance = State.Balanced;
}
public Boolean isHeader ()
{ return Balance == State.Header; }
}
This is optimised for balancing routines. The idea is that a set node derives from Node as follows.
// Copyright (C) NNcNannara 2017
public class SetNode<T> extends Node
{
public T Data;
public SetNode(T dataType, Node Parent)
{
super(Parent);
Data = dataType;
}
}
And a dictionary node is as follows.
// Copyright (C) NNcNannara 2017
public class DictionaryNode<K, T> extends Node
{
public T Data;
public K Key;
public DictionaryNode(K keyType, T dataType, Node Parent)
{
super(Parent);
Key = keyType;
Data = dataType;
}
}
Balancing and iteration are non-generic in nature and are defined for the base class Node. Of course, binary trees may also exist on disk, whereby the node type is as follows.
package persistent;
public class Node
{
public long Left;
public long Right;
public long Parent;
public long Key;
public calculus.State Balance;
public Node()
{
Left = 0;
Right = 0;
Parent = 0;
Balance = calculus.State.Header;
Key = 0;
}
public Node(long p)
{
Left = 0;
Right = 0;
Parent = p;
Balance = calculus.State.Balanced;
Key = 0;
}
public Boolean IsHeader () { return Balance == calculus.State.Header; }
}
Instead of references being present, long integer offsets into the node and data file are present. Note that there is only one node type for all collections on disk.
Here's a simplified version of my Node class :
class Node {
public:
Node();
// indicators of whether the node is a top or bottom node
bool Top;
bool Bot;
// pointers for tree structure
Node *Parent;
Node *LeftC;
Node *RightC;
std::list<Node*> getNodesList();
};
What I want is be able to get a list of pointers to the nodes in my tree in a certain order. I tried the following code to do this :
std::list<Node*> Node::getNodesList(){
if (Bot) return (std::list<Node*>(1,this));
else {
std::list<Node*> temp (1,this);
temp.splice(temp.end(), LeftC->getNodesVector()); // Combine with left childrens
temp.splice(temp.end(), RightC->getNodesVector()); // Combine with right childrens
return temp;
}
}
The splice function doesn't work and gives me an error.
So my questions are :
Why isn't the splice function working to combine the lists?
Is there a more efficient way to return the list of pointers to the nodes?
I don't know the exact error you're getting, but a quick glance at your code suggests that your Node class doesn't know what getNodesVector() is, since it isn't defined in your class (the member you declared is getNodesList()).
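On the efficiency question, one possible approach is to append into a single list passed by reference, so no temporary lists are built and spliced at every level. The sketch below assumes null child pointers mark the bottom of the tree (instead of the Top/Bot flags) and uses a hypothetical helper named collect. Separately, passing a temporary to splice needs the rvalue overload added in C++11; with an older compiler that alone may be rejected, even with the function name fixed.

```cpp
#include <cassert>
#include <list>

class Node {
public:
    Node() : Parent(nullptr), LeftC(nullptr), RightC(nullptr) {}

    // pointers for tree structure
    Node* Parent;
    Node* LeftC;
    Node* RightC;

    // Hypothetical helper: appends this node and all descendants to out,
    // in pre-order (node, left subtree, right subtree).
    void collect(std::list<Node*>& out) {
        out.push_back(this);
        if (LeftC != nullptr)  LeftC->collect(out);
        if (RightC != nullptr) RightC->collect(out);
    }

    // Same interface as in the question, built on top of collect.
    std::list<Node*> getNodesList() {
        std::list<Node*> result;
        collect(result);
        return result;
    }
};
```

The order matches the original attempt: each node first, then its left subtree, then its right subtree.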
I have a rooted ordered tree representing sets of integers. Each node stores the size of the associated subtree, and also the max and min elements in this subtree. The branch degree of all the nodes is fixed (but determined at runtime). Also, for sufficiently small subtrees I would like to change the representation to a bitmap for the associated subset. For example, the root node may store a set of size 1000000, one of its children would store a subset of size 100000, then again one of that node's children would store a subset of size 10000, and at the next level we would stop using this representation and store just a plain bitmap for the associated subset.
I'm trying to implement this structure in C++ and my definition for the node type stores three integers (size, min and max), an array of pointers (something like node_t ** children) to subtrees and the bitmap (in case we are using this representation). The problem is that all the nodes are storing at least one element which is irrelevant (if the set is big enough we would be using the array of pointers but not the bitmap, for example). How should the node type be declared to solve this problem ? I thought about using two subtypes of node (one for each case) but I am not sure what the impact on the performance at runtime would be.
Thanks in advance.
PS. Please let me know if the question is unclear to edit it.
Since you're using multiple representations, you'll probably need at least two node types: The first will be a generic node that handles the root as well as nearby descendants, and the second type will contain a pointer to a map. The latter nodes don't have any children per se, but their immediate ancestors should see them as an entire sub-tree rather than a terminating node that points to a map.
Since each of the upper nodes have pointers to their children, you'll need a way to ensure that these pointers are also able to point to the mapNodes as well as the branching ones. A good way to do this is to create a virtual base node type with a virtual function that returns whatever data you're looking for. For example:
class baseNode {
public:
    virtual ~baseNode() {} //lets a parent delete any node type through a baseNode*
    virtual int getLargest() = 0;
    virtual baseNode* addData(int) = 0;
};
class leafNode : public baseNode { //for non-map termination
public:
    leafNode(int in) {Data = in;}
    int getLargest() {return Data;}
    baseNode* addData(int);
    int Data;
};
class treeNode : public baseNode {
public:
    treeNode() : leftChild(0), rightChild(0) {}
    int getLargest(); //returns leftChild->getLargest(), etc
    baseNode* addData(int);
    baseNode* leftChild;//can point to either a treeNode or mapNode
    baseNode* rightChild;
};
class mapNode : public baseNode {
public:
    baseNode* addData(int);
    int getLargest(); //parses subMap to find/return the desired value
    Map* subMap;
};
You'll need a bit of finessing to get it to do what you need it to, but the principle is the same. Keep in mind that with 1m objects, every byte you add increases the net memory use by about a megabyte, so do try to keep things minimal. If all of your branching nodes eventually reach a mapNode, you can eliminate the leafNode declaration altogether.
Adding data to the structure is tricky, especially since you're working with multiple types and the parents (hopefully) don't know anything about their neighbors; use virtual accessors to do what's needed. In many scenarios, if a branching node tries to add a value 'down the line', the child node it references may need to change type. In this case, the child should construct the new substructure then return it to the parent. This can be done like so:
baseNode* treeNode::addData(int in) {
if ((childCount+1) < threshold) { //not enough to merit a map
//....
//if (input needs to go to the leftChild) {
if (leftChild == 0) {
leftChild = new leafNode(in);
} else {
leftChild = leftChild->addData(in);
}
//}
return (baseNode*)this; //casting may be optional
} else { //new Data merits converting self + kids into a map
mapNode* newMap = new mapNode();
//Set newMap->subMap to children, deleting as you go
delete this;//remove self after return
return (baseNode*)newMap; //return the mapNode holding subtree
}
}
baseNode* leafNode::addData(int in) {
treeNode* tmpNode = new treeNode(); //create replacement
tmpNode->leftChild = this; //pin self to new node
tmpNode->rightChild = new leafNode(in); //store data
return (baseNode*)tmpNode;
}
baseNode* mapNode::addData(int in) {
subMap->addValue(in);//However you do it...
return (baseNode*)this; //parent is always a treeNode
}
The leftChild = leftChild->addData(in); usually won't actually modify anything, especially if it points to a treeNode; however, it doesn't really hurt anything to do so, and the extra if (newPtr != leftChild) check would just add unnecessary overhead. Note that it will cause a change if a leafNode needs to change into a treeNode with multiple kids, or if it's a treeNode with enough children to merit changing itself (and its kids!) into a mapNode.