Polymorphic Abstract Syntax Tree (recursive descent parser): impossible? - c++

I have begun writing a polymorphic recursive descent parser in C++. However, I am running into an issue. The classes are set up like this:
class Node {
public:
std::vector<Node*> children;
};
class NodeBinary : public Node {
public:
Node* left;
Node* right;
};
class NodeUnary : public Node {
public:
Node* operand;
};
class NodeVar : public Node {
public:
std::string string;
NodeVar(std::string str) : string(str) {};
};
class NodeNumber : public Node {
public:
signed long number;
NodeNumber(signed long n) : number(n) {};
};
// etc.
And then classes like NodeDeclaration, NodeCall, NodeNot, NodeAssignment, NodePlus, NodeMinus, NodeIf etc. will inherit either from Node or something less generic like NodeBinary or NodeUnary.
However, some of them take more specific operands. NodeAssignment always takes a var and a number/expression. So I will have to override Node* left to NodeVar* left and NodeExpr* right. The problem comes in with things like NodePlus. Left can be a NodeVar or a NodeExpr! And the root node has a similar problem: while parsing at the top level to add children nodes to root, how is it possible to tell if a child is a NodeExpr, a NodePlus, a NodeIf etc...?
I could have all Nodes have an enum "type" that says what type it is, but then what's the point of having a nice polymorphic inheritance tree?
How is this problem normally solved?

If you're using class inheritance for your AST nodes, you need to create an appropriate inheritance hierarchy, as with any object-oriented design.
So, for example, NodeAssignment (which is presumably a specialization of NodeStatement) needs to contain a NodeLValue (of which a NodeVariable is a specialization) and a NodeValue. As usual, LValues (i.e. things you can assign to) are a subset of Values, so NodeLValue will be a specialization of NodeValue. And so on. Your binary operator node will contain left and right members, both of which are pointers to NodeValue base objects (I would expect NodeValue to be an abstract class, with a large number of specific specializations.)
If you insist on using a recursive descent parser, each parsing function needs to return an appropriate subclass of Node, so that the function which parses the left-hand side of an assignment would logically return a NodeLValue*, ready to insert into the NodeAssignment constructor. (Frankly, I'd ditch the word Node in all of those class names. Put them all into the namespace node:: and save yourself some typing.)
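A minimal sketch of that hierarchy, in case it helps. The names inside namespace node, the dump() helper and the toy Parser are all assumptions made for illustration, not part of the question's code, and the parser only handles input like "x=y+12" with no whitespace or error recovery.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

namespace node {

// Abstract base for everything that produces a value.
struct Value {
    virtual ~Value() = default;
    virtual std::string dump() const = 0;  // textual form, for inspection only
};

// Things that can appear on the left of an assignment.
struct LValue : Value {};

struct Variable : LValue {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    std::string dump() const override { return name; }
};

struct Number : Value {
    long number;
    explicit Number(long n) : number(n) {}
    std::string dump() const override { return std::to_string(number); }
};

// Both operands are plain Values: a Variable, a Number or another Plus all fit.
struct Plus : Value {
    std::unique_ptr<Value> left, right;
    Plus(std::unique_ptr<Value> l, std::unique_ptr<Value> r)
        : left(std::move(l)), right(std::move(r)) {}
    std::string dump() const override {
        return "(" + left->dump() + " + " + right->dump() + ")";
    }
};

// The target is statically known to be an LValue, so no type tag is needed.
struct Assignment {
    std::unique_ptr<LValue> target;
    std::unique_ptr<Value> value;
    std::string dump() const { return target->dump() + " = " + value->dump(); }
};

}  // namespace node

// Hypothetical toy parser: each function returns the most specific node type.
struct Parser {
    std::string src;
    std::size_t pos = 0;

    std::unique_ptr<node::LValue> parseLValue() {
        std::string name;
        while (pos < src.size() && std::isalpha(static_cast<unsigned char>(src[pos])))
            name += src[pos++];
        assert(!name.empty());
        return std::make_unique<node::Variable>(name);
    }

    std::unique_ptr<node::Value> parsePrimary() {
        if (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) {
            long n = 0;
            while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
                n = n * 10 + (src[pos++] - '0');
            return std::make_unique<node::Number>(n);
        }
        return parseLValue();  // an LValue is also a Value
    }

    std::unique_ptr<node::Value> parseExpr() {
        std::unique_ptr<node::Value> lhs = parsePrimary();
        while (pos < src.size() && src[pos] == '+') {
            ++pos;
            lhs = std::make_unique<node::Plus>(std::move(lhs), parsePrimary());
        }
        return lhs;
    }

    node::Assignment parseAssignment() {
        std::unique_ptr<node::LValue> target = parseLValue();
        assert(pos < src.size() && src[pos] == '=');
        ++pos;
        return node::Assignment{std::move(target), parseExpr()};
    }
};
```

The point is that parseLValue() returns an LValue pointer and parseExpr() a Value pointer, so the Assignment node gets correctly typed members without any enum tag or cast.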

Related

Call a derived class function not in base class without dynamic casting to maximize performance

I am implementing an AVLTree in C++ as an exercise, in preparation for future projects. An AVLTree is basically a BST with the extra detail that the tree is balanced, i.e., for a node x with left child y and right child z, the number of children on the left of x (child y and the children of y) cannot differ from the number of children on the right of x by more than 1 node.
Each node will represent an int value, and has a reference to the parent node, to a possibly existing left child node and to a possibly existing right child node.
class BSTNode {
protected:
int value;
BSTNode* parent;
BSTNode* left;
BSTNode* right;
...
public:
virtual void setLeftChild(BSTNode* left);
...
};
To track the number of children of a node, I have AVLNode extending BSTNode with two integers, leftAffinity and rightAffinity, where leftAffinity tells me the number of nodes on the left (and similarly for the right with rightAffinity). Since I do not ensure that the values I'm adding are always unique, I cannot update affinities before finding a spot to place the new node.
class AVLNode : public BSTNode {
private:
int leftAffinity;
int rightAffinity;
...
public:
void setLeftChild(BSTNode* left) override;
...
};
Once I successfully set the left child of a node to left, in AVLNode I also update leftAffinity of the current node (and, recursively, of each ancestor until the root of the tree is reached) to
left->getLeftAffinity() + left->getRightAffinity() .
The problem here is that the affinity functions are defined in AVLNode, so I cannot call left->getLeftAffinity() without a cast, because at this point I don't know whether left is a BSTNode or an AVLNode. I know the idea is that any child of an AVLNode is itself an AVLNode, which could be enforced by ensuring that any BSTNode that is not an AVLNode is transformed into an AVLNode.
I do not want to change the function argument to AVLNode*, because that forces me to declare left, right and parent as AVLNode* in the AVLNode class, and then I get duplicate variables, one set in BSTNode and one set in AVLNode, even if left, right and parent are private in BSTNode.
I do not want to use dynamic casting, as the point of the exercise is to create an efficient tree data structure with O(log n) complexity for insert and get operations. I've read that dynamic casting is expensive and should be avoided, as it usually hints at a design flaw.
Possible approaches:
Create a "no-op" function in the BSTNode class that is overridden in the AVLNode class to manage node affinities. This function should not be there, as it is not related to BSTNode.
Use dynamic casting on each add operation and, when successful, once for each parent up to the root of the tree, which is too expensive.
Do not use virtual at all for these functions, which would force me to redefine many functions in AVLNode.
Change all functions to receive AVLNode* instead of BSTNode*, forcing conversion of BSTNode* into AVLNode* with a constructor in AVLNode. But this would require dynamic casting, as otherwise I would lose the leftAffinity and rightAffinity of a node being added, and it would cause other problems with function inheritance.
The choice that looks best to me is the "no-op" approach, as it would work if performance is key.
I'm new to C++, so I appreciate any thoughts on the approaches I've written, for example if there is a trivial solution I overlooked. What would be the best choice in this situation?
Problems like this arise when you are using the base class just to share code among different implementations, instead of to constrain method arguments and return values.
BSTNode probably doesn't add enough value in this role to justify the complications it produces, and you should probably just get rid of it.
But occasionally things like that are useful to provide skeleton base implementations for self-referential classes that are hard to implement. In C++, the type-correct way to do it is with the oddly-named curiously recurring template pattern: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern
Using this pattern, the base class is defined as a template that takes the derived class as a parameter:
template <class NODE> class BSTNode {
int value;
NODE *left;
NODE *right;
...
};
class AvlNode : public BSTNode<AvlNode> {
...
};
Now all the pointers in AvlNode have the correct type, and no casting is required.
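A fuller sketch of the same idea, under a few assumptions: the members are protected rather than private so the derived class can walk the tree, subtreeSize() and demoLeftCount() are hypothetical helpers, and the affinity is taken to count the child itself plus its two subtrees (which differs by one from the formula in the question).

```cpp
// Base template: the derived node type is passed in as NODE, so the
// child and parent pointers already have the derived type.
template <class NODE>
class BSTNode {
protected:
    int value;
    NODE* parent = nullptr;
    NODE* left = nullptr;
    NODE* right = nullptr;

public:
    explicit BSTNode(int v) : value(v) {}

    // Non-virtual: a derived class that needs extra bookkeeping defines
    // its own setLeftChild and calls this one first.
    void setLeftChild(NODE* child) {
        left = child;
        if (child) child->parent = static_cast<NODE*>(this);
    }
};

class AvlNode : public BSTNode<AvlNode> {
    int leftAffinity = 0;   // number of nodes in the left subtree
    int rightAffinity = 0;  // number of nodes in the right subtree

public:
    explicit AvlNode(int v) : BSTNode<AvlNode>(v) {}

    int subtreeSize() const { return 1 + leftAffinity + rightAffinity; }
    int getLeftAffinity() const { return leftAffinity; }

    void setLeftChild(AvlNode* child) {
        BSTNode<AvlNode>::setLeftChild(child);
        // child is already an AvlNode*, so no cast is needed here.
        leftAffinity = child ? child->subtreeSize() : 0;
        // Propagate the new count towards the root.
        for (AvlNode* n = this; n->parent; n = n->parent) {
            if (n->parent->left == n) n->parent->leftAffinity = n->subtreeSize();
            else n->parent->rightAffinity = n->subtreeSize();
        }
    }
};

// Hypothetical demo: root <- a <- b, each the left child of the previous.
int demoLeftCount() {
    AvlNode root(10), a(5), b(1);
    root.setLeftChild(&a);
    a.setLeftChild(&b);  // updates a, then propagates to root
    return root.getLeftAffinity();
}
```

Note that nothing here is virtual: the calls resolve statically, which is the trade-off of this pattern compared with the virtual setLeftChild in the question.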

How to decide which methods belong where? C++

I finished writing an AVL tree, and one of the things that bothered me when programming it is deciding which methods belong to which class:
template <class ValueType,class CompareFunction>
class avlTree{
class avlTreeException{};
public:
class ElementDoesntExist : public avlTreeException{};
class EmptyTree : public avlTreeException{};
class ElementAlreadyExists : public avlTreeException{};
private:
class Node{
friend class avlTree;
ValueType* data;
Node *sonA,*sonB,*dad;
int height,balance;
};
private:
CompareFunction compare;
int treeSize;
Node* root;
};
(I removed the public/private methods to save space.)
For some methods I think I made the right choice: update is a method of Node (updates height,etc).
Insert/remove are functions of the tree.
But take, for example, the function destroyNodeTree(Node*), which is used by the tree destructor. What I did was have destroyNodeList() call destroyNodeTree(root):
template <class ValueType,class CompareFunction>
void avlTree<ValueType,CompareFunction>::destroyNodeTree(Node* rooty) {
if(!rooty){
return;
}
Node *A=rooty->sonA,*B = rooty->sonB;
destroyNodeTree(A);
destroyNodeTree(B);
delete rooty; // free the node itself once both subtrees are gone (assumes Node releases its data)
}
However, I could have made destroyNodeTree() a method of Node, and call it on the root from the destructor (it would be implemented in the same way).
I had a similar issue deciding where the method findNode(const ValueType&) should go, meaning it obviously is a public method of tree, but should I create a method for Node with the same name and have the tree function call the node method on the root? Is it even acceptable to have a public function and an inner class method with the same name?
In my opinion it's better to have it as a method of nodes because that gives more flexibility (I'll be able to search for a node only under a certain node), but on the other hand that means that the method either needs to create an instance of class compare, or have each node keep a copy of an instance, or have class compare as a static function. Each of those has a disadvantage in my opinion though: creating an instance can be costly, keeping a copy can be costly, and forcing the user to make the function static doesn't seem right to me (but I'm horribly inexperienced so fix me if I'm wrong).
In any case, I eventually made findNode a tree function only and not a node method (the HW assignment didn't need the tree to be able to search from a specific node, so it doesn't make any difference there), but I don't want to write bad code.
To conclude, how do we decide where to save performance,memory,flexibility of the user (would he rather be able to search from any node or create nonstatic compare functions?)
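One option not listed above is to let the node-level search borrow the tree's comparator by const reference, so nodes neither store nor construct one. A minimal sketch, assuming a trimmed-down version of the class: the default std::less argument, the find/contains/insert names and the plain unbalanced insert are all hypothetical and only there so the example has data to search.

```cpp
#include <functional>

template <class ValueType, class CompareFunction = std::less<ValueType>>
class avlTree {
    struct Node {
        ValueType data;
        Node* sonA = nullptr;  // left subtree
        Node* sonB = nullptr;  // right subtree

        // Node-level search: starts at this node, uses the caller's comparator.
        const Node* find(const ValueType& v, const CompareFunction& cmp) const {
            const Node* cur = this;
            while (cur) {
                if (cmp(v, cur->data)) cur = cur->sonA;
                else if (cmp(cur->data, v)) cur = cur->sonB;
                else return cur;  // neither is less: equivalent
            }
            return nullptr;
        }
    };

    CompareFunction compare;
    Node* root = nullptr;

    static void destroy(Node* n) {
        if (!n) return;
        destroy(n->sonA);
        destroy(n->sonB);
        delete n;
    }

public:
    avlTree() = default;
    avlTree(const avlTree&) = delete;
    avlTree& operator=(const avlTree&) = delete;
    ~avlTree() { destroy(root); }

    // Plain unbalanced insert (no rotations), for the sketch only.
    void insert(const ValueType& v) {
        Node** slot = &root;
        while (*slot) {
            if (compare(v, (*slot)->data)) slot = &(*slot)->sonA;
            else slot = &(*slot)->sonB;
        }
        *slot = new Node{v};
    }

    // Public tree-level search forwards to the node-level one.
    bool contains(const ValueType& v) const {
        return root && root->find(v, compare);
    }
};
```

With this layout the comparator lives once in the tree, the user is free to pass a stateful (non-static) comparator, and searching from an arbitrary node stays possible.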

How to implement if-else branch in an abstract syntax tree using C++

I have a mini AST structure in which each node may have a left and a right child, for example:
class AstNode;
typedef std::shared_ptr<AstNode> AstNodePtr;
class AstNode
{
public:
AstNode()
: m_children(2)
{
}
virtual ~AstNode()
{
}
virtual void accept(AstNodeVisitor& visitor) = 0;
void addLeft(const AstNodePtr& child);
void addRight(const AstNodePtr& child);
const AstNodePtr left() const;
const AstNodePtr right() const;
private:
std::vector<AstNodePtr> m_children;
};
It works great for the operations I need so far, but when it comes to a branch statement, I don't know how to implement it with this binary tree structure. According to Wikipedia, a branch statement has three children: the condition, the if-body and the else-body.
I can get away with it for now because most of my if statements have no else, so the condition will be the left child and the if-body will be the right child. But it's not going to work with an else-body. I could potentially embed the condition in the branch node itself, which means doing a pre-order traversal on the branch node, but that feels uncomfortable because no other type of node involves a potential subtree traversal when evaluating itself.
Maybe the AST should not be a binary tree, and each node should instead be able to have any number of children, but that will (I think) make the implementation a bit awkward. Any suggestions please?
By nature, ASTs should be implemented as multi-child trees to support if-condition-then expressions. But a workaround could be to have two node types for IF:
if-block(left:condition, right:if-body)
if-body(left:any, right: any)
left child of the if-body is used if condition of the parent is true, right child is used otherwise.
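A rough sketch of that workaround, assuming nodes that evaluate to an int (the eval() method, the Constant leaf and the makeIf helper are hypothetical, and a missing else-body is taken to yield 0):

```cpp
#include <memory>
#include <utility>

// Hypothetical evaluator-style nodes, each with at most two children.
struct AstNode {
    std::shared_ptr<AstNode> left, right;
    virtual ~AstNode() = default;
    virtual int eval() const = 0;
};

struct Constant : AstNode {
    int value;
    explicit Constant(int v) : value(v) {}
    int eval() const override { return value; }
};

// if-body: left is the "then" branch, right is the optional "else" branch.
struct IfBody : AstNode {
    int eval() const override { return 0; }  // only its parent looks inside it
};

// if-block: left is the condition, right is an IfBody.
struct IfBlock : AstNode {
    int eval() const override {
        const std::shared_ptr<AstNode>& branch =
            left->eval() != 0 ? right->left : right->right;
        return branch ? branch->eval() : 0;  // a missing else yields 0
    }
};

// Wires the two helper nodes together so callers see a single "if".
std::shared_ptr<AstNode> makeIf(std::shared_ptr<AstNode> cond,
                                std::shared_ptr<AstNode> thenNode,
                                std::shared_ptr<AstNode> elseNode) {
    auto body = std::make_shared<IfBody>();
    body->left = std::move(thenNode);
    body->right = std::move(elseNode);
    auto block = std::make_shared<IfBlock>();
    block->left = std::move(cond);
    block->right = std::move(body);
    return block;
}
```

The tree stays strictly binary, at the cost of an extra node per branch and an IfBody type that is meaningless on its own.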
You could define an abstract AST node which doesn't hold any children. Then for each number of child nodes ("arity"), define a different subclass:
a unary AST node for things like return or for unary operators such as negation
a binary AST node for binary operations
a ternary AST node for if-then-else constructs as well as for the ternary operator ?:
maybe a dynamic n-ary AST node for the set of cases in a switch-case construct, if you want to support them. Your statement-sequence perfectly fits into this node type, too. If you don't implement this node type, you could put statement sequences in a binary tree structure, but that sounds like a dirty hack.
maybe a four-ary (is this the name?) AST node for for loops. They have an initial, a conditional and an incremental statement plus a body.
Note that implementing everything with dynamically sized children lists is a bad idea in my opinion, since it doesn't make sense to have a node of type operator = with only one child, as an example.
Then, inherit the concrete node types from the node class corresponding to the arity.
class ASTNode {
public:
virtual ~ASTNode() {}
virtual void accept(AstNodeVisitor& visitor) = 0;
};
// ----
class ASTNodeUnary : public ASTNode {
protected:
AstNodePtr c1;
};
class ASTNodeBinary : public ASTNode {
protected:
AstNodePtr c1, c2;
};
class ASTNodeTernary : public ASTNode {
protected:
AstNodePtr c1, c2, c3;
};
class ASTNodeDynamic : public ASTNode {
protected:
std::vector<AstNodePtr> children;
};
// ----
class ASTNodeBranch : public ASTNodeTernary {
...
};
and so on
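To make the branch case concrete, here is one way the ternary node could be filled in. This is only a sketch: the visitor interface, the ASTNodeConstant leaf, the accessor names and the Evaluator are assumptions, since the question only names AstNodeVisitor without showing it.

```cpp
#include <memory>
#include <utility>

class ASTNodeBranch;
class ASTNodeConstant;

// Hypothetical visitor interface with one overload per concrete node type.
class AstNodeVisitor {
public:
    virtual ~AstNodeVisitor() = default;
    virtual void visit(ASTNodeBranch& node) = 0;
    virtual void visit(ASTNodeConstant& node) = 0;
};

class ASTNode {
public:
    virtual ~ASTNode() = default;
    virtual void accept(AstNodeVisitor& visitor) = 0;
};
using AstNodePtr = std::shared_ptr<ASTNode>;

class ASTNodeTernary : public ASTNode {
protected:
    AstNodePtr c1, c2, c3;
    ASTNodeTernary(AstNodePtr a, AstNodePtr b, AstNodePtr c)
        : c1(std::move(a)), c2(std::move(b)), c3(std::move(c)) {}
};

// Leaf node holding an integer, only so the sketch has something to evaluate.
class ASTNodeConstant : public ASTNode {
public:
    int value;
    explicit ASTNodeConstant(int v) : value(v) {}
    void accept(AstNodeVisitor& visitor) override { visitor.visit(*this); }
};

// The branch gives its three generic children meaningful names.
class ASTNodeBranch : public ASTNodeTernary {
public:
    ASTNodeBranch(AstNodePtr cond, AstNodePtr thenNode, AstNodePtr elseNode)
        : ASTNodeTernary(std::move(cond), std::move(thenNode), std::move(elseNode)) {}
    ASTNode& condition() { return *c1; }
    ASTNode& thenBody() { return *c2; }
    ASTNode* elseBody() { return c3.get(); }  // may be null
    void accept(AstNodeVisitor& visitor) override { visitor.visit(*this); }
};

// Evaluating visitor: only the branch that is taken gets visited.
class Evaluator : public AstNodeVisitor {
public:
    int result = 0;
    void visit(ASTNodeConstant& node) override { result = node.value; }
    void visit(ASTNodeBranch& node) override {
        node.condition().accept(*this);
        if (result != 0) node.thenBody().accept(*this);
        else if (ASTNode* e = node.elseBody()) e->accept(*this);
    }
};
```

Because the visitor decides which child to descend into, the branch node itself does no traversal, which addresses the discomfort mentioned in the question.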

OOP: Designing a tree, dividing functionality between Node class and a Tree Class

I need to implement a custom tree class (using C++). Throughout my work I've seen many tree implementations. Some implemented a "super Node" class which was exposed to the user. An instance of which (root node) acted as the tree. Some exposed a tree class, which utilized a node class to construct a tree. Some used the node class as a pure data struct, leaving functionality such as tree construction to the tree class. Others put the construction - like node.Split(), into the node class.
Say you needed to design a binary tree (like a KD-tree). What would be the "best" approach from an OOP perspective? Should the node class just contain the data, or also the logic to split itself into children? How much logic in general should a node class contain?
Thanks ahead for constructive input!
Here's one OOP rule you should always follow,
Every class represents an entity. Classes have properties and methods
i.e. the attributes of the entities and its behaviour
So you need to follow your understanding of the scenario.
Here's how I look at a node.
It has some data, a right node reference and a left node reference. I don't think a node should be able to do anything except provide you with the data so I would write a node class something like this:
class Tree; // forward declaration: Tree is defined below
class Node
{
public:
int Data; // yeah, you would obviously use templates, so it's not restricted to a particular type
Node* Left;
Node* Right;
// also you can write constructors and maybe some sort of cast operator overload to be able to make the
// we could also make it return the subtree itself
// (define these after Tree, e.g. Tree Node::getRightSubTree(){ return Tree(*Right); })
Tree getRightSubTree();
Tree getLeftSubTree();
};
A Tree should be like this then.
class Tree
{
private:
Node root;
public:
Tree(Node t):root(t){} // every tree should have a root. Cannot be null.
// since the root node will be able to handle the tree structure, now we'll need the necessary methods
void addNode(Node n){
// code here
}
....
Tree getSubTree(int data){
// check if node with that data exists and all
...
Node n = getNode(data);
return Tree(n);
}
};
Okay so I think you've got an idea now. It's all about how you look at the system.
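For what it's worth, here is a compilable sketch of that split, with the node as a pure data holder and the tree carrying the behaviour. It makes a few assumptions the answer does not state: children are owned through std::unique_ptr, insertion follows binary-search ordering, and getNode returns a non-owning pointer instead of copying a subtree.

```cpp
#include <memory>

// Node is a plain data holder: a value and two owned children.
struct Node {
    int data;
    std::unique_ptr<Node> left, right;
    explicit Node(int d) : data(d) {}
};

// Tree owns the root and carries all behaviour (insertion, lookup).
class Tree {
    std::unique_ptr<Node> root;

public:
    // Every tree has a root, so the root value is required up front.
    explicit Tree(int rootData) : root(std::make_unique<Node>(rootData)) {}

    // Ordered insert; duplicates go to the right.
    void addNode(int data) {
        std::unique_ptr<Node>* slot = &root;
        while (*slot)
            slot = data < (*slot)->data ? &(*slot)->left : &(*slot)->right;
        *slot = std::make_unique<Node>(data);
    }

    // Non-owning view of the node holding `data`, or nullptr if absent.
    const Node* getNode(int data) const {
        const Node* cur = root.get();
        while (cur && cur->data != data)
            cur = (data < cur->data ? cur->left : cur->right).get();
        return cur;
    }
};
```

Whether splitting logic like node.Split() belongs in Node instead is still a judgment call; this only shows that the data-only variant is workable.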

How to cast to a variable type - emulating type variables in C++

I am implementing something similar to a typed genetic programming and have become a little stuck with regards to C++ types.
I have a network of nodes. Nodes have different types: for example, some are functional nodes whereas others are just data. In order to deal with collections of such nodes, I saw no option other than to use polymorphism and store a collection of base class pointers.
class Node {
private:
std::string label;
std::string node_type;
};
template
<typename FNC>
class FunctionalNode : public Node {
std::function<FNC> function;
};
class Network {
std::vector<Node*> nodes;
...
};
Note I have a templated FunctionalNode, which stores a std::function, I think (but am not certain) that my problem applies equally if I were to store a plain function pointer instead.
So the problem is that at some point, given a Node pointer to a derived FunctionalNode, I need to apply the stored function to some values. This means I need to cast the base pointer to the derived class, but since it is templated I am not sure how to do this.
I would like to do something like the following fake C++ code, which would need something like type variables:
Node * applyfunction(Node * functional_node, std::vector<Node*> arguments) {
typevariable function_type = convert_node_type_to_typevariable(functional_node.node_type)
functional_node_derived * = static_cast<function_type>(functional_node);
....
}
Where a node's node_type is some structure I use to contain the type information of the node, e.g. the type of its functional form, and convert_node_type_to_typevariable would convert this to a typevariable I can use in this hypothetical C++ language.
Any ideas how I could implement this, seeing as C++ lacks support for type variables, or a completely different approach to the problem?
You should exploit your polymorphic structure. You can define Node with a pure virtual method instead of making applyfunction a free function.
class Node {
protected:
std::string label;
std::string node_type;
public:
virtual ~Node () {}
virtual Node * applyfunction (std::vector<Node *> args) = 0;
};
Then your derivations would perform the work.
template
<typename FNC>
class FunctionalNode : public Node {
std::function<FNC> function;
public:
Node * applyfunction (std::vector<Node *> args) override {
//...
}
};
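One way the elided body could look, as a sketch only. It assumes a hypothetical DataNode<T> for values, plain value types for parameters and results, and a std::unique_ptr return type in place of the raw Node* so ownership is explicit. It also uses dynamic_cast to unpack the arguments, which is a choice of this example and not something the answer prescribes.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

class Node {
public:
    virtual ~Node() = default;
    virtual std::unique_ptr<Node> applyfunction(const std::vector<Node*>& args) = 0;
};

// Hypothetical data node holding a single value of type T.
template <typename T>
class DataNode : public Node {
public:
    T value;
    explicit DataNode(T v) : value(std::move(v)) {}
    std::unique_ptr<Node> applyfunction(const std::vector<Node*>&) override {
        throw std::logic_error("a data node cannot be applied");
    }
};

template <typename FNC>
class FunctionalNode;  // only the R(Args...) specialization is defined

template <typename R, typename... Args>
class FunctionalNode<R(Args...)> : public Node {
    std::function<R(Args...)> function;

    // Unpacks args[0], args[1], ... into the stored function. Each argument
    // must be a DataNode of the matching parameter type; a mismatch makes
    // the reference dynamic_cast throw std::bad_cast.
    template <std::size_t... I>
    R call(const std::vector<Node*>& args, std::index_sequence<I...>) {
        return function(dynamic_cast<DataNode<Args>&>(*args[I]).value...);
    }

public:
    explicit FunctionalNode(std::function<R(Args...)> f) : function(std::move(f)) {}

    std::unique_ptr<Node> applyfunction(const std::vector<Node*>& args) override {
        if (args.size() != sizeof...(Args))
            throw std::invalid_argument("wrong number of arguments");
        // Wrap the result in a new data node.
        return std::make_unique<DataNode<R>>(
            call(args, std::index_sequence_for<Args...>{}));
    }
};
```

The caller only ever sees Node pointers; the function signature is recovered inside the derived class, where FNC is known, so no type variable is needed at the call site.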