Constructing AST during LR parsing - c++

I have written an LR(1) parser that can successfully parse strings in the language of my grammar into a Concrete Syntax Tree, but I am now trying to construct an Abstract Syntax Tree.
I am using an inheritance design for my AST nodes:
struct ASTNode {
virtual Type typeCheck() = 0;
};
struct IDNode : public ASTNode {
string name;
// ...
};
struct INTNode : public ASTNode {
int value;
// ...
};
struct BOPNode : public ASTNode {
ASTNode *pLeft;
ASTNode *pRight;
// ...
};
struct Add_BOPNode : public BOPNode {
// ...
};
struct ParamNode : public ASTNode {
string name;
ASTNode *pTypeSpecifier;
// ...
};
struct ParamListNode : public ASTNode {
vector<ParamNode*> params;
// ...
};
struct FuncDec : public ASTNode {
string functionName;
ASTNode *pFunctionBody;
ASTNode *pReturnType;
ASTNode *pParams;
// ...
};
When I perform a reduction in my LR(1) parser I generate a new node depending on the rule that was used for the reduction. This is pretty straightforward for most of the nodes, but I'm not sure of a clean way to implement a node that contains a list of other nodes.
Using the ParamListNode from above as an example:
struct stack_item {
int state;
int token;
string data;
ASTNode *node;
};
/// rule = the number of the rule being reduced on
/// rhs = the items on the right-hand side of the rule
ASTNode* makeNode(int rule, vector<stack_item> rhs) {
switch(rule) {
/// <expr> ::= <expr> '+' <term>
case 1: return new Add_BOPNode(rhs[0].node, rhs[2].node);
/// <param> ::= IDENT(data) ':' <type>
case 2: return new ParamNode(rhs[0].data, rhs[2].node);
/// <param_list> ::= <param>
case 3: return new ParamListNode(rhs[0].node);
/// <param_list> ::= <param_list> ',' <param>
case 4: {
auto list = dynamic_cast<ParamListNode*>(rhs[0].node);
list->params.push_back(dynamic_cast<ParamNode*>(rhs[2].node));
return list;
}
// ...
}
}
Since generating a node requires a subclass of ASTNode to be returned, I have to create a subclass that encloses a vector<> with each sub-node. However, since not every node needs to be a list structure, I have to dynamic_cast<> to the subclass before I can access the internal list.
I feel like there should be a cleaner way to handle a list of sub-nodes without having to rely on dynamic_cast<>.
Another question is about the FuncDec node. Its pParams member should be a ParamListNode (or a vector<ParamNode*> directly), but to do that I would have to dynamic_cast<> the incoming ASTNode to a ParamListNode or ParamNode. Again, I feel like there should be a way to not use dynamic_cast<>, but I can't think of one.
Also, if you have any other suggestions about how I can better structure or implement anything that would be greatly appreciated :)

My LRSTAR Parser Generator creates an abstract-syntax tree (AST) by using only one class, Node. Every node has the same structure: a pointer to the token (in the symbol table, if it is a leaf node), and pointers to the parent, child and next nodes. The next pointer allows you to have a list of nodes (multiple children for a parent node). This has worked well for many years.
During processing of the AST, the function associated with a node takes care of processing that node. For example, the add function does something different from the subtract function. Instead of a different class for each type of node, there is a different function.
Here is the node structure that I use:
class Node
{
public:
int id; // Node id number
int prod; // Production (rule) number
int sti; // Symbol-table index (perm or temp var).
int prev; // Previous node.
int next; // Next node.
int line; // Line number.
int child; // Child node.
int parent; // Parent node.
};

Related

Implement a virtual function for two derived classes that is the same except for one variable's type

I have an abstract class Node that can be either a Leaf or a NonLeaf. I have written a large function, SplitNode. The problem is that this function is basically the same for a Leaf as for a NonLeaf; the only difference is that it operates on the entries vector for Leafs, as opposed to the children vector for NonLeafs. The code is otherwise identical in both cases. For example, in one case I do entries[i]->r to access some Rectangle property, and in the other case I do children[i]->r. So the main difference, beyond the two variable names, is the type of the actual vector. How am I supposed to implement this without copying and pasting the same function, implemented slightly differently for Leaf and NonLeaf?
Edit: I also want the SplitNode function to be able to be called recursively.
class Leaf;
class Node
{
public:
Node();
virtual Leaf& ChooseLeaf(const Rectangle& entry_r) = 0; // this makes the Node class Abstract
Rectangle r;
Node* parent = nullptr; // non-owning: the parent already owns its children
};
class Leaf : public Node
{
public:
Leaf();
Leaf& ChooseLeaf(const Rectangle& entry_r) override;
vector<unique_ptr<IndexEntry>> entries;
};
class NonLeaf : public Node
{
public:
NonLeaf();
Leaf& ChooseLeaf(const Rectangle& entry_r) override;
vector<unique_ptr<Node>> children;
};
Dummy illustration of the SplitNode() function:
void SplitNode()
{
// in the Leaf case:
if (entries.size() > rtree.M)
{ ... }
// in the NonLeaf case:
if (children.size() > rtree.M)
{ ... }
// in the Leaf case:
entries[0]->r.DoSomething();
// in the NonLeaf case:
children[0]->r.DoSomething();
// Recursion
parent->SplitNode();
...
}
This is a textbook case for a template function, presuming that the common logic is freestanding logic whose only dependency is the vector itself:
template<typename T>
void doSplitNode(T &entries_or_children)
{
for (auto &entry_or_child:entries_or_children)
{
auto &the_r=entry_or_child->r;
// Here's your entries[i]->r, or children[i]->r
}
}
// ...
class Leaf : public Node
{
public:
// ...
void SplitNode()
{
doSplitNode(entries);
}
};
class NonLeaf : public Node
{
public:
// ...
void SplitNode()
{
doSplitNode(children);
}
};
Additional work will be needed if the shared logic has additional dependencies. There's no universal solution here; everything depends on the details. Perhaps the template itself can be moved into a class, with both NonLeaf and Leaf multiply inheriting from it, and the additional dependencies implemented as virtual/abstract methods.

Cannot initialize a parameter of type 'TreeNode' with an expression of type 'std::shared_ptr_access'

I'm trying to create a linked list that stores a pointer to a binary tree.
The binary tree class is a subclass derived from a generic TreeNode class which I made.
The TreeNode class has its AddNode method implemented (just as a dummy, but it should be callable), but when I try to invoke that method on an object of a TreeNode subclass, I get the following error:
Cannot initialize object parameter of type 'TreeNode' with an expression of type: 'std::__shared_ptr_access<ArtistPlaysNode>,__gnu_cxx::_S_atomic, false, false>::element_type'(aka 'ArtistPlaysNode')
Here is the relevant part of the TreeNode class:
// TreeNode.h
class TreeNode {
protected:
int key;
int height;
shared_ptr<TreeNode> father;
shared_ptr<TreeNode> left;
shared_ptr<TreeNode> right;
public:
explicit TreeNode(int key);
TreeNode(int key, shared_ptr<TreeNode> father, shared_ptr<TreeNode> left, shared_ptr<TreeNode> right);
virtual StatusType AddNode(shared_ptr<TreeNode> node);
};
// TreeNode.cpp
StatusType TreeNode::AddNode(shared_ptr<TreeNode> node) {
return INVALID_INPUT;
}
Here is ArtistPlaysNode:
// ArtistPlaysNode.h
class ArtistPlaysNode : public TreeNode {
private:
int artistId;
shared_ptr<SongPlaysNode> SongPlaysTree;
shared_ptr<MostPlayedListNode> ptrToListNode;
public:
ArtistPlaysNode(int artistId);
ArtistPlaysNode(int artistId, shared_ptr<SongPlaysNode> ptrToSongPlaysTree, shared_ptr<MostPlayedListNode> ptrToListNode);
int GetArtistId();
};
Here is the linked list, called MostPlayedListNode:
// MostPlayedListNode.h
class MostPlayedListNode {
private:
int numberOfPlays;
shared_ptr<ArtistPlaysNode> artistPlaysTree;
shared_ptr<ArtistPlaysNode> ptrToLowestArtistId;
shared_ptr<SongPlaysNode> ptrToLowestSongId;
shared_ptr<MostPlayedListNode> previous;
shared_ptr<MostPlayedListNode> next;
public:
// Create the first node in the list (0 plays)
MostPlayedListNode(int numOfPlays);
// Create a new node with a new highest number of plays
MostPlayedListNode(int numOfPlays, shared_ptr<MostPlayedListNode> previous);
// Create a new node with a number of plays between two values (1<2<3)
MostPlayedListNode(int numOfPlays, shared_ptr<MostPlayedListNode> previous, shared_ptr<MostPlayedListNode> next);
bool AddArtist(shared_ptr<ArtistPlaysNode> artistNode);
};
And here is the function where the error occurs:
// MostPlayedListNode.cpp
bool MostPlayedListNode::AddArtist(shared_ptr<ArtistPlaysNode> artistNode) {
if (ptrToLowestArtistId) {
// There are already artists stored in this linked list
this->artistPlaysTree->AddNode(artistNode); // -->>> this line produces the error.
return true;
} else {
this->artistPlaysTree = artistNode;
return true;
}
return false;
}
I tried overriding the AddNode method inside ArtistPlaysNode, but that didn't work and made the compiler complain about being unable to cast from one pointer to the other.
Searching online for an answer didn't bring up any relevant results.
OK, so in short, the error was caused by a lack of forward declarations.
Notice that the ArtistPlaysNode class has members of type shared_ptr<MostPlayedListNode> and shared_ptr<SongPlaysNode>.
At the same time, the MostPlayedListNode class has members of type shared_ptr<ArtistPlaysNode> and shared_ptr<SongPlaysNode>.
In addition, both ArtistPlaysNode and SongPlaysNode are derived from the TreeNode class.
This creates a scenario where these classes have members of each other's types, in an almost cyclic fashion.
This usually causes errors of the type:
expected class name before '{' token
Or it may cause an error of the type:
'NAME' was not declared in this scope
To solve this issue, we need to either make sure that everything a class, function or header relies on is declared before it is used,
or provide the compiler with forward declarations, which allow the compiler to recognize a class without its complete definition being available.
In the case of my code, the fix was adding forward declarations in the class files of MostPlayedListNode, SongPlaysNode and ArtistPlaysNode.
For example, the top portion of the updated MostPlayedListNode.h file:
using std::shared_ptr;
using std::make_shared;
class ArtistPlaysNode; // this is a forward declaration
class SongPlaysNode; // this is a forward declaration
class MostPlayedListNode {
private:
int numberOfPlays;
shared_ptr<ArtistPlaysNode> artistPlaysTree;
shared_ptr<ArtistPlaysNode> ptrToLowestArtistId;
shared_ptr<SongPlaysNode> ptrToLowestSongId;
shared_ptr<MostPlayedListNode> previous;
shared_ptr<MostPlayedListNode> next;
public:
And the updated ArtistPlaysNode.h file:
using std::shared_ptr;
using std::make_shared;
class SongPlaysNode; // this is a forward declaration
class MostPlayedListNode; // this is a forward declaration
class ArtistPlaysNode : public TreeNode {
private:
int artistId;
shared_ptr<SongPlaysNode> SongPlaysTree;
shared_ptr<MostPlayedListNode> ptrToListNode;
public:
In conclusion, when writing certain data structures, forward declarations are important so that the compiler recognizes every class it needs, even if that class is not yet defined at the point where another class refers to it.
In my code I needed forward declarations to account for mutual recursion, but that might not always be the case.

Make sure that all constructors call same function c++, design pattern

Let us assume that we have a class called Node which has the members sequence and id. We want to print the sequence of the Node in many different ways. Instead of adding the print functions directly to the Node class, we put them into a separate class called NodePrinter. Each Node needs to have a "working" NodePrinter in any case.
Which implies that:
Node has a NodePrinter * printer member
Every constructor of Node needs to create a new NodePrinter
My idea now was to create a BaseNode and move the NodePrinter into that one. It has only one constructor, which takes a Node as an input and assigns it to the NodePrinter:
#include <iostream>
#include <string>
using namespace std;
class NodePrinter;
class Node;
class BaseNode
{
public:
BaseNode() = delete;
BaseNode(Node * node);
~BaseNode();
NodePrinter * printer;
};
class Node: public BaseNode
{
public:
Node(string sequence): BaseNode(this), sequence(sequence){}
Node(int id, string sequence): BaseNode(this), sequence(sequence), id(id){}
int id;
string sequence;
};
class NodePrinter
{
private:
Node * node;
public:
NodePrinter() = delete;
NodePrinter(Node * node): node(node){}
void print_node()
{
std::cout<<node->sequence<<endl;
}
};
BaseNode::BaseNode(Node * node)
{
node->printer = new NodePrinter(node);
}
BaseNode::~BaseNode()
{
delete printer;
printer = nullptr;
}
int main()
{
Node node("abc");
node.printer->print_node();
return 0;
}
Thereby each node is forced to call BaseNode(this) and the resources get allocated.
Is this reasonable, or is this whole approach already twisted from the start? Is there a better way to do this?
One thing that seems odd to me is that the printer depends on an instance of Node; shouldn't it be possible for a single printer to print multiple nodes? I also wouldn't have Node depend on a NodePrinter, because then you can't print the same node with multiple printers.
Anyhow, if you really need to keep the 1-to-1 correspondence, the simplest way is to just initialize the NodePrinter directly where the member variable is declared in Node:
#include <iostream>
#include <memory>
#include <string>
#include <utility>
class Node;
class NodePrinter
{
private:
Node * node;
public:
NodePrinter() = delete;
NodePrinter(Node * node): node(node){}
void print_node();
};
class Node
{
public:
Node(std::string sequence) : sequence(std::move(sequence)){}
Node(int id, std::string sequence) : id(id), sequence(std::move(sequence)) {}
int id;
std::string sequence;
std::unique_ptr<NodePrinter> printer = std::make_unique<NodePrinter>(this);
};
void NodePrinter::print_node()
{
std::cout<< node->sequence << '\n';
}
int main()
{
Node node("abc");
node.printer->print_node();
return 0;
}

Inheritance and AVL/BST Trees

Is there any way to use the same insert function for both the Bst and the Avl tree? The problem is that Bst and Avl have different Node types, but I don't want to make the Bst Node a general case (with height and Node* parent inside), which makes no sense because a Bst has no need for parent and height.
class Bst
{
public:
struct Node
{
int value;
Node* left;
Node* right;
};
Node* insert(Node* node) {/* do stuff using Bst::Node */}
// ...
};
class Avl : public Bst
{
public:
struct Node : public Bst::Node
{
int height;
Node* parent;
};
// now I want that Bst::insert use this Node
// instead of the old one
Node* insert(Node* node)
{
Node* inserted_node = Bst::insert(node);
/* rotations stuff */
return inserted_node;
}
};
Roughly what I'm trying to do is make Bst::Node "virtual".
So, how can I solve the problem of implementing the Avl tree without rewriting the entire insert function just because Node changed?
I'm actually working on the same thing, and I think you've described what you want very clearly.
First, the given interface may be a little confusing: shouldn't insert() avoid returning a pointer to the Node? We could use a findNode() function that returns the pointer to the Node and does only that.
Back to the main question: maybe you can use a template to set your Node type for every function in the BST.
However, the BST is not just an abstract interface; it also implements the BST operations, so this is not CRTP.
The pseudocode for now might be the following:
// Pre-definitions:
// A parent pointer also simplifies the BST implementation.
template<typename T>
class BST {
// ... omitted ...
protected:
template<typename node_type>
class BST_Node {
public:
T val;
BST_Node *left, *right, *parent;
// Empty {} means value initialization.
BST_Node() : val{}, left{nullptr}, right{nullptr}, parent{nullptr} {}
explicit BST_Node(T v) : val{v}, left{nullptr}, right{nullptr}, parent{nullptr} {}
};
// ... omitted ...
};
template<typename T>
class AVL_Node : public BST_Node {
public:
short height;
AVL_Node(T val) : BST_Node(val), height(0) {}
};
template<typename T>
void insert(T val) {
AVL_Node<T> Node(val);
BST<T>::insert_node<AVL_Node>(Node);
AVL_Node<T>* ptr = BST<T>::find_node<AVL_Node>(val);
ptr->height = BST<T>::get_height(ptr);
int state = chk_balance(ptr);
switch (state) {
case 0: // tree is balanced
break;
case 1:
LL_rotate(ptr);
break;
case 2:
RR_rotate(ptr);
break;
// ... omitted ...
}
}
Hope this post helps solve your question.
Maybe you want CRTP (in which case you haven't given enough info about your needs for even a rough example), but a simpler, less powerful template approach may make more sense to you. Have a base class (under each of your tree types) that has no data members and just defines static template functions for the common code. Since the functions are static, you need to pass in the relevant data (for insert, that should be &root), but that should not be much trouble. (Rough and untested):
struct tree_base
{
template <class Node>
static Node* insert( Node** where, Node* what)
{
Node* here;
while ( (here = *where) != 0 )
{
if ( *what < *here ) where = &(here->left);
else if ( *here < *what ) where = &(here->right);
else
{
// Trying to insert something already there: what should be done?
return here; // one option: hand back the existing node
}
}
*where = what;
return what; // Is that the desired return?
}
};
Then each of your real tree classes would inherit from tree_base and would call tree_base::insert(&root, new_node) to do the common parts of insert.
A CRTP version of that would allow root to be a member of the base class even though it points to the Node type of the derived class. Given root as a member of the base class, the insert function doesn't need to be static and doesn't need to take &root as input. And since a CRTP base class is already correctly templated to have access to the Node type, the base class insert method wouldn't need to be a template. All of that would mean a lot more to learn (by looking at some real examples of CRTP) and is probably overkill for the code sharing you want.

Null Object Pattern, Recursive Class, and Forward Declarations

I'm interested in doing something like the following to adhere to a Null Object design pattern and to avoid prolific NULL tests:
class Node;
Node* NullNode;
class Node {
public:
Node(Node *l=NullNode, Node *r=NullNode) : left(l), right(r) {};
private:
Node *left, *right;
};
NullNode = new Node();
Of course, as written, NullNode has different memory locations before and after the Node class declaration. You could do this without the forward declaration, if you didn't want to have default arguments (i.e., remove Node *r=NullNode).
Another option would use some inheritance: make a parent class (Node) with two children (NullNode and FullNode). Then the node example above would be the code for FullNode, and the NullNode in the code above would be of type NullNode inheriting from Node. I hate solving simple problems by appeals to inheritance.
So, the question is: how do you apply Null Object patterns to recursive data structures (classes) with default arguments (which are instances of that same class!) in C++?
Use extern:
extern Node* NullNode;
...
Node* NullNode = new Node();
Better yet, make it a static member:
class Node {
public:
static Node* Null;
Node(Node *l=Null, Node *r=Null) : left(l), right(r) {};
private:
Node *left, *right;
};
Node* Node::Null = new Node();
That said, both the existing code and the amendments above leak an instance of Node. You could use auto_ptr, but that would be dangerous because of the uncertain order of destruction of globals and statics (a destructor of some global may need Node::Null, which may or may not already be gone by then).
I've actually implemented a recursive tree (for JSON, etc.) doing something like this. Basically, your base class becomes the "NULL" implementation, and its interface is the union of the interfaces of all the derived classes. You then have derived classes that implement the pieces: "DataNode" implements data getters and setters, etc.
That way, you can program to the base class interface and save yourself A LOT of pain. You set up the base implementation to do all the boilerplate logic for you, e.g.
#include <string>
using std::string;
class Node {
public:
Node() {}
virtual ~Node() {}
virtual string OutputAsINI() const { return ""; }
};
class DataNode : public Node {
private:
string myName;
string myData;
public:
DataNode(const string& name, const string& val) : myName(name), myData(val) {}
~DataNode() {}
string OutputAsINI() const override { string out = myName + " = " + myData; return out; }
};
This way I don't have to test anything; I just call OutputAsINI() blindly. Similar logic for your whole interface will make most of the null tests go away.
Invert the hierarchy. Put the null node at the base:
#include <iostream>
class Node {
public:
Node() {}
virtual void visit() const {}
};
Then specialize as needed:
template<typename T>
class DataNode : public Node {
public:
DataNode(T x, const Node* l=&Null, const Node* r=&Null)
: left(l), right(r), data(x) {}
virtual void visit() const {
left->visit();
std::cout << data << std::endl;
right->visit();
}
private:
const Node *left, *right;
T data;
static const Node Null;
};
template<typename T>
const Node DataNode<T>::Null = Node();
Sample usage:
int main()
{
DataNode<char> a('A', new DataNode<char>('B'),
new DataNode<char>('C'));
a.visit();
return 0;
}
Output:
$ ./node
B
A
C