Working on Parser with Binary and unary node?


I know this is a bit of a weird title, but I hope you will understand what I am asking. A few months back I wrote an interpreter in Python, and that went well. Now I want to implement the same thing in C++, but I am running into serious problems because C++ is strictly typed.
Let me start with what I did in my Python program. First I created a Lexer that separates the input into tokens (key-value pairs), and I wrote a Parser that converts an arithmetic grammar into operation nodes: BinaryOpNode, UnaryOpNode, and NumberNode. For example, (-2+7)^3 is converted into an AST whose root is a Binary Node with another Binary Node as its left node, POW (power) as its operator, and a Number Node 3 as its right node. That left node is a Binary Node whose left node is a Unary Node (MINUS and a Number Node 2), whose operator is PLUS, and whose right node is a Number Node 7.
I did this by identifying expression, term and factor. I have written a Lexer in C++ but am having trouble with the Parser. Please help me do the same in C++.
What I have done so far
I tried something weird that kind of works. I created a class BinaryOpNode with two void* members for the right and left nodes, and an enum member for the operation between them. Two more boolean members, one per node, record what type each void* actually points to: UnaryOpNode or BinaryOpNode (the default). This lets me cast each node to its respective type.
However, I am not satisfied with the result: it looks poorly optimized, and I can't keep track of NumberNode this way.
Please help me. Thanks in advance.

What you are looking for is polymorphism: code that a programmer writes once and that does different things depending on the types of the things it operates on.
C++ supports a bewildering array of ways to do polymorphism.
The most supported kind is inheritance based virtual polymorphism. In this, you create a base class:
struct INode {
virtual ~INode() {}
};
and add in common operations to it, making those common operations pure-virtual:
struct INode {
virtual ~INode() {}
virtual std::vector<INode*> GetChildren() const = 0;
};
This requires that you work with pointers instead of object instances.
In this system, if you know the type of an object, you can use dynamic_cast<RealType*>(iNodePointer) to get a pointer to the object as an instance of that type. It returns nullptr if the types don't match. This lets you access the methods of the derived type that aren't in the base interface.
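As a rough sketch of how this first option could look for the (-2+7)^3 example from the question: the names Op, NumberNode, UnaryOpNode, BinaryOpNode, Eval and BuildExample below are hypothetical, chosen for illustration, and std::unique_ptr is assumed for ownership (C++14 or later). The tree is built by hand where a parser would normally build it.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>

// Hypothetical operator set; a real lexer would define its own token kinds.
enum class Op { Plus, Minus, Mul, Div, Pow };

struct INode {
    virtual ~INode() = default;
    virtual double Eval() const = 0;   // common operation every node supports
};

struct NumberNode : INode {
    double value;
    explicit NumberNode(double v) : value(v) {}
    double Eval() const override { return value; }
};

struct UnaryOpNode : INode {
    Op op;
    std::unique_ptr<INode> operand;    // owns its child, no void* needed
    UnaryOpNode(Op o, std::unique_ptr<INode> n) : op(o), operand(std::move(n)) {}
    double Eval() const override {
        const double v = operand->Eval();
        return op == Op::Minus ? -v : v;
    }
};

struct BinaryOpNode : INode {
    Op op;
    std::unique_ptr<INode> left, right;
    BinaryOpNode(std::unique_ptr<INode> l, Op o, std::unique_ptr<INode> r)
        : op(o), left(std::move(l)), right(std::move(r)) {}
    double Eval() const override {
        const double a = left->Eval(), b = right->Eval();
        switch (op) {
            case Op::Plus:  return a + b;
            case Op::Minus: return a - b;
            case Op::Mul:   return a * b;
            case Op::Div:   return a / b;
            case Op::Pow:   return std::pow(a, b);
        }
        assert(false && "unknown operator");
        return 0.0;
    }
};

// Builds the tree for (-2+7)^3 by hand, the way a parser would assemble it.
std::unique_ptr<INode> BuildExample() {
    auto neg2 = std::make_unique<UnaryOpNode>(Op::Minus, std::make_unique<NumberNode>(2));
    auto sum  = std::make_unique<BinaryOpNode>(std::move(neg2), Op::Plus,
                                               std::make_unique<NumberNode>(7));
    return std::make_unique<BinaryOpNode>(std::move(sum), Op::Pow,
                                          std::make_unique<NumberNode>(3));
}
```

Calling BuildExample()->Eval() walks the tree through the virtual function, so no boolean type flags or casts are needed; dynamic_cast<BinaryOpNode*>(ptr.get()) is only required when code outside the hierarchy has to inspect a concrete node.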
A second kind of polymorphism is std::variant based. This is a closed set of types, which parsers often have.
using AnyNode = std::variant<Node::BinaryOp, Node::UnaryOp, Node::Number>;
Here you use std::visit to operate on the concrete type instead of dynamic_cast, and your parse tree is value-based instead of pointer-based.
There is some pain when you want a node to have inside itself a vector of AnyNode.
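One possible way around that pain, sketched under the assumption of C++17: wrap the variant in a struct and let child links go through std::unique_ptr, so the recursive type has a finite size. The names Node, NodePtr, Number, UnaryOp, BinaryOp, make, eval and buildExample are illustrative, not taken from the answer, and operators are stored as plain chars to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <type_traits>
#include <utility>
#include <variant>

struct Node;                               // forward declaration, completed below
using NodePtr = std::unique_ptr<Node>;     // indirection breaks the recursive size

struct Number   { double value; };
struct UnaryOp  { char op; NodePtr operand; };
struct BinaryOp { char op; NodePtr left, right; };

struct Node {
    std::variant<Number, UnaryOp, BinaryOp> data;   // closed set of node kinds
};

// Small helper (hypothetical) that moves a node onto the heap.
NodePtr make(Node n) { return std::make_unique<Node>(std::move(n)); }

double eval(const Node& n) {
    return std::visit([](const auto& v) -> double {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, Number>) {
            return v.value;
        } else if constexpr (std::is_same_v<T, UnaryOp>) {
            assert(v.operand && "unary node needs an operand");
            const double x = eval(*v.operand);
            return v.op == '-' ? -x : x;
        } else {
            assert(v.left && v.right && "binary node needs two operands");
            const double a = eval(*v.left), b = eval(*v.right);
            switch (v.op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                case '/': return a / b;
                default:  return std::pow(a, b);   // treated as '^' in this sketch
            }
        }
    }, n.data);
}

// (-2+7)^3 assembled by hand.
Node buildExample() {
    Node neg2{UnaryOp{'-', make(Node{Number{2.0}})}};
    Node sum{BinaryOp{'+', make(std::move(neg2)), make(Node{Number{7.0}})}};
    return Node{BinaryOp{'^', make(std::move(sum)), make(Node{Number{3.0}})}};
}
```

The visitor has to handle every alternative, so adding a new node kind to the variant turns a forgotten case into a compile-time matter rather than a failed cast at run time.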
A third way is std::function type-erasure style. Here you write your own polymorphic system that takes objects of arbitrary type and wraps their operations up in a value-semantics wrapper.
A fourth option is CRTP static polymorphism. This isn't suitable to build a dynamic parse tree, but it can be used to help implement some of the above.
A fifth option is aspect oriented std::function operation bundles.
A sixth option is manual function table tweaking, basically reimplementing the C++ vtable solution manually as if you are in C, but in C++. This can permit you to have features similar to other OO-languages.
A seventh option is to write up a signals-slots system and send messages to your objects.
There are almost certainly more.
The easiest solution is probably to first learn about inheritance and virtual functions in C++ (the first option above). I personally would probably write a parse tree using std::variant at this point, but you probably don't know enough C++ yet to do that in practice.

Related

c++: why not use friend for compositions?

I'm a computational physicist trying to learn how to code properly. I've written several program by now, but the following canonical example keeps coming back, and I'm unsure as to how to handle it. Let's say that I have a composition of two objects such as
class node
{
int position;
};
class lattice
{
vector <node*> nodes;
double distance (node*,node*);
};
Now, this will not work, because position is a private member of node. I know of two ways to solve this: either you create an accessor such as getpos(){return position}, or make lattice a friend of node.
The second of these solutions seems a lot easier to me. However, I am under the impression that it is considered slightly bad practice, and that one generally ought to stick to accessors and avoid friend. My question is this: When should I use accessors, and when should I use friendship for compositions such as these?
Also, a bonus question that has been bugging me for some time: Why are compositions preferred to subclasses in the first place? To my understanding the HAS-A mnemonic argues this, but it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, i.e. a subclass?

Friend is better suited if you give access rights to only specific classes, rather than to all. If you define getpos(){return position}, position information will be publicly accessible via that getter method. If you use friend keyword, on the other hand, only the lattice class will be able to access position info. Therefore, it is purely dependent on your design decisions, whether you wanna make the information publicly accessible or not.
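A minimal sketch of the friend variant, keeping the class names from the question; the constructor, the add method and the body of distance are assumptions made so the example compiles on its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

class lattice;   // forward declaration so node can name it as a friend

class node {
public:
    explicit node(int p) : position(p) {}
private:
    int position;
    friend class lattice;   // only lattice may touch position
};

class lattice {
public:
    void add(node* n) { nodes.push_back(n); }

    // Reads the private member directly; no public getter is exposed.
    double distance(const node* a, const node* b) const {
        assert(a && b);
        return std::abs(a->position - b->position);
    }
private:
    std::vector<node*> nodes;   // non-owning, as in the question
};
```

Any other class that tries to read position fails to compile, which is the narrower access the answer describes; a public getpos() would instead open the value to every caller.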

You made a "quasi class". This is a textbook example of how not to do OOP, because changing position doesn't change anything else in node. Even if changing position did change something else in node, I would rethink the structure to avoid complexity and improve the compiler's ability to optimize your code.
I’ve witnessed C++ and Java programmers routinely churning out such
classes according to a sort of mental template. When I ask them to
explain their design, they often insist that this is some sort of
“canonical form” that all elementary and composite item (i.e.
non-container) classes are supposed to take, but they’re at a loss to
explain what it accomplishes. They sometimes claim that we need the
get and set functions because the member data are private, and, of
course, the member data have to be private so that they can be changed
without affecting other programs!
Should read:
struct node
{
int position;
};

Not all classes have to have private data members at all. If your intention is to create a new data type, then it may be perfectly reasonable for position to just be a public member. For instance, if you were creating a type of "3D Vectors", that is essentially nothing but a 3-tuple of numeric data types. It doesn't benefit from hiding its data members since its constructor and accessor methods have no fewer degrees of freedom than its internal state does, and there is no internal state that can be considered invalid.
template<class T>
struct Vector3 {
T x;
T y;
T z;
};
Writing that would be perfectly acceptable - plus overloads for various operators and other functions for normalizing, taking the magnitude, and so on.
If a node has no illegal position value, but no two nodes in a lattice can have the same position (or there is some other constraint), then it might make sense for node to have a public member position, while lattice has a private member nodes.
Generally, when you are constructing "algebraic data types" like the Vector3<T> example, you use struct (or class with public) when you are creating product types, i.e. logical ANDs between other existent types, and you use std::variant when you are creating sum types, i.e. logical ORs between existent types. (And for completeness' sake, function types then take the place of logical implications.)
Compositions are preferred over inheritance when, like you say, the relationship is a "has-a" relationship. Inheritance is best used when you are trying to extend or link with some legacy code, I believe. It was previously also used as a poor approximation of sum types, before std::variant existed, because the union keyword really doesn't work very well. However, you are almost always better off using composition.

Concerning your example code, I am not sure that this constitutes a composition. In a composition, the child object does not exist as an independent entity. As a rule of thumb, its lifetime is coupled with the container. Since you are using a vector<node*> nodes, I assume that the nodes are created somewhere else and lattice only has a pointer to these objects. An example for a composition would be
class lattice {
node n1; // a single object
std::vector<node> manyNodes;
};
Now, addressing the questions:
"When should I use accessors, and when should I use friendship for compositions such as these?"
If you use plenty of accessors in your code, you are creating structs and not classes in an OO sense. In general, I would argue that besides certain prominent exceptions such as container classes one rarely needs setters at all. The same can be argued for simple getters for plain members, except when returning the property is a real part of the class interface, e.g. the number of elements in a container. Your interface should provide meaningful services that manipulate the internal data of your object. If you frequently get some internal data with a getter, then compute something and set it with a setter, you should put this computation in a method.
One of the main reasons to avoid friend is that it introduces a very strong coupling between two components. The guideline here is "low coupling, high cohesion". Strong coupling is considered a problem because it makes code hard to change, and most time on software projects is spent in maintenance or evolution. Friend is especially problematic because it allows unrelated code to depend on internal properties of your class, which can break encapsulation. There are valid use-cases for friend when the classes form a strongly related cluster (aka high cohesion).
"Why are compositions preferred to subclasses in the first place?"
In general, you should prefer plain composition over inheritance and friend classes since it reduces coupling. In a composition, the container class can only access the public interface of the contained class and has no knowledge about the internal data.
From a pure OOP point of view, your design has some weaknesses and is probably not very OO. One of the basic principles of OOP is encapsulation, which means coupling related data and behavior into objects. The node class, for example, does not have any meaning other than storing a position, so it does not have any behavior. It seems that you modeled the data of your code but not the behavior. This can be a very appropriate design and lead to good code, but it is not really object-oriented.
"To my understanding the HAS-A mnemonic argues this, but it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, i.e. a subclass?"
I think you got this wrong. Public inheritance models an is-a-relationship.
class A: public B {};
It basically says that objects of class A are a special kind of B, fulfilling all the assumptions that you can make about objects of type B. This is known as the Liskov substitution principle. It basically says that everywhere in your code where you use a B you should be able to also use an A. Considering this, class lattice: public node would mean that every lattice is a node. On the other hand,
class lattice {
int x;
node n;
int y;
};
means that an object of type lattice contains another object of type node, in C++ physically placed together with x and y. This is a has-a-relationship.

Recursive data-structures without the use of pointers

During my bachelor degree in CS I've come across the use of recursive data-structures a lot of times. In C++ I always ended up using pointers to make my data structures recursive, just like what I would do in C.
A simplified example could be the following:
struct Tree{
int data;
struct Tree *left, *right;
};
However, using pointers tends to be risky and involves a lot of hours debugging and testing the code. For these reasons I would like to know if there is any other efficient way of defining recursive data-structures in C++.
In other programming languages, like Rust, I've seen things like that:
struct Node {
children: Vec<Node>,
node_type: NodeType,
}
Is there a safer and more comfortable way of defining such recursive structures in C++? One possibility would be to use std::vector, but I don't know how well that approach performs.

The reason pointers are used rather than values is because you would never be able to define your struct as its size would be infinitely recursive.
struct Tree{
int data;
struct Tree left, right;
};
Neglecting padding etc, you could approximate the size of Tree as
sizeof(Tree) == sizeof(int) + sizeof(Tree) + sizeof(Tree)
// ^data ^left ^right
but you can see that since Tree has two members of Tree, and those members themselves have two Tree members, and those have two Tree members.... you can see where this is going.

The Rust example uses a vector of children - this can be empty as well.
In C++, the member variable can be an object, a pointer or a reference (omitted for simplicity).
Since a node object cannot be used directly (this would loop infinitely) and you do not wish to use a pointer, your options are:
use a vector as well (though for a binary tree this is not the most convenient type - you could however limit it in code to always two elements),
use a map (key could be an enum CHILD_LEFT, CHILD_RIGHT),
reconsider using pointers, or better yet: smart pointers (this looks like a good use case for regular unique_ptrs).
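To illustrate the smart-pointer and vector options from the list above, here is a small sketch. The helper names insert, countNodes and minimum, and the RoseNode type, are made up for this example; the vector variant assumes C++17, which allows std::vector to be declared with an incomplete element type.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Binary tree where each node owns its children; a null pointer means "no child".
struct Tree {
    int data;
    std::unique_ptr<Tree> left, right;
    explicit Tree(int d) : data(d) {}
};

// Binary-search-tree insertion; nothing is ever deleted by hand.
void insert(std::unique_ptr<Tree>& root, int value) {
    if (!root) {
        root = std::make_unique<Tree>(value);
    } else if (value < root->data) {
        insert(root->left, value);
    } else {
        insert(root->right, value);
    }
}

int countNodes(const std::unique_ptr<Tree>& root) {
    return root ? 1 + countNodes(root->left) + countNodes(root->right) : 0;
}

// Non-owning raw pointers are still fine for walking the tree.
int minimum(const std::unique_ptr<Tree>& root) {
    assert(root && "tree must not be empty");
    const Tree* t = root.get();
    while (t->left) t = t->left.get();
    return t->data;
}

// Closest analogue of the Rust snippet: children stored by value in a vector.
struct RoseNode {
    int data;
    std::vector<RoseNode> children;   // may be empty, which ends the recursion
};
```

When the root goes out of scope the whole tree is released automatically. For very deep trees the recursive destruction can use a lot of stack, so an explicit iterative teardown may be worth considering in that case.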

Erlang: C++ bindings state of the art?

I'm evaluating binding a C++ project of mine in Erlang. My project uses templates and method overloading massively, so it's not uncommon to have something like this:
template <typename T, class Iterator = BufferIterator<T> >
class Buffer
{
public:
[...]
private:
[...]
};
I've heard that the computational model in Erlang is a bit different from "traditional" programming languages. In Erlang a node seems to be a first class component, with messages that flow from one node to another. In this scenario, is it possible to do something like this: "This is a list of int. Send it to the C++ node, which will convert it into a Buffer<int> object, perform some operations on it, and then convert the result back into a new Erlang list"?
I've seen some projects on the web like tinch++; it seems promising but not stable at all.
Any kind of tip, link or code snippet would be very much appreciated.
Thanks in advance, A.

See the Interoperability Tutorial.
For interfacing with C and C++, you don't need to create a node. Port drivers or NIFs (natively implemented functions) may be a better choice. At any rate, your C++ node/port driver/NIF will receive messages/arguments from Erlang as a specific data structure: ETERM, ErlDrvTerm, or ERL_NIF_TERM. Then you check what the term looks like (e.g. if it's a list of ints), and can convert it to whatever you need.

Is "node" an ADT? If so, what is its interface?

Nodes are useful for implementing ADTs, but is "node" itself an ADT? How does one implement "node"? Wikipedia uses a plain old struct with no methods in its (brief) article on nodes. I googled node to try and find an exhaustive article on them, but mostly I found articles discussing more complex data types implemented with nodes.
Just what is a node? Should a node have methods for linking to other nodes, or should that be left to whatever owns the nodes? Should a node even be its own standalone class? Or is it enough to include it as an inner struct or inner class? Are they too general to even have this discussion?

A node is an incredibly generic term. Essentially, a node is a vertex in a graph - or a point in a network.
In relation to data structures, a node usually means a single basic unit of data which is (usually) connected to other units, forming a larger data structure. A simple data structure which demonstrates this is a linked list. A linked list is merely a chain of nodes, where each node is linked (via a pointer) to the following node. The end node has a null pointer.
Nodes can form more complex structures, such as a graph, where any single node may be connected to any number of other nodes, or a tree, where each node has zero or more child nodes. Note that any data structure consisting of one or more connected nodes is a graph. (A linked list and a tree are both also graphs.)
In terms of mapping the concept of a "node" to Object Oriented concepts like classes, in C++ it is usually customary to have a Data Structure class (sometimes known as a Container), which will internally do all the work on individual nodes. For example, you might have a class called LinkedList. The LinkedList class then would have an internally defined (nested) class representing an individual Node, such as LinkedList::Node.
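A minimal sketch of that container-with-nested-node layout might look as follows. The member names pushFront, front, empty and size are assumptions for the example, and std::unique_ptr is used for the links so that no manual delete is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <class T>
class LinkedList {
public:
    void pushFront(T value) {
        auto n = std::make_unique<Node>(std::move(value));
        n->next = std::move(head_);   // new node takes over the old chain
        head_ = std::move(n);
        ++size_;
    }

    const T& front() const {
        assert(!empty() && "front() on an empty list");
        return head_->value;
    }

    bool empty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

private:
    // LinkedList::Node is private: users of the list never see it.
    struct Node {
        T value;
        std::unique_ptr<Node> next;   // null marks the end of the list
        explicit Node(T v) : value(std::move(v)) {}
    };

    std::unique_ptr<Node> head_;
    std::size_t size_ = 0;
};
```

All linking logic lives in the container, so callers only deal with values; the node stays a plain struct with no interface of its own.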
In some cruder implementations you may also see a Node itself as the only way to access the data structure. You then have a set of functions which operate on nodes. However, this is more commonly seen in C programs. For example, you might have a struct LinkedListNode, which is then passed to functions like void LinkedListInsert(struct LinkedListNode* n, Object somethingToInsert);
In my opinion, the Object Oriented approach is superior, because it better hides details of implementation from the user.

Generally you want to leave node operations to whatever ADT owns them. For example, a list should have the ability to traverse its own nodes. It doesn't need the node to have that ability.
Think of the node as a simple bit of data that the ADT holds.

In the strictest terms, any assemblage of one or more primitive types into some kind of bundle, usually with member functions to operate on the data, is an Abstract Data Type.
The grey area largely comes from which language you operate under. For example, in Python, some coders consider the list to be a primitive type, and thus not an ADT. But in C++, the STL List is definitely an ADT. Many would consider the STL string to be an ADT, but in C# it's definitely a primitive.
To answer your question more directly: Any time you are defining a data structure, be it struct or class, with or without methods, it is necessarily an ADT because you are abstracting primitive data types into some kind of construct for which you have another purpose.

An ADT isn't a real type. That's why it's called an ADT. Is 'node' an ADT? Not really, IMO. It can be a part of one, such as a linked list ADT. Is 'this node I just created to contain thingys' an ADT? Absolutely not! It's, at best, an example of an implementation of an ADT.
There's really only one case in which ADT's can be shown expressed as code, and that's as templated classes. For example, std::list from the C++ STL is an actual ADT and not just an example of an instance of one. On the other hand, std::list<thingy> is an example of an instance of an ADT.
Some might say that a list that can contain anything that obeys some interface is also an ADT. I would mildly disagree with them. It's an example of an implementation of an ADT which can contain a wide variety of objects that all have to obey a specific interface.
A similar argument could be made about the requirements of the std::list's "Concepts". For instance that type T must be copyable. I would counter that by saying that these are simply requirements of the ADT itself while the previous version actually requires a specific identity. Concepts are higher level than interfaces.
Really, an ADT is quite similar to a "pattern" except that with ADT's we're talking about algorithms, big O, etc... With patterns we're talking about abstraction, reuse, etc... In other words, patterns are a way to build something whose implementations solve a particular type of problem and can be extended/reused. An ADT is a way to build an object that can be manipulated through algorithms but isn't exactly extensible.

Nodes are a detail of implementing the higher class. Nodes don't exist or operate on their own- they only exist because of the need for separate lifetimes and memory management than the initial, say, linked list, class. As such, they don't really define themselves as their own type, but happily exist with no encapsulation from the owning class, if their existence is effectively encapsulated from the user. Nodes typically also don't display polymorphism or other OO behaviours.
Generally speaking, if the node doesn't feature in the public or protected interface of the class, then don't bother, just make them structs.

In the context of ADT a node is the data you wish to store in the data structure, plus some plumbing metadata necessary for the data structure to maintain its integrity. No, a node is not an ADT. A good design of an ADT library will avoid inheritance here because there is really no need for it.
I suggest you read the code of std::map in your compiler's standard C++ library to see how it's done properly. Granted, you will probably not see an ADT tree but a Red-Black tree, but the node struct should be the same. In particular, you will likely see a lightweight struct that remains private to the data structure and consists of little other than data.

You're mixing in three mostly orthogonal concepts in your question: C++, nodes, ADTs.
I don't think it's useful to try to sort out what can be said in general about the intersection of those concepts.
However, things can be said about e.g. singly linked list nodes in C++.
#include <iostream>
template< class Payload >
struct Node
{
Node* next;
Payload value;
Node(): next( 0 ) {}
Node( Payload const& v ): next( 0 ), value( v ) {}
void linkInFrom( Node*& aNextPointer )
{
next = aNextPointer;
aNextPointer = this;
}
static Node* unlinked( Node*& aNextPointer)
{
Node* const result = aNextPointer;
aNextPointer = result->next;
return result;
}
};
int main()
{
using namespace std;
typedef Node<int> IntNode;
IntNode* pFirstNode = 0;
(new IntNode( 1 ))->linkInFrom( pFirstNode );
(new IntNode( 2 ))->linkInFrom( pFirstNode );
(new IntNode( 3 ))->linkInFrom( pFirstNode );
for( IntNode const* p = pFirstNode; p != 0; p = p->next )
{
cout << p->value << endl;
}
while( pFirstNode != 0 )
{
delete IntNode::unlinked( pFirstNode );
}
}
I first wrote these operations in Pascal, very early eighties.
It continually surprises me how little known they are. :-)
Cheers & hth.,

Structure for hierarchical Component storage

I've been batting this problem around in my head for a few days now and haven't come to any satisfactory conclusions so I figured I would ask the SO crew for their opinion. For a game that I'm working on I'm using a Component Object Model as described here and here. It's actually going fairly well but my current storage solution is turning out to be limiting (I can only request components by their class name or an arbitrary "family" name). What I would like is the ability to request a given type and iterate through all components of that type or any type derived from it.
In considering this I've first implemented a simple RTTI scheme that stores the base class type through the derived type in that order. This means that the RTTI for, say, a sprite would be: component::renderable::sprite. This allows me to compare types easily to see if type A is derived from type B simply by comparing all the elements of B: i.e. component::renderable::sprite is derived from component::renderable but not component::timer. Simple, effective, and already implemented.
What I want now is a way to store the components in a way that represents that hierarchy. The first thing that comes to mind is a tree using the types as nodes, like so:
component
/ \
timer renderable
/ / \
shotTimer sprite particle
At each node I would store a list of all components of that type. That way requesting the "component::renderable" node will give me access to all renderable components regardless of derived type. The rub is that I want to be able to access those components with an iterator, so that I could do something like this:
for_each(renderable.begin(), renderable.end(), renderFunc);
and have that iterate over the entire tree from renderable down. I have this pretty much working using a really ugly map/vector/tree node structure and a custom forward iterator that tracks a node stack of where I've been. All the while implementing, though, I felt that there must be a better, clearer way... I just can't think of one :(
So the question is: Am I over-complicating this needlessly? Is there some obvious simplification I'm missing, or pre-existing structure I should be using? Or is this just inherently a complex problem and I'm probably doing just fine already?
Thanks for any input you have!

You should think about the following:
how often you traverse the tree
how often you add/remove elements from the tree
how many objects you need to keep track of
Whichever operation is more frequent will help determine the optimum solution.
Perhaps instead of making a complex tree, just have a list of all types and add a pointer to the object for each type it is derived from. Something like this:
map<string, set<component *> > myTypeList;
Then for an object that is of type component::renderable::sprite
myTypeList["component"].insert(&object);
myTypeList["renderable"].insert(&object);
myTypeList["sprite"].insert(&object);
By registering each object in multiple lists, it then becomes easy to do something to all objects of a given type and its subtypes
for_each(myTypeList["renderable"].begin(),myTypeList["renderable"].end(),renderFunc);
Note that std::set and my std::map construct may not be the optimum choice, depending on how you will use it.
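The registration idea above could be wrapped up roughly as follows. This is only a sketch: the Registry class and its add, remove and all members are hypothetical names, and the type path is passed in explicitly where the question's own RTTI scheme would normally supply it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct component { virtual ~component() = default; };

class Registry {
public:
    // typePath lists the chain from base to most derived type,
    // e.g. {"component", "renderable", "sprite"}.
    void add(component* c, const std::vector<std::string>& typePath) {
        assert(c != nullptr);
        for (const std::string& name : typePath) byType_[name].insert(c);
    }

    void remove(component* c, const std::vector<std::string>& typePath) {
        for (const std::string& name : typePath) byType_[name].erase(c);
    }

    // Every component whose type is `name` or derives from it.
    const std::set<component*>& all(const std::string& name) const {
        static const std::set<component*> none;   // returned for unknown types
        auto it = byType_.find(name);
        return it == byType_.end() ? none : it->second;
    }

private:
    std::map<std::string, std::set<component*>> byType_;
};
```

The set returned by all("renderable") can be handed straight to for_each via begin() and end(), at the cost of one set insertion per level of the hierarchy whenever a component is added or removed.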
Or perhaps a hybrid approach storing only the class hierarchy in the tree
map<string, set<string> > myTypeList; // type name -> itself plus all derived type names
map<string, set<component *> > myObjectList; // type name -> objects of exactly that type
myTypeList["component"].insert("component");
myTypeList["component"].insert("renderable");
myTypeList["component"].insert("sprite");
myTypeList["renderable"].insert("renderable");
myTypeList["renderable"].insert("sprite");
myTypeList["sprite"].insert("sprite");
// this isn't quite right, but you get the idea
// UnaryFunction: any callable type taking a component* (e.g. a function pointer typedef)
struct doForList {
UnaryFunction func;
doForList(UnaryFunction f): func(f) {}
void operator ()(const string& typeName) const {
// apply func to every object registered under this exact type name
for_each(myObjectList[typeName].begin(), myObjectList[typeName].end(), func);
}
};
for_each(myTypeList["renderable"].begin(), myTypeList["renderable"].end(), doForList(myFunc));

The answer depends on the order you need them in. You pretty much have a choice of preorder, postorder, and inorder. These have obvious analogues in breadth first and depth first search, and in general you'll have trouble beating them.
Now, if you constrain the problem a little, there are a number of old fashioned algorithms for storing trees of arbitrary data as arrays. We used them a lot in the FORTRAN days. In one of them the key trick is to store the children of A, say A2 and A3, at index(A)*2 and index(A)*2+1. The problem is that if your tree is sparse you waste space, and the size of your tree is limited by the array size. But, if I remember this right, you get the elements in breadth-first order with a simple DO loop.
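A small sketch of that array layout, restricted to binary trees with 1-based indices and int payloads for brevity. The ArrayTree class and its member names are invented for this illustration; std::optional (C++17) marks the unused slots that a sparse tree wastes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// 1-based implicit binary tree: the children of slot i live at 2*i and 2*i + 1.
class ArrayTree {
public:
    explicit ArrayTree(std::size_t capacity) : slots_(capacity + 1) {   // slot 0 unused
        assert(capacity > 0);
    }

    static std::size_t left(std::size_t i)  { return 2 * i; }
    static std::size_t right(std::size_t i) { return 2 * i + 1; }

    // Returns false when the slot falls outside the fixed-size array.
    bool set(std::size_t i, int value) {
        if (i == 0 || i >= slots_.size()) return false;
        slots_[i] = value;
        return true;
    }

    // A plain loop over the array visits the occupied slots level by level.
    std::vector<int> breadthFirst() const {
        std::vector<int> out;
        for (std::size_t i = 1; i < slots_.size(); ++i)
            if (slots_[i]) out.push_back(*slots_[i]);
        return out;
    }

private:
    std::vector<std::optional<int>> slots_;   // empty optional = unused slot
};
```

Both drawbacks mentioned above are visible here: set refuses indices beyond the capacity, and every missing child still occupies a slot.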
Have a look at Knuth Volume 3, there is a TON of that stuff in there.

If you want to see code for an existing implementation, the Game Programming Gems 5 article referenced in the Cowboy Programming page comes with a somewhat stripped down version of the code we used for our component system (I did a fair chunk of the design and implementation of the system described in that article).
I'd need to go back and recheck the code, which I can't do right now, but we didn't represent things in a hierarchy in the way you show. Although components lived in a class hierarchy in code, the runtime representation was a flat list. Components just declared a list of interfaces that they implemented. The user could query for interfaces or concrete types.
So, in your example, Sprite and Particle would declare that they implemented the RENDERABLE interface, and if we wanted to do something to all renderables, we'd just loop through the list of active components and check each one. Not terribly efficient on the face of it, but it was fine in practice. The main reason it wasn't an issue was that it actually turns out to not be a very common operation. Things like renderables, for example, added themselves to the render scene at creation, so the global scene manager maintained its own list of renderable objects and never needed to query the component system for them. Similarly with physics and collision components and that sort of thing.
