Generate Parse Tree in c - c++

The following code is supposed to generate a parse tree of the input expression, but the problem is that the output only shows E, T, F and S (the names of the functions used in the code). I want it to be something like:
a+b*c => E*c => E+b*c => a+b*c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
char next;
void E(void);void T(void);
void S(void);void F(void);
void error(int);void scan(void);
void enter(char);
void leave(char);
void spaces(int);
int level = 0;
//The main should always be very simple
//First scan the string
//second check for end of string reached , if yes success and if not error.
//P ---> E '#'
int main(void){
printf("Input:");
scan(); E();
if (next != '#') error(1);
else printf("***** Successful parse *****\n");
}
//E ---> T {('+'|'-') T}
void E(void){
enter('E');
T();
while (next == '+' || next == '-') {
scan();
T();
}
leave('E');
}
//T ---> S {('*'|'/') S}
void T(void)
{
enter('T'); S();
while (next == '*' || next == '/') {
scan(); S();
}
leave('T');
}
//S ---> F '^' S | F
void S(void)
{
enter('S'); F();
if (next == '^') {
scan(); S();
}
leave('S');
}
//F ---> char | '(' E ')'
void F(void)
{
enter('F');
if (isalpha(next))
{
scan();
}
else if (next == '(') {
scan(); E();
if (next == ')')
scan();
else
error(2);
}
else {
error(3);
}
leave('F');
}
//Scan the entire input
void scan(void){
while (isspace(next = getchar()));
}
void error(int n)
{
printf("\n*** ERROR: %i\n", n);
exit(1);
}
void enter(char name)
{
spaces(level++);
printf("+-%c\n", name);
}
void leave(char name)
{
spaces(--level);
printf("+-%c\n", name);
}
//TO display the parse tree
void spaces(int local_level)
{
while (local_level-- > 0)
printf("| ");
}

Looks like a recursive descent parser. First, work out your grammar by hand. What you are expecting is not what your grammar says. You've got, from your comments,
E ---> T {('+'|'-') T} expression
T ---> S {('*'|'/') S} term
S ---> F '^' S | F subexpression?
F ---> char | '(' E ')' factor
The definitions of E and T put * at a higher precedence than + so there is no way that you will get E*c. If you want that, you'll have to switch the grammar to
E ---> T {('*'|'/') T} expression
T ---> S {('+'|'-') S} term
If you just want the output to include the rest of the expression:
Read the whole line in.
Change your scanner or lexer to get the next character from that stored line. Mark this position as the scanned point.
Change your enter routine to print the mnemonic as well as the line from the scanned point onward.
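The three steps above could be sketched roughly as follows. This is only an illustration, written in C++ rather than the question's C: the Scanner struct and the enterLabel function are hypothetical names, and the sketch assumes the scanned point is the index of the current lookahead character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical scanner state: the whole input line plus the scanned point,
// i.e. the index of the current lookahead character.
struct Scanner {
    std::string line;
    std::size_t pos = 0;

    // Current lookahead character, or '#' once the line is exhausted.
    char next() const { return pos < line.size() ? line[pos] : '#'; }

    // Advance the scanned point to the next non-space character.
    void scan() {
        ++pos;
        while (pos < line.size() && line[pos] == ' ') ++pos;
    }
};

// Text an enter() routine could print: the mnemonic followed by the unread
// remainder of the line, starting at the scanned point.
std::string enterLabel(char name, const Scanner& s) {
    std::string rest = s.pos < s.line.size() ? s.line.substr(s.pos) : "";
    return std::string("+-") + name + " " + rest;
}
```

With the line a+b*c# and the scanned point on b, enterLabel('T', s) yields "+-T b*c#", so every enter line shows what is still left to parse.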

You don't get to choose the parse tree; the grammar determines it. My guess is that you don't understand the current output, and that what you would actually like is:
a+b*c => F+b*c => S+b*c => T+b*c => T+F*c => T+S*c => T+S*F => T+S*S => T+T => E
So here are a few questions to help.
What kind of parsing is going on there: bottom up or top down?
What state is the parser in when it prints enter/leave E, or T?
If you answered bottom up to the first question, what does E, T, S, F, F denote (enter E, enter T, enter S, enter F, leave F)? When you see leave F, that means the parser has successfully recognised a non-terminal.
Try the input string 1+b*c. What do you get? Why do you get an error after E, T, S, F?
The output you seem to require can be easily produced if you understand what is produced at the moment. Hope this helps.
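One possible way to get the sequence shown above is sketched here. It keeps the question's grammar but, at each "leave", replaces the text that the function matched with the non-terminal's letter. The Tracer struct and traceReductions are hypothetical names, the sketch is in C++ rather than C, and it assumes the input has no spaces and no trailing '#'.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracer: same grammar as the question, but each "leave"
// replaces the text matched by that function with the non-terminal's letter.
struct Tracer {
    std::string form;                // current sentential form
    std::size_t cur = 0;             // index of the lookahead symbol
    std::vector<std::string> steps;  // forms recorded so far
    bool ok = true;                  // cleared on a syntax error

    char next() const { return cur < form.size() ? form[cur] : '#'; }
    void scan() { ++cur; }

    // Collapse form[start, cur) into the single letter `name` and record it.
    void reduce(char name, std::size_t start) {
        if (!ok) return;
        form.replace(start, cur - start, 1, name);
        cur = start + 1;
        steps.push_back(form);
    }

    // E ---> T {('+'|'-') T}
    void E() {
        std::size_t start = cur;
        T();
        while (ok && (next() == '+' || next() == '-')) { scan(); T(); }
        reduce('E', start);
    }
    // T ---> S {('*'|'/') S}
    void T() {
        std::size_t start = cur;
        S();
        while (ok && (next() == '*' || next() == '/')) { scan(); S(); }
        reduce('T', start);
    }
    // S ---> F '^' S | F
    void S() {
        std::size_t start = cur;
        F();
        if (ok && next() == '^') { scan(); S(); }
        reduce('S', start);
    }
    // F ---> char | '(' E ')'
    void F() {
        std::size_t start = cur;
        if (std::isalpha(static_cast<unsigned char>(next()))) {
            scan();
        } else if (next() == '(') {
            scan();
            E();
            if (ok && next() == ')') scan(); else ok = false;
        } else {
            ok = false;
        }
        reduce('F', start);
    }
};

// Returns the input followed by every reduction step, or an empty vector
// if the input does not parse completely.
std::vector<std::string> traceReductions(const std::string& expr) {
    Tracer t;
    t.form = expr;
    t.steps.push_back(expr);
    t.E();
    if (!t.ok || t.cur != t.form.size()) t.steps.clear();
    return t.steps;
}
```

For a+b*c the recorded steps are a+b*c, F+b*c, S+b*c, T+b*c, T+F*c, T+S*c, T+S*F, T+S*S, T+T, E. The parser itself is still top down; only the moment of printing (on leave) makes the output read like a bottom-up reduction.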

Related

Simple text file formatter crashes under Linux, but fine in Windows

I've made a simple .acf to .json file formatter. It runs correctly under Windows (GCC via msys2), but under Linux it segfaults every time it executes a string insert or replace.
What it does is convert the file below into a JSON-compatible format. It appends a comma after each entry, puts a colon between each key and its value, and wraps the whole thing in braces.
Save as test.acf:
"AppState"
{
"appid" "730"
"Universe" "1"
"name" "Counter-Strike: Global Offensive"
"StateFlags" "4"
"installdir" "Counter-Strike Global Offensive"
"LastUpdated" "1462547468"
"UpdateResult" "0"
"SizeOnDisk" "14990577143"
"buildid" "1110931"
"LastOwner" "76561198013962068"
"BytesToDownload" "8768"
"BytesDownloaded" "8768"
"AutoUpdateBehavior" "1"
"AllowOtherDownloadsWhileRunning" "0"
"UserConfig"
{
"Language" "english"
}
"MountedDepots"
{
"731" "205709710082221598"
"734" "5169984513691014102"
}
}
Minimal main code, with the crashing lines commented out with triple slashes:
#include <iostream>
#include <fstream>
#include <string>
int main(int argc, char* argv[])
{
std::ifstream file;
file.open("test.acf");
std::string data((std::istreambuf_iterator<char>(file)), (std::istreambuf_iterator<char>()));
int indexQuote = 0;
int index[4];
int insertCommaNext = -1;
std::string delims = "\"{}"; // It skips between braces and quotes only
std::size_t found = data.find_first_of(delims);
while(found != std::string::npos)
{
int inc = 1; // 0-4 depending on the quote - 0"key1" 2"value3" 4{
char c = data.at(found);
if (c != '"') {
if (c == '}')
insertCommaNext = found + 1; // Record index to insert comma after (following closing brace)
else if (c == '{') {
///data.insert(index[1] + 1, ":");
///inc++;
}
indexQuote = 0;
} else {
if (insertCommaNext != -1) {
///data.insert(insertCommaNext, ",");
///inc++;
insertCommaNext = -1;
}
index[indexQuote] = found;
if (indexQuote == 2) { // Join 'key: value' by placing the colon
///data.replace(index[1] + 1, 1, ":");
} else if (indexQuote == 4) { // Add comma after each key/value entry
indexQuote = 0;
///data.insert(index[3] + 1, ",");
///inc++;
}
indexQuote++;
}
found = data.find_first_of(delims, found + inc);
}
data = "{" + data + "}";
}
If you uncomment any of the triple-slashed /// lines containing an insert/replace, it will crash.
I'm certain the code quality is not great, and there are probably better ways to achieve this. Cheers.
The problem is that indexQuote reaches 4, so index[indexQuote] = found; writes past the end of index, which has only four elements (valid indices 0 to 3). You have a branch further down that resets indexQuote to 0, but you have to do that before you use index[indexQuote].
For reference, I debugged this by adding prints everywhere and printing all the variables until I found where it crashed.
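A minimal sketch of the "reset before you index" idea, separated from the rest of the formatter. The quoteGroups helper is hypothetical and only records quote positions; it is not the asker's full program.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the positions of the four quotes of each
// "key" "value" pair. The slot counter is wrapped BEFORE the array is
// written, so index[4] is never touched.
std::vector<std::array<std::size_t, 4>> quoteGroups(const std::string& data) {
    std::vector<std::array<std::size_t, 4>> groups;
    std::array<std::size_t, 4> index{};
    std::size_t indexQuote = 0;

    for (std::size_t found = data.find('"'); found != std::string::npos;
         found = data.find('"', found + 1)) {
        if (indexQuote == index.size())  // guard first ...
            indexQuote = 0;
        index[indexQuote++] = found;     // ... then write
        if (indexQuote == index.size())  // a full key/value pair was seen
            groups.push_back(index);
    }
    return groups;
}
```

Building with -fsanitize=address (supported by GCC and Clang) is another way to catch this kind of out-of-bounds write without adding prints.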

Converting BNF grammar rules into actual C++ functions/code

I'm trying to create a recursive descent parser. So far I have all the foundation set, I just need to properly implement a few functions to enforce the grammar. I thought everything was right, it looks it, but I guess my Aop, Expr, or Term function is doing something wrong. Sometimes the input stream gets cut off and things aren't recognized. I don't see how though.
Is there any site or source that explains this more in depth, with code examples? Everything I've seen is very generic, which is fine, but I'm stuck on implementation.
NOTE: Edit April 17 2016: My functions were pretty much alright and well structured for the context of my program. The problem I was having, and didn't realize, was that in certain cases calling getToken "ate up" characters from the input stream. Sometimes this was fine; other times it wasn't, and the input stream needed to be reset. So I simply added a small loop in the cases where I needed to put a lexeme back char by char. E.g.:
if(t.getType() !=Token::EQOP)
{
//cout<<"You know it" << endl;
int size = t.getLexeme().size();
while(size>0)
{
br->putback(t.getLexeme().at(size-1));
size--;
}
return ex;
}
So with that being said, I pretty much was able to edit my program accordingly and everything worked out once I saw what was eating up the characters.
This is the grammar :
Program::= StmtList
StmtList::= Stmt | StmtList
Stmt::= PRINTKW Aop SC | INTKW VAR SC | STRKW VAR SC | Aop SC
Expr::= Expr PLUSOP Term | Expr MINUSOP Term | Term
Term::= Term STAROP Primary | Primary
Primary::= SCONST | ICONST | VAR | LPAREN Aop RPAREN
Here's the main program with all of the functions: http://pastebin.com/qMB8h8vE
The functions that I seem to be having the most trouble with are AssignmentOperator (Aop), Expression (Expr), and Term. I'll list them here.
ParseTree* Aop(istream *br)
{
ParseTree * element = Expr(br);
if(element!=0)
{
if(element->isVariable())
{
Token t= getToken(br);
if(t==Token::EQOP)
{
cout<<"No" << endl;
ParseTree * rhs = Aop(br);
if(rhs==0)
return 0;
else
{
return new AssignOp(element, rhs);
}
}
else
{
return element;
}
}
}
return 0;
}
ParseTree* Expr(istream *br)
{
ParseTree * element = Term(br);
if(element!=0)
{
Token t=getToken(br);
if(t==Token::MINUSOP || t==Token::PLUSOP)
{
if(t==Token::PLUSOP)
{
ParseTree* rhs = Expr(br);
if(rhs==0)
return 0;
else
{
return new AddOp(element, rhs);
}
}
if(t==Token::MINUSOP)
{
ParseTree* rhs = Expr(br);
if(rhs==0)
return 0;
else
{
return new SubtractOp(element, rhs); //or switch the inputs idk
}
}
}
else
{
return element;
}
}
return 0;
}
ParseTree* Term(istream *br)
{
ParseTree *element = Primary(br);
if(element!=0)
{
Token t=getToken(br);
if(t==Token::STAROP)
{
ParseTree* rhs =Term(br);
if(rhs==0)
return 0;
else
{
return new MultiplyOp(element, rhs);
}
}
else
{
return element;
}
}
return 0;
}
In order to write a recursive descent parser, you need to convert your grammar into LL form, getting rid of left recursion. For the rules
Term::= Term STAROP Primary | Primary
you'll get something like:
Term ::= Primary Term'
Term' ::= epsilon | STAROP Primary Term'
this then turns into a function something like:
ParseTree *Term(istream *br) {
ParseTree *element = Primary(br);
// Loop instead of recursing: each '*' folds the next Primary into the left operand
while (element && peekToken(br) == Token::STAROP) {
getToken(br); // consume the STAROP that peekToken saw
ParseTree *rhs = Primary(br);
if (!rhs) return 0;
element = new MultiplyOp(element, rhs);
}
return element;
}
Note that you're going to need a peekToken function to look ahead at the next token without consuming it. It's also possible to use getToken + ungetToken to do the same thing.
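A peekToken can be built on a one-token buffer. The sketch below is self-contained and does not use the asker's Token class: Kind, Token, Lexer, term and parseTerm are all hypothetical names, and the "tree" is just a parenthesised string so the left-to-right grouping is visible.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical minimal token type (the real Token class is in the asker's program).
enum class Kind { Num, Star, End };
struct Token { Kind kind; long value; };

class Lexer {
public:
    explicit Lexer(std::istream& in) : in_(in) {}

    // Look at the next token without consuming it.
    Token peekToken() {
        if (!hasBuffered_) { buffered_ = read(); hasBuffered_ = true; }
        return buffered_;
    }
    // Consume and return the next token.
    Token getToken() {
        Token t = peekToken();
        hasBuffered_ = false;
        return t;
    }

private:
    // Read one token from the stream; anything unrecognised counts as End.
    Token read() {
        in_ >> std::ws;
        int c = in_.peek();
        if (c == '*') { in_.get(); return {Kind::Star, 0}; }
        if (c != std::char_traits<char>::eof() && std::isdigit(c)) {
            long v = 0;
            in_ >> v;
            return {Kind::Num, v};
        }
        return {Kind::End, 0};
    }

    std::istream& in_;
    Token buffered_{Kind::End, 0};
    bool hasBuffered_ = false;
};

// Term ::= Primary { STAROP Primary }, with Primary limited to integers here.
// Returns a parenthesised string, or "" on a syntax error.
std::string term(Lexer& lx) {
    Token first = lx.getToken();
    if (first.kind != Kind::Num) return "";
    std::string element = std::to_string(first.value);
    while (lx.peekToken().kind == Kind::Star) {
        lx.getToken();                       // consume the '*'
        Token rhs = lx.getToken();
        if (rhs.kind != Kind::Num) return "";
        element = "(" + element + "*" + std::to_string(rhs.value) + ")";
    }
    return element;
}

std::string parseTerm(const std::string& text) {
    std::istringstream in(text);
    Lexer lx(in);
    return term(lx);
}
```

Because the lookahead lives in the lexer, nothing has to be pushed back onto the stream character by character when the next token turns out not to be a STAROP.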

sweet.js: transforming occurrences of a repeated token

I want to define a sweet macro that transforms
{ a, b } # o
into
{ o.a, o.b }
My current attempt is
macro (#) {
case infix { { $prop:ident (,) ... } | _ $o } => {
return #{ { $prop: $o.$prop (,) ... } }
}
}
However, this gives me
SyntaxError: [patterns] Ellipses level does not match in the template
I suspect I don't really understand how ... works, and may need to somehow loop over the values of $prop and build syntax objects for each and somehow concatenate them, but I'm at a loss as to how to do that.
The problem is the syntax expander thinks you're trying to expand $o.$prop instead of $prop: $o.$prop. Here's the solution:
macro (#) {
rule infix { { $prop:ident (,) ... } | $o:ident } => {
{ $($prop: $o.$prop) (,) ... }
}
}
Notice that I placed the unit of code in a $() block of its own to disambiguate the ellipsis expansion.
Example: var x = { a, b } # o; becomes var x = { a: o.a, b: o.b };.

A parser program for the following grammar

Write a parser (both Yacc and Lex files) that uses the following productions and actions:
S -> cSS {print “x”}
S -> a {print “y”}
S -> b {print “z”}
Indicate the string that it will print when the input is cacba.
I am getting this error: when I give input to it, it says valid input and also says syntax error.
My Scanner Code is this
%{
#include "prac.h"
%}
%%
[c] {return C; }
[a] {return A; }
[b] {return B; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
int yywrap(void) {
return 1;
}
And my yacc code is this:
%{
#include <stdio.h>
%}
%token A B C
%%
statement: S {printf("Valid Input"); }
;
S: C S S {printf("Print x\n");}
| A {printf("Print y\n");}
| B {printf("Print z\n");}
;
%%
int main()
{
return yyparse();
}
yyerror(char *s)
{
printf("\n%s\n",s);
printf("Invalid Input");
fprintf(stderr,"At line %d %s ",s,yylineno);
}
How can I fix this?
(Comments converted to an answer)
@ChrisDodd wrote:
Best guess -- you're running on windows, so you're getting a \r (carriage return) character before the newline which is causing your error. Try adding \r to the [ \t] pattern to ignore it.
@Cyclone wrote:
Change your fprintf() statement to fprintf(stderr, "At line %d %s", yylineno, s); not that it will solve your problem.
The OP wrote:
You mean I should add \r into \t so the new regex for it will be [\r\t] Am I right ?
@rici wrote:
@chris suggests [ \r\t]. If you have Windows somewhere in the loop, I agree.
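On the assignment's other question (what is printed for cacba): a yacc action placed at the end of a rule runs when that rule is reduced, so the actions fire in post-order. The sketch below mimics that order with a small recursive function in C++. parseS and actionsFor are hypothetical names, and the output uses the single letters x, y, z from the assignment rather than the "Print x" strings in the posted code.

```cpp
#include <cstddef>
#include <string>

// Parses one S starting at pos and appends the action letters to out in the
// order the reductions happen. Returns false on a syntax error.
//   S -> c S S   {print "x"}
//   S -> a       {print "y"}
//   S -> b       {print "z"}
bool parseS(const std::string& in, std::size_t& pos, std::string& out) {
    if (pos >= in.size()) return false;
    char c = in[pos++];
    if (c == 'a') { out += 'y'; return true; }
    if (c == 'b') { out += 'z'; return true; }
    if (c == 'c') {
        if (!parseS(in, pos, out) || !parseS(in, pos, out)) return false;
        out += 'x';  // runs only after both child S have been reduced
        return true;
    }
    return false;
}

// Returns the printed letters, or "syntax error" if the input is not one S.
std::string actionsFor(const std::string& in) {
    std::size_t pos = 0;
    std::string out;
    if (!parseS(in, pos, out) || pos != in.size()) return "syntax error";
    return out;
}
```

For cacba this gives yzyxx: y for the first a, z and y for the inner b and a, x for the inner c S S, and a final x for the outer one.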

C++ Parsing char array as a script file (syntax)

I have made a simple script-reading class in C++ which allows me to read and parse scripts.
Basically there's a FILE pointer, which I open with "fopen".
In my functions I call "fgetc" and "ftell" to parse the script file as needed; note this ain't an interpreter.
Every script file is supposed to follow a fixed syntax, and that is where I need help.
Here's how a script looks like:
# Script File Comment
USERNAME = "Joe"
PASSWORD = "pw0001"
ACCESSLEVEL = 3
DATABASE = ("localhost",3306,"db","user","password")
Basically I have a few functions:
// This function searches for "variables"
nextToken();
// After I have the variable, e.g: USERNAME, PASSWORD, ACCESSLEVEL or DATABASE
// I proceed to call this function
// This function reads the char array for (,-{}()[]=) these are symbols
readSymbol();
// In a condition I check what "token/variable" I got and proceed to read
// it accordingly
// e.g; for USERNAME I do:
readString(); // reads text inside "
// e.g; for ACCESSLEVEL I do:
readNumber(); // reads digits until the next char ain't a digit
// e.g; for DATABASE I do:
readSymbol(); // (
readString(); // localhost
readSymbol(); // ,
readNumber(); // 3306
readSymbol(); // ,
readString(); // db
readSymbol(); // ,
readString(); // user
readSymbol(); // ,
readString(); // password
readSymbol(); // )
I would like to be able to read a variable declaration like this:
DATABASELIST = {"data1","data2","data3"}
or
DATABASELIST = {"data1"}
I could easily do readSymbol and readString to read for 3 different string definitions inside the variable, however this list is supposed to have custom user data, like 5 different strings, or 8 different strings - depends.
And I seriously have no idea how can I do this with the parser I wrote.
Please note that I am basing this on some pseudocode I took from a scripter for this type of format; I extracted the pseudocode with IDA. If you would like to see it for better understanding, post here.
Here's an example of my "readSymbol" function.
READSYMBOL
int TReadScriptFile::readSymbol()
{
int currentData = 0;
int stringStart = -1;
// Check if we can't read anymore
if (end)
return 0;
while (true)
{
// Basically get chars in the script
currentData = fgetc(File);
// Check for end of file
if (currentData == -1)
{
end = true;
break;
}
if (stringStart == -1)
{
if (isdigit(currentData) || isalpha(currentData))
{
printf("TReadScriptFile::readSymbol: Symbol expected\n");
close();
return 0;
}
else if
(
currentData == '=' || currentData == ',' ||
currentData == '(' || currentData == ')' ||
currentData == '{' || currentData == '}' ||
currentData == '>' || currentData == '<' ||
currentData == ':' || currentData == '-'
)
{
#ifdef __DEBUG__
printf("Symbol: %c\n", currentData);
#endif
stringStart = ftell(File);
break;
}
}
}
return 1;
}
NEXTTOKEN
int TReadScriptFile::nextToken()
{
int currentData = 0;
int stringStart = -1;
int stringEnd = -1;
RecursionDepth = -1;
memset(String, 0, 4000);
// Check if we can't read anymore
if (end)
return 0;
while (true)
{
// ** Syntax **
if (isdigit(getNext()) || getNext() == -1)
{
printf("No more tokens left.\n");
end = true;
close();
return 0;
}
// End
// Basically get chars in the script
currentData = fgetc(File);
// Check for end of file
if (currentData == -1)
{
end = true;
break;
}
// Syntax Checking Part, this really isn't needed but w/e
if (stringStart == -1)
{
if (currentData == '=' || isdigit(currentData))
{
printf("TReadScriptFile::nextToken: Syntax Error: string expected\n");
close();
return 0;
}
}
// End Syntax Checking
// It's a comment line, we should skip
if (currentData == '#')
{
seekNewLn();
continue;
}
// There are no variables, yet
if (stringStart == -1)
{
// We found a letter, we are near a token!
if (isalpha(currentData))
{
stringStart = ftell(File);
// We might as well add the letter to the string
RecursionDepth++;
String[RecursionDepth] = currentData;
continue;
}
}
else if (stringStart != -1)
{
// Let's wait until we get an identifier or space
// We found a digit, error
if (isdigit(currentData))
{
printf("TReadScriptFile::nextToken: string expected\n");
close();
return 0;
}
// We found a space, maybe we should stop looking for tokens?
else if (isspace(currentData))
{
#ifdef __DEBUG__
printf("Token: %s\n", String);
#endif
break;
}
RecursionDepth++;
String[RecursionDepth] = currentData;
}
}
return 1;
}
I found a good example of the approach I followed here:
http://llvm.org/docs/tutorial/LangImpl1.html
One mechanism to deal with DATABASELIST would be this:
After finding the variable DATABASELIST, read a symbol using readSymbol() and check that it is a {. Then, in a loop, call readString() and add the result to a std::vector (or some other suitable container), and read the next symbol with readSymbol(). If it is a , (comma), go back and read another string; if it is a }, stop. When you are finished you will have a vector (a dynamic array) of strings that represents the DATABASELIST.
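The loop described above might look roughly like this. The sketch works on a std::istream instead of the asker's FILE-based class, and its readSymbol, readString, readStringList and parseList are hypothetical stand-ins with simplified signatures, not the member functions from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for readSymbol(): skips whitespace and returns the next
// character, or '\0' when the input is exhausted.
char readSymbol(std::istream& in) {
    char c = '\0';
    in >> c;  // operator>> skips leading whitespace
    return in ? c : '\0';
}

// Stand-in for readString(): expects "text" and stores text in out.
// Fails if the opening or the closing quote is missing.
bool readString(std::istream& in, std::string& out) {
    if (readSymbol(in) != '"') return false;
    return std::getline(in, out, '"') && !in.eof();
}

// Reads {"s1","s2",...} with any number of strings (at least one).
bool readStringList(std::istream& in, std::vector<std::string>& list) {
    if (readSymbol(in) != '{') return false;
    while (true) {
        std::string item;
        if (!readString(in, item)) return false;
        list.push_back(item);
        char sym = readSymbol(in);
        if (sym == '}') return true;   // end of the list
        if (sym != ',') return false;  // anything but ',' is an error
    }
}

// Convenience wrapper: returns an empty vector on a syntax error.
std::vector<std::string> parseList(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> list;
    if (!readStringList(in, list)) list.clear();
    return list;
}
```

Since the vector grows as needed, the same code handles one string, five or eight. An empty list {} is rejected by this sketch; allowing it would need one extra check after the opening brace.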