Parsing Logic Expressions with Pyparsing - python-2.7

To start off, this is not homework; I'm trying to learn pyparsing and I got stuck here.
I'm trying to parse statements like (abc or def) or def)
My program falls apart on an infix expression a or b: since both sides can be expressions themselves, which can again be infix expressions, the parser recurses until the maximum recursion depth is reached and no work gets done.
Code below:
# infix operators are automatically created and dealt with
infix_operators = ['and', '&', 'or', '|', 'implies', '->']
variable = Word(alphas)
infix_op = oneOf(infix_operators, caseless=True)
expr = Forward()
infix_expr = (expr + infix_op + expr)
complex_expr = nestedExpr('(', ')', content=expr)
expr << (infix_expr | complex_expr | variable)
print str(expr.parseString("(abc or def) or def)")[0])
My question is fairly simple; how would one go about avoiding an infinite loop in these kinds of situations?

The canonical solution is something that implements this BNF:
atom := variable | 'True' | 'False' | '(' expr ')'
factor := [ 'not' ]... atom
term := factor [ '&' factor ]...
expr := term [ '|' term ]...
This addresses the left-recursion problem: even though expr eventually recurses back to itself through term -> factor -> atom, by the time it reaches expr again it must first have parsed a leading '('. So an expr never has to parse a deeper expr without consuming some other input first.
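To see why this layering terminates, here is a minimal hand-written recursive-descent sketch of the same BNF in plain Python, without pyparsing. The names Parser and parse, the tuple-based tree shape, and the choice to leave out the True/False literals are assumptions made for illustration only; this is not how pyparsing works internally.

```python
import re

# Token pattern: parentheses, symbolic operators, or a run of letters.
TOKEN = re.compile(r"\s*([()&|~]|[A-Za-z]+)")
NOT_OPS, AND_OPS, OR_OPS = {"not", "~"}, {"and", "&"}, {"or", "|"}
RESERVED = {"not", "and", "or"}


class Parser(object):
    """Hypothetical helper: one method per BNF rule."""

    def __init__(self, text):
        self.tokens, self.pos = [], 0
        text, i = text.strip(), 0
        while i < len(text):
            match = TOKEN.match(text, i)
            if match is None:
                raise SyntaxError("unexpected character %r" % text[i])
            self.tokens.append(match.group(1))
            i = match.end()

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def atom(self):
        # atom := variable | '(' expr ')'
        tok = self.peek()
        if tok == "(":
            self.advance()          # input is consumed BEFORE recursing
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        if tok is None or not tok.isalpha() or tok in RESERVED:
            raise SyntaxError("expected a variable, got %r" % (tok,))
        return self.advance()

    def factor(self):
        # factor := [ 'not' ]... atom
        if self.peek() in NOT_OPS:
            self.advance()
            return ("not", self.factor())
        return self.atom()

    def term(self):
        # term := factor [ '&' factor ]...  (loop instead of left recursion)
        node = self.factor()
        while self.peek() in AND_OPS:
            self.advance()
            node = ("and", node, self.factor())
        return node

    def expr(self):
        # expr := term [ '|' term ]...
        node = self.term()
        while self.peek() in OR_OPS:
            self.advance()
            node = ("or", node, self.term())
        return node


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return tree
```

The only place expr is re-entered is inside atom, after the '(' token has already been consumed, so every recursive call works on strictly less input. Note that the string from the question, (abc or def) or def), has an unmatched closing parenthesis and is rejected by this sketch.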
This BNF translates almost directly to pyparsing as:
from pyparsing import (Forward, Group, Keyword, Word, ZeroOrMore, alphas,
                       infixNotation, opAssoc)
and_ = Keyword('and')
or_ = Keyword('or')
not_ = Keyword('not')
true_ = Keyword('true')
false_ = Keyword('false')
not_op = not_ | '~'
and_op = and_ | '&'
or_op = or_ | '|'
expr = Forward()
identifier = ~(and_ | or_ | not_ | true_ | false_) + Word(alphas)
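# the boolean literals are excluded from identifier, so accept them here, as in the BNF
atom = true_ | false_ | identifier | Group('(' + expr + ')')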
factor = Group(ZeroOrMore(not_op) + atom)
term = Group(factor + ZeroOrMore(and_op + factor))
expr <<= Group(term + ZeroOrMore(or_op + term))
Or you can use pyparsing's infixNotation helper:
expr = infixNotation(true_ | false_ | identifier,
[
(not_op, 1, opAssoc.RIGHT),
(and_op, 2, opAssoc.LEFT),
(or_op, 2, opAssoc.LEFT),
])
infixNotation is constructed with a base operand (in this case, either an alpha variable name or one of the boolean literals true or false), followed by a list of (operator, arity, associativity) tuples, given in order of operator precedence. infixNotation takes care of all the recursion definitions, the parsing of right-associative vs. left-associative operators, and also does some lookahead for operators, to avoid extra nesting of operations for a given precedence level if there are no operators.
You can test this expression using pyparsing's runTests method:
expr.runTests("""
p and not q
not p or p
r and (p or q)
r and p or q
not q
q
""", fullDump=False)
Giving:
p and not q
[['p', 'and', ['not', 'q']]]
not p or p
[[['not', 'p'], 'or', 'p']]
r and (p or q)
[['r', 'and', ['p', 'or', 'q']]]
r and p or q
[[['r', 'and', 'p'], 'or', 'q']]
not q
[['not', 'q']]
q
['q']

Related

Need help checking if my CFG question is correct

I need to write a CFG for the language below:
L = { w ∈ Σ* | w is a regular expression over a binary alphabet }
I came up with:
S = SΣ* | ε
X = S | ε
I'm just starting with this subject, so if it isn't correct, I would appreciate an explanation.
Many thanks!
PS: This is not part of a homework assignment; it is just an exercise from a Computer Science Theory book.
ANSWER BASED ON Mo B. POST
S = e | {0,1}
X = S | S* | Z
Y = X | YX
Z = Y | Z | Y
SECOND ANSWER BASED ON Mo B. POST
S = e | {0,1}
X = S | S'*' | '(' Z ')'
Y = X | YX
Z = Y | Z '|' Y
No, this is not correct. The language is the set of regular expressions (which itself isn't regular, but it's luckily context-free), so you need to come up with a context-free grammar for regular expressions. (First, make sure you know the formal definition for regular expressions.)
The meta-alphabet Σ has not been defined in your question, but it only works if Σ at least contains the mentioned binary alphabet, say 0 and 1, and the symbols ε, *, |, ( and ).
Here is one way of doing it:
Basic ::= 'ε' // Note that this is the symbol 'ε' which is not the same as ε
| c // for c in {0, 1} or whatever the binary alphabet is
Star ::= Basic
| Star '*'
| '(' Regular ')'
Concat ::= Star
| Concat Star
Regular ::= Concat
| Regular '|' Concat
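As a rough check of this grammar, here is a small recognizer sketch in Python. It assumes the left-recursive rules are rewritten as loops (Star becomes a base item followed by any number of '*', Concat becomes one or more Star, Regular becomes Concat separated by '|'), which accepts the same strings. The function name is_regular_expression and the alphabet parameter are hypothetical.

```python
def is_regular_expression(text, alphabet="01"):
    """Return True if text is derivable from Regular in the grammar above.

    Each helper takes a start index and returns the index just past what
    it matched, or None if nothing matches at that index.
    """

    def star(i):
        # Basic | '(' Regular ')', followed by zero or more '*'
        if i < len(text) and (text[i] == "ε" or text[i] in alphabet):
            i += 1
        elif i < len(text) and text[i] == "(":
            i = regular(i + 1)
            if i is None or i >= len(text) or text[i] != ")":
                return None
            i += 1
        else:
            return None
        while i < len(text) and text[i] == "*":
            i += 1
        return i

    def concat(i):
        # one or more Star items in a row
        i = star(i)
        while i is not None:
            nxt = star(i)
            if nxt is None:
                return i
            i = nxt
        return None

    def regular(i):
        # Concat items separated by '|'
        i = concat(i)
        while i is not None and i < len(text) and text[i] == "|":
            i = concat(i + 1)
        return i

    return regular(0) == len(text)
```

For example, is_regular_expression("(0|1)*") is accepted, while the empty string is not, because the grammar requires the symbol 'ε' to be written explicitly.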

What is causing this type error in OCaml?

type expr = NUM of int
| PLUS of expr * expr
| MINUS of expr * expr
let rec calc expr1 =
match expr1 with
| NUM i -> NUM i
| PLUS (lexpr1, rexpr1) ->
(match lexpr1, rexpr1 with
| (*(NUM li1,NUM ri1) -> NUM li1+ri1*)
| (lexpr1', rexpr1') -> PLUS (calc lexpr1', calc rexpr1'))
It says
Error: This expression has type expr but an expression was expected of type int
I don't understand why this error occurs.
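In the line
NUM li1+ri1
OCaml applies the constructor first, because constructor application binds tighter than +. The expression is therefore read as (NUM li1) + ri1, and since + only works on two ints, not on an expr and an int, the type checker rejects it.
To fix this, add the ints first: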
NUM (li1+ri1)

Function that evaluates mathematical expression

I need to create a function that returns the result of calculating expressions of a created type:
type expr =
VarX
| VarY
| Sine of expr
| Cosine of expr
| Average of expr * expr
| Times of expr * expr
| Thresh of expr * expr * expr * expr
Here is what the function is supposed to do:
eval : expr * float * float -> float takes a triple (e,x,y) and evaluates the expression e at the point (x,y). x and y are between -1 and 1, and the result of the function is in the same interval. Once you have implemented the function, you should get the following behavior at the OCaml prompt:
# eval (Sine(Average(VarX,VarY)),0.5,-0.5);;
- :float = 0.0
Here is the function that I wrote; it needs to return a float:
let rec eval (e,x,y) = match e with
VarX -> x*1.0
|VarY ->y*1.0
|Sine expr -> sin(pi* eval (expr,x,y))
|Cosine expr -> cos( pi* eval (expr,x,y))
|Average (expr1, expr2) -> (eval (expr1,x,y)+eval (expr2,x,y))/2.0
|Times (expr1, expr2) -> (eval (expr1,x,y))*(eval (expr2,x,y))
|Thresh (expr1, expr2, expr3, expr4) ->if((eval (expr1,x,y))<(eval (expr2,x,y)))then (eval (expr3,x,y)) else(eval (expr4,x,y));;
This is the error that I get:
Error: This expression has type float but an expression was expected of type
int
How do I make the function return a float? I know every case needs to return a float, but I'm not sure how to do that.
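In OCaml the float operators carry a trailing dot: the operator for multiplying two floats is *. (star, dot), the operator for adding two floats is +. (plus, dot), and the operator for dividing two floats is /. (slash, dot). The plain *, + and / work only on ints, which is why the type checker expects int.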

Changing an expression to string

I need to convert an arithmetic expression to a string. It uses this type:
type expr =
VarX
| VarY
| Sine of expr
| Cosine of expr
| Average of expr * expr
| Times of expr * expr
| Thresh of expr * expr * expr * expr
Here are the definitions for everything in it:
e ::= x | y | sin (pi*e) | cos (pi*e) | ((e + e)/2) | e * e | (e<e ? e : e)
I need to convert something like this:
exprToString (Thresh(VarX,VarY,VarX,(Times(Sine(VarX),Cosine(Average(VarX,VarY))))));;
to this:
string = "(x<y?x:sin(pi*x)*cos(pi*((x+y)/2)))"
I know I have to do this recursively by matching each expr with its appropriate string, but I'm not sure where the function begins matching or how to recurse through it. Any help or clues would be appreciated.
Here is a simplified version of what you probably want:
type expr =
| VarX
| Sine of expr
let rec exprToString = function
| VarX -> "x"
| Sine e -> "sin(" ^ exprToString e ^ ")"
let () = print_endline (exprToString (Sine (Sine (Sine (Sine VarX)))))
It recurses over the AST nodes and creates the string representation of the input by concatenating the string representations of the nodes.
This approach may not work nicely for bigger real-world examples, since:
String concatenation (^) creates a new string from two; this is slower than using a more appropriate data structure such as Buffer.t.
It produces too many parentheses, e.g. (2*(2*(2*2))) rather than 2*2*2*2. If you want to minimize the number of parentheses, your algorithm must be aware of operator precedence and associativity.
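The precedence point can be sketched as follows. The example is in Python for brevity, with nested tuples standing in for the AST; the same structure should carry over to an OCaml function with one extra argument. The PRECEDENCE table, the to_string name, and the use of generic +, - and * operators (instead of the Sine/Times constructors from the question) are assumptions made for illustration.

```python
# Higher number = binds tighter. Hypothetical table for three operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2}


def to_string(node, min_prec=0):
    """Render a (op, left, right) tuple tree with as few parentheses as possible.

    min_prec is the weakest precedence the surrounding context can hold
    without parentheses; anything that binds looser gets wrapped.
    """
    if not isinstance(node, tuple):
        return str(node)                      # leaf: a number or variable name
    op, left, right = node
    prec = PRECEDENCE[op]
    # '-' is left-associative but not associative, so a right operand of
    # equal precedence must keep its parentheses: 1-(2-3) is not 1-2-3.
    right_min = prec + 1 if op == "-" else prec
    text = to_string(left, prec) + op + to_string(right, right_min)
    return "(" + text + ")" if prec < min_prec else text
```

With this, the tree for 2*(2*(2*2)) prints as 2*2*2*2, while (1+2)*3 and 1-(2-3) keep the parentheses they need.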

Construct finite automata

Would you agree that the regular expression:
((a|b)*(e|c)*)*
describes any combination of a's, b's, and c's? Or would you say c always comes after a and b?
Though I always prefer to describe regular expressions (REs) semantically, there is also a rule, a kind of "distributive law", that is very helpful for cleaning up and simplifying REs:
(P | Q)* == (P*Q*)* == (P* | Q*)*
Note: | is the union operation, so P | Q is the same as Q | P. Here P and Q are regular expressions.
So your expression:
((a|b)*(e|c)*)* # P = (a|b) and Q = (e|c)
=> ((a|b) | (e|c))* # (P*Q*)* = (P | Q)*
As I said, order is not important in a union, so the inner ( ) are redundant here, and
((a|b) | (e|c))*
=> (a | b | c | e)*
Now * means repeating the pattern it is applied to any number of times. In the expression above, * is applied to a | b | c | e, and in each iteration you can pick any one symbol. That means any symbol can appear after any other symbol, so any combination of 'a', 'b', 'c', 'e' is possible.
Its FA is very simple: it consists of a single state Q0 with a self-loop labeled with all four symbols, as follows:
__
|| a, b, c, e
▼|
––►((Q0))
Q0 is both initial and final state
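The simplification can be sanity-checked by brute force. The sketch below uses Python's re module and, like the answer, treats e as an ordinary symbol (if e was meant as the empty string, the alphabet would shrink to a, b, c). The helper names and the extra symbol d, used only to produce rejected strings, are assumptions for illustration; agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"((a|b)*(e|c)*)*")
SIMPLIFIED = re.compile(r"(a|b|c|e)*")


def accepted_by_single_state_fa(word):
    # Q0 is initial and final, with a self-loop on a, b, c, e;
    # any other symbol has no transition, so the word is rejected.
    return all(symbol in "abce" for symbol in word)


def agree_up_to(max_len, alphabet="abcde"):
    """Compare both regexes and the automaton on every string up to max_len."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            word = "".join(symbols)
            verdicts = {
                ORIGINAL.fullmatch(word) is not None,
                SIMPLIFIED.fullmatch(word) is not None,
                accepted_by_single_state_fa(word),
            }
            if len(verdicts) != 1:      # the three recognizers disagree
                return False
    return True
```

Calling agree_up_to(4) compares all three recognizers on every string of length 0 to 4 over a, b, c, d, e.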