I have been presented the problem of reading an input file that contains logic statements, and am required to construct a Truth Table to determine whether an ASK matches any/all models that are determined. An example of some data I may expect to read in is:
(p & z => x) => ((p | d) & z)
Please don't get too caught up in the example and whether it really makes sense, I just made it up to show the different compositions I may be presented with. Multiple such statements can be separated with semicolons.
I have sorted out the semicolon splitting without any dramas, and now have a vector of strings containing each separate statement, where each string is as presented above. Without parentheses, I believe evaluating the statements would be rather straightforward, but with them I need to compute some sections before others, e.g.:
(p | d) = result and then (result & x)
I have seen people discussing the concept of using a stack to determine if open brackets are closed properly, but I do not believe that this would be appropriate for my situation, as it would not let me determine which statements are inside which set of parentheses.
The current idea I have is to use the stack idea, and try to determine the "depth" of a statement (essentially how far it is nested) and then mark this number with each statement, but I believe this sounds like an inelegant solution. Does anyone have any tips as to how I should construct an algorithm to properly address the problem?
You need to build a tree of the expressions, where your variables are leaves.
Your expression will then become:
          =>
        /    \
      =>      &
     /  \    /  \
    &    x  |    z
   / \     / \
  p   z   p   d
Once you have built this kind of representation, the evaluation is straightforward.
Another approach, with the same result, is reducing your expression to something like RPN (where you can use your stack idea):
p, z, &, x, =>, p, d, |, z, &, =>
As suggested in the comments, you can do it with the shunting yard algorithm.
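As a rough illustration of the RPN route, here is a minimal Python sketch of shunting yard followed by a stack-based evaluation over every model. The precedence table (& over | over =>), the right-associativity of =>, and all function names are assumptions made for the example; the question does not specify them.

```python
from itertools import product

# Assumed precedence (higher binds tighter); "=>" is treated as right-associative.
PREC = {"&": 3, "|": 2, "=>": 1}
RIGHT_ASSOC = {"=>"}

def tokenize(text):
    """Split a statement into variables, operators and parentheses."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("=>", i):
            tokens.append("=>")
            i += 2
        elif ch in "&|()":
            tokens.append(ch)
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

def to_rpn(tokens):
    """Shunting yard: convert infix tokens to reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PREC:
            # Pop operators that bind tighter (or equally, if left-associative).
            while stack and stack[-1] in PREC and (
                PREC[stack[-1]] > PREC[tok]
                or (PREC[stack[-1]] == PREC[tok] and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # a variable
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output

def eval_rpn(rpn, model):
    """Evaluate an RPN token list under a dict mapping variable -> bool."""
    stack = []
    for tok in rpn:
        if tok in PREC:
            right, left = stack.pop(), stack.pop()
            if tok == "&":
                stack.append(left and right)
            elif tok == "|":
                stack.append(left or right)
            else:  # "=>" is material implication
                stack.append((not left) or right)
        else:
            stack.append(model[tok])
    return stack[0]

def truth_table(statement):
    """Return a list of (model, value) pairs, one per assignment of the variables."""
    rpn = to_rpn(tokenize(statement))
    names = sorted({t for t in rpn if t not in PREC})
    rows = []
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        rows.append((model, eval_rpn(rpn, model)))
    return rows
```

Calling truth_table("(p & z => x) => ((p | d) & z)") yields one row per model; checking an ASK then amounts to filtering the rows where the knowledge base is true.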
I'm trying to write an expression parser. One part I'm stuck on is breaking down an expression into blocks via its appropriate order of precedence.
I found the order of precedence for C++ operators here. But where exactly do I split the expression based on this?
I have to assume the worst of the user. Here's a really messy over-exaggerated test example:
if (test(s[4]) < 4 && b + 3 < r && a!=b && ((c | e) == (g | e)) ||
r % 7 < 4 * givemeanobj(a & c & e, b, hello(c)).method())
Perhaps it doesn't even evaluate, and if it doesn't I still need to break it down to determine that.
It should break down into blocks of singles and pairs connected by operators. Essentially it breaks down into a tree-structure where the branches are the groupings, and each node has two branches.
Following the order of precedence, the first thing to do would be to evaluate the givemeanobj(); that one is easy to see. The next would be the multiplication sign. Does that split everything before the * into a separate block, or just the 4? 4 * givemeanobj comes before the <, right? So that's the first grouping?
Is there a straightforward rule to follow for this?
Yes, use a parser generator such as ANTLR. You write your language specification formally, and it will generate code which parses all valid expressions (and no invalid ones). ANTLR is nice in that it can give you an abstract syntax tree which you can easily traverse and evaluate.
Or, if the language you are parsing is actually C++, use Clang, which is a proper compiler and happens to be usable as a library as well.
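If a generator is not an option, the grouping rule itself can be written by hand. Below is a small Python sketch of precedence climbing, a swapped-in technique that neither answer names. The precedence table covers only a subset of C++ binary operators, and function calls, indexing, member access and unary operators are deliberately left out, so treat it as an illustration of how `4 * f` gets grouped before `<` rather than a C++ parser.

```python
import re

# Binding strength for a subset of C++ binary operators (higher binds tighter).
PRECEDENCE = {
    "*": 10, "/": 10, "%": 10,
    "+": 9, "-": 9,
    "<": 8, ">": 8, "<=": 8, ">=": 8,
    "==": 7, "!=": 7,
    "&": 6, "|": 5,   # real C++ also has ^ between & and |
    "&&": 4, "||": 3,
}

TOKEN = re.compile(r"\s*(\w+|&&|\|\||[<>=!]=|[-+*/%<>&|()])")

def tokenize(text):
    """Split the input into identifiers/numbers, operators and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Build a tree of nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_expr(0)
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in PRECEDENCE or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    def parse_expr(min_prec):
        left = parse_atom()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # Left-associative: the right operand may only use tighter operators.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = parse_expr(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, parse(tokenize("r % 7 < 4 * f")) gives ("<", ("%", "r", "7"), ("*", "4", "f")): only the 4 is pulled into the multiplication, and the comparison sits above both sides.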
I am interested in detecting redundant parentheses in OCaml code. Some ideas I have tried with no results include using regular expressions, comparing reverse code generated from AST. I am lost on how to proceed with this task.
There is a simple solution. Parse the code (using compiler-libs) and then print it back (again using compiler-libs) and compare the results. The compiler-libs pretty printer will not put any redundant parentheses. To make the comparison easier, you can remove all spaces, or just count the number of parentheses.
There are also lighter, more ad hoc approaches, e.g., to catch common misuses of parentheses:
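To show the shape of the parse, print and compare idea without committing to specific compiler-libs calls, here is the same round trip sketched in Python, with the standard ast module standing in for the OCaml parser and printer. This is only an analogy: the function name is made up, it needs Python 3.9+ for ast.unparse, and it counts parentheses naively (string literals and tuples would need more care).

```python
import ast

def redundant_paren_count(source):
    """Count opening parentheses that disappear after a parse/print round trip."""
    printed = ast.unparse(ast.parse(source))
    return source.count("(") - printed.count("(")

print(redundant_paren_count("(x * y) + z"))  # the parentheses around x * y are dropped
print(redundant_paren_count("(x + y) * z"))  # these parentheses are needed and kept
```

The OCaml version would follow the same outline: parse to a Parsetree, print it back, and compare against the original text.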
f(x) instead of f x
(f x) * (f y), instead of f x * f y where * is an arbitrary infix operator.
Finally, the general approach, in case you need it for a student project, would be to compare operator precedences and flag subexpressions that are parenthesized even though their operator has higher precedence (binds tighter) than the enclosing one, e.g., in (x * y) + z, * has higher precedence than + but is still delimited with parentheses.
In my computational theory class we have been asked to prove that a certain language is regular:
Ln = { a^(2k+1) | k is a multiple of n } ⊆ { a }*
I'm unsure where to start, usually you would use one of an NFA, DFA, regular expression, or regular grammar. If anyone could help push me in the right direction it would be greatly appreciated.
Here are some hints to get you started:
Notice that Ln = { a^(2nr+1) | r ∈ N }, which you can rewrite as { (a^(2n))^r a | r ∈ N }. That might expose a bit more structure than you previously noticed.
If you want to go down the DFA route, think about what information you need to keep track of at each point in time. Each state in the DFA should tell you something about the string you've seen so far. It might help to notice that it doesn't really matter how many total characters are in the string, just what the remainder is modulo 2n.
If you want to go down the regex route, how would you express the idea of "any number of copies of a^(2n) followed by another a"?
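To sanity-check these hints for concrete n, the following Python sketch compares the set definition against both a regular expression and a counter-style DFA. It assumes n ≥ 1 and that k = 0 counts as a multiple of n (so the string a is in the language); the function names are invented for the example.

```python
import re

def in_language(word, n):
    """Direct definition: word = a^(2k+1) with k a multiple of n (k >= 0)."""
    if set(word) - {"a"} or len(word) % 2 == 0:
        return False
    k = (len(word) - 1) // 2
    return k % n == 0

def regex_for(n):
    """The expression (a^(2n))* a, written in Python regex syntax."""
    return re.compile(r"(?:a{%d})*a" % (2 * n))

def dfa_accepts(word, n):
    """DFA with 2n states: the state is the number of a's read so far, modulo 2n."""
    state = 0
    for ch in word:
        if ch != "a":
            return False  # stands in for a dead state
        state = (state + 1) % (2 * n)
    return state == 1  # accepted lengths are exactly those congruent to 1 mod 2n

for n in range(1, 5):
    for length in range(0, 40):
        word = "a" * length
        expected = in_language(word, n)
        assert bool(regex_for(n).fullmatch(word)) == expected
        assert dfa_accepts(word, n) == expected
```

Agreement on short strings is of course not a proof, but either construction (the DFA or the regular expression) is what the proof would exhibit.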
I know I need to show that no string derivable using only leftmost derivations leads to two different parse trees. But how can I do it? I know there is no simple way of doing it, but since this exercise is in the compilers Dragon Book, I am pretty sure there is a way of showing it (no need for a formal proof, just a justification).
The grammar is:
S-> SS* | SS+ | a
What this grammar represents is another way of writing simple arithmetic (I do not remember the name of this technique; if anyone knows, please tell me): normal arithmetic has the form a+a, and this just represents another way of writing sums and products. So aa+ means a+a, aaa*+ is a*a+a, and so on.
The easiest way to prove that a CFG is unambiguous is to construct an unambiguous parser. If the grammar is LR(k) or LL(k) and you know the value of k, then that is straightforward.
This particular grammar is LR(0), so the parser construction is almost trivial; you should be able to do it on a single sheet of paper (which is worth doing before you try to look up the answer).
The intuition is simple: every production ends with a different terminal symbol, and those terminal symbols appear nowhere else in the grammar. So when you read a symbol, you know precisely which production to use to reduce; there is only one which can apply, and there is no left-hand side you can shift into.
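That intuition can be turned into a few lines of Python. The sketch below is not a generated LR(0) table, just a stack that applies the only possible reduction for each input symbol; the function name and the tuple representation of parse trees are assumptions of the example.

```python
def parse_postfix(text):
    """Return the parse tree of a string in S -> S S * | S S + | a, or None."""
    stack = []
    for ch in text:
        if ch == "a":
            stack.append("a")                # the only option: reduce by S -> a
        elif ch in "*+":
            if len(stack) < 2:
                return None                  # not enough operands to reduce
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right, ch))  # the only option: S -> S S ch
        else:
            return None                      # symbol outside the alphabet
    # A valid sentence leaves exactly one S on the stack.
    return stack[0] if len(stack) == 1 else None

print(parse_postfix("aaa*+"))
```

At no point does the loop have a choice to make, which is the informal reason each sentence has exactly one parse tree.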
If you invert the grammar to produce Polish (or Łukasiewicz) notation, then you get a trivial LL grammar. Again the parsing algorithm is obvious, since every right hand side starts with a unique terminal, so there is only one prediction which can be made:
S → * S S | + S S | a
So that's also unambiguous. But the infix grammar is ambiguous:
S → S * S | S + S | a
The easiest way to prove ambiguity is to find a sentence which has two parses; one such sentence in this case is:
a + a + a
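For the infix grammar, the two parses can be counted mechanically. Here is a short Python sketch (the function name is hypothetical, and the input is expected without spaces) that counts parse trees under S → S * S | S + S | a by trying every operator as the top-level split:

```python
from functools import lru_cache

def count_infix_parses(text):
    """Number of parse trees of `text` under S -> S * S | S + S | a."""
    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Counts the parse trees of the slice text[lo:hi].
        if hi - lo == 1:
            return 1 if text[lo] == "a" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if text[mid] in "*+":
                # text[mid] is the root operator; combine left and right counts.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(text)) if text else 0

print(count_infix_parses("a+a+a"))
```

Any result greater than 1 is a witness of ambiguity: for a+a+a the two trees correspond to (a+a)+a and a+(a+a).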
I think the example string aa actually shows what you need. Can it not be parsed as:
S => SS* => aa OR S => SS+ => aa
I am stumped by this practice problem (not for marks):
{w is an element of {a,b}* : the number of a's is even and the number of b's is even }
I can't seem to figure this one out.
In this case 0 is considered even.
A few acceptable strings: {}, {aa}, {bb}, {aabb}, {abab}, {bbaa}, {babaabba}, and so on
I've done similar examples where the a's must be a prefix, where the answer would be:
(aa)*(bb)*
but in this case they can be in any order.
Kleene stars (*), unions (U), intersects (&), and concatenation may be used.
Edit: Also have trouble with this one
{w is an element of {0,1}* : w = 1^r 0 1^s 0 for some r,s >= 1}
This is kind of ugly, but it should work:
( aa U bb U (ab U ba)(aa U bb)*(ab U ba) )*
For the second one:
11*011*0
Generally I would use a+ instead of aa* here.
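A quick way to gain confidence in candidate expressions like these is to brute-force them against the set definitions on all short strings. The Python patterns below are transcriptions into re syntax: the first allows (aa U bb)* between the two mixed pairs so that strings such as abbbba are covered, and the second uses 1+ in place of 11*. The helper names are made up for the sketch.

```python
import re
from itertools import product

EVEN_EVEN = re.compile(r"(?:aa|bb|(?:ab|ba)(?:aa|bb)*(?:ab|ba))*")
SECOND = re.compile(r"1+01+0")

def even_even(word):
    """Even number of a's and even number of b's (0 counts as even)."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

def second_language(word):
    """w = 1^r 0 1^s 0 with r, s >= 1, checked without a regex."""
    parts = word.split("0")
    return (len(parts) == 3 and parts[2] == ""
            and len(parts[0]) >= 1 and len(parts[1]) >= 1
            and set(parts[0] + parts[1]) <= {"1"})

def mismatches(pattern, predicate, alphabet, max_len):
    """Strings up to max_len on which the regex and the predicate disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(pattern.fullmatch(word)) != predicate(word):
                bad.append(word)
    return bad

print(mismatches(EVEN_EVEN, even_even, "ab", 10))
print(mismatches(SECOND, second_language, "01", 10))
```

An empty list for each call means no counterexample exists up to that length, which is evidence rather than proof.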
Edit: Undeleted re: the comments in NullUserException's answer.
1) I personally think this one is easier to conceptualize if you first construct a DFA that can accept the strings. I haven't written it down, but off the top of my head I think you can do this with 4 states and one accept state. From there you can create an equivalent regex by removing states one at a time using an algorithm such as this one. This is possible because DFAs and regexes are provably equivalent.
2) Consider the fact that the Kleene star only applies to the nearest regular expression. Hence, if you have two individual ungrouped atoms (an atom itself is a regex!), it only applies to the second one (as in, ab* would match a single a and then any number, including 0, of b's). You can use this to your advantage in a case where you want something to exist, but you're not sure how many there are.
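Both points can be tried out in a few lines of Python. The first part is one possible 4-state DFA for the even/even language, with states named by the parities of the a and b counts (that naming is a choice made for this sketch); the second part shows the scope of the star using the standard re module.

```python
import re

# Point 1: states are (parity of a's, parity of b's); (0, 0) is start and accept.
TRANSITIONS = {
    ((0, 0), "a"): (1, 0), ((0, 0), "b"): (0, 1),
    ((1, 0), "a"): (0, 0), ((1, 0), "b"): (1, 1),
    ((0, 1), "a"): (1, 1), ((0, 1), "b"): (0, 0),
    ((1, 1), "a"): (0, 1), ((1, 1), "b"): (1, 0),
}

def accepts(word):
    """Run the DFA on a word over {a, b}."""
    state = (0, 0)
    for ch in word:
        state = TRANSITIONS[(state, ch)]
    return state == (0, 0)

# Point 2: the star binds to the nearest atom only.
star_on_b = re.compile(r"ab*")       # one a, then any number of b's
star_on_group = re.compile(r"(ab)*")  # any number of copies of ab

print(accepts("babaabba"), accepts("ab"))
print(bool(star_on_b.fullmatch("abbb")), bool(star_on_b.fullmatch("abab")))
print(bool(star_on_group.fullmatch("abab")))
```

Eliminating the three non-accepting states of this DFA one at a time is one route to a regular expression for the language.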