Arithmetic expression simplifier in Clojure

I want to create a simplifier for arithmetic expressions in Clojure, and I am new to the language.
For example:
in: "2x + 6y - (12 + (5x - 3y)) + 4"
simplified: "- 3x + 9y - 8".
My attempt is to parse the expression with a regexp into a hierarchical vector of nested expressions like this:
["5x-3y" "12 + <?>" "2x + 6y - <?> + 4"] ;; <?> is the evaluated item from the previous step
and then evaluate them in sequence.
This feels like a hack to me; some advice would be helpful.
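One alternative to the regexp-and-placeholder approach is a small recursive-descent parser that folds the expression directly into a map from variable name to coefficient. The following is a minimal sketch in Python rather than Clojure, assuming the input contains only integers, single-letter variables with an optional integer coefficient, +, -, and parentheses. The function names (tokenize, parse_expr, parse_term, simplify) are illustrative and not taken from the question.

```python
import re
from collections import defaultdict

# A token is a number with an optional variable (12, 5x), a bare variable, a sign, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+[a-z]?|[a-z]|[-+()])")

def tokenize(text):
    """Split the input into terms, signs, and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_term(tokens, i):
    """Parse one term: a parenthesised sub-expression or something like 5x, x, 12."""
    tok = tokens[i]
    if tok == "(":
        inner, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("missing closing parenthesis")
        return inner, i + 1
    m = re.fullmatch(r"(\d*)([a-z]?)", tok)
    if not m:
        raise ValueError(f"unexpected token {tok!r}")
    coeff = int(m.group(1)) if m.group(1) else 1
    return {m.group(2): coeff}, i + 1   # key '' holds the constant part

def parse_expr(tokens, i):
    """Sum signed terms until the end of input or a closing parenthesis."""
    total = defaultdict(int)
    sign = 1
    while i < len(tokens) and tokens[i] != ")":
        tok = tokens[i]
        if tok in ("+", "-"):
            if tok == "-":
                sign = -sign
            i += 1
            continue
        term, i = parse_term(tokens, i)
        for var, coeff in term.items():
            total[var] += sign * coeff
        sign = 1
    return total, i

def simplify(text):
    """Return the simplified expression, variables first, constant last."""
    tokens = tokenize(text)
    coeffs, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise ValueError("unbalanced closing parenthesis")
    parts = []
    for var in sorted(coeffs, key=lambda v: (v == "", v)):
        c = coeffs[var]
        if c == 0:
            continue
        mag = abs(c)
        body = str(mag) if var == "" else (var if mag == 1 else f"{mag}{var}")
        parts.append(("-" if c < 0 else "+") + " " + body)
    result = " ".join(parts)
    return result[2:] if result.startswith("+ ") else (result or "0")

print(simplify("2x + 6y - (12 + (5x - 3y)) + 4"))  # - 3x + 9y - 8
```

Because nesting is handled by the recursion in parse_term, no intermediate placeholder strings are needed.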

Related

Regular expression for equations, variable number of inside parenthesis

I'm trying to write a regex for a series of equations, for example:
a = 2 / (1 + exp(-2*n)) - 1
a = 2 / (1 + e) - 1
a = 2 / (3*(1 + exp(-2*n))) - 1
In any case I need to capture content of the outer parenthesis, so 1 + exp(-2*n), 1+e and 3*(1 + exp(-2*n)) respectively.
I can write expression that will catch one of them, like:
\(([\w\W]*?\))\) will perfectly catch 1 + exp(-2*n)
\(([\w\W]*?)\) will catch 1+e
\(([\w\W]*?\))\)\) will catch 3*(1 + exp(-2*n))
But it seems silly to need three patterns for something so simple. How can I combine them? Please note that I will be processing the text line by line (in a loop) anyway, so you don't have to worry about a greedy operator running on into the next line.
Edit:
Un-nested brackets are also allowed: a = 2 / (1 + exp(-2*n)) - (2-5)
The commented code below does not use regular expressions, but does parse char arrays in MATLAB and output the terms which contain top-level brackets.
So in your 3 question examples with a single set of nested brackets, it returns the outermost bracketed term.
In the example from your comment where there are two or more (possibly nested) terms within brackets at the "top level", it returns both terms.
The logic is as follows; see the code comments for more details:
Find the left (opening) and right (closing) brackets
Generate the "nest level" according to how many un-closed brackets there are at each point in the equation char
Find the indices where the nesting level changes. We're interested in opening brackets where the nest level increases to 1 and closing brackets where it decreases from 1.
Extract the terms from these indices
e = { 'a = 2 / (1 + exp(-2*n)) - 1'
'a = 2 / (1 + e) - 1'
'a = 2 / (3*(1 + exp(-2*n))) - 1'
'a = 2 / (1 + exp(-2*n)) - (2-5)' };
str = cell(size(e)); % preallocate output
for ii = 1:numel(e)
str{ii} = parseBrackets_(e{ii});
end
function str = parseBrackets_( equation )
bracketL = ( equation == '(' ); % logical mask of opening brackets
bracketR = ( equation == ')' ); % logical mask of closing brackets
str = {}; % initialise empty output
if nnz(bracketL) ~= nnz(bracketR) % compare bracket counts (the masks always have equal numel)
% Validate the input
warning( 'Could not match bracket pairs, count mismatch!' )
return
end
nL = cumsum( bracketL ); % cumulative open bracket count
nR = cumsum( bracketR ); % cumulative close bracket count
nestLevel = nL - nR; % nest level is number of open brackets not closed
nestLevelChanged = diff(nestLevel); % Get the change points in nest level
% get the points where the nest level changed to/from 1
level1L = find( nestLevel == 1 & [true,nestLevelChanged==1] ) + 1;
level1R = find( nestLevel == 1 & [nestLevelChanged==-1,true] );
% Compile cell array of terms within nest level 1 brackets
str = arrayfun( @(x) equation(level1L(x):level1R(x)), 1:numel(level1L), 'uni', 0 );
end
Outputs:
str =
{'1 + exp(-2*n)'}
{'1 + e'}
{'3*(1 + exp(-2*n))'}
{'1 + exp(-2*n)'} {'2-5'}
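For readers without MATLAB, the same nest-level idea can be sketched in Python. This is only an illustration of the technique described above, not a port of the answer's code; the function name top_level_groups is made up for the example, and it raises an error on unbalanced input rather than issuing a warning.

```python
from itertools import accumulate

def top_level_groups(equation):
    """Return the text inside each top-level pair of brackets."""
    # +1 at every '(' and -1 at every ')'; the running sum is the nest level
    # after each character (the equivalent of cumsum(bracketL) - cumsum(bracketR)).
    steps = [(ch == "(") - (ch == ")") for ch in equation]
    depth = list(accumulate(steps))
    if depth and (min(depth) < 0 or depth[-1] != 0):
        raise ValueError("could not match bracket pairs")
    groups, start = [], None
    for i, (step, level) in enumerate(zip(steps, depth)):
        if step == 1 and level == 1:
            start = i + 1                     # nest level rose to 1: a group opens
        elif step == -1 and level == 0:
            groups.append(equation[start:i])  # nest level fell from 1: it closes
    return groups

for line in ["a = 2 / (1 + exp(-2*n)) - 1",
             "a = 2 / (1 + e) - 1",
             "a = 2 / (3*(1 + exp(-2*n))) - 1",
             "a = 2 / (1 + exp(-2*n)) - (2-5)"]:
    print(top_level_groups(line))
```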

How to search and replace while ignoring placeholders? [duplicate]

I am converting some code of mine from MATLAB to Julia, so I need to replace the parentheses used for indexing: they are () in MATLAB and [] in Julia. Function-call parentheses are the same in both, i.e. ().
I thought the fastest way to do this was to use Notepad++, finding all of the parentheses and replacing them with brackets where needed.
However, it does not work as expected.
I won't copy all of the function I am converting now, but some parts as example:
x= coord(:,1);
y= coord(:,2);
natG_coord(1,1)= sqrt(1/3);
natG_coord(2,1)= -sqrt(1/3);
natG_coord(3,1)= -sqrt(1/3);
natG_coord(4,1)= sqrt(1/3);
for i=1:4
dNG(1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(2,i)= -(1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(3,i)= -(1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(4,i)= (1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
end
I tried finding \((.*)\) and replacing with [$1], but it does not get all of the parentheses. For instance, it gets the ones in the declarations of x and y and the sqrt value, but does not get the natG_coord indexes. In the for loop, it only gets the last expression of each line, i.e. (1-etaG(i)^2), and there it takes the outer parentheses, not the etaG index (which is what I actually need to replace).
I cannot see a pattern in what gets matched, so I cannot come up with a solution.
Other approaches that save me from going through this parenthesis by parenthesis are fine too!
Thank you all for your help.
edit
@stribizhev: the final result should be this:
x= coord[:,1]
y= coord[:,2]
natG_coord[1,1]= sqrt(1/3)
natG_coord[2,1]= -sqrt(1/3)
natG_coord[3,1]= -sqrt(1/3)
natG_coord[4,1]= sqrt(1/3)
for i=1:4
dNG[1,i]= (1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[2,i]= -(1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[3,i]= -(1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[4,i]= (1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
end
What I get finding \((.*)\) and replacing with [$1] one time is:
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1)= sqrt(1/3];
natG_coord[2,1)= -sqrt(1/3];
natG_coord[3,1)= -sqrt(1/3];
natG_coord[4,1)= sqrt(1/3];
for i=1:4
dNG[1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[2,i)= -(1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[3,i)= -(1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[4,i)= (1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
end
What I get finding \(((?>[^()]|(?R))*)\) and replacing all with [$1] one time is (I know you said to do it several times; if I do, it will replace every matching pair of parentheses in the end):
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1]= sqrt[1/3];
natG_coord[2,1]= -sqrt[1/3];
natG_coord[3,1]= -sqrt[1/3];
natG_coord[4,1]= sqrt[1/3];
for i=1:4
dNG[1,i]= [1+etaG(i)]/4 + csiG[i]*[1+etaG(i)]/2 - [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[2,i]= -[1+etaG(i)]/4 + csiG[i]*[1+etaG(i)]/2 + [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[3,i]= -[1-etaG(i)]/4 + csiG[i]*[1-etaG(i)]/2 + [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[4,i]= [1-etaG(i)]/4 + csiG[i]*[1-etaG(i)]/2 - [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
end
What I get finding \(([^()]*)\) replacing all with [$1] one time is:
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1]= sqrt[1/3];
natG_coord[2,1]= -sqrt[1/3];
natG_coord[3,1]= -sqrt[1/3];
natG_coord[4,1]= sqrt[1/3];
for i=1:4
dNG[1,i]= (1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[2,i]= -(1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[3,i]= -(1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[4,i]= (1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
end
So the last one is exactly what I was looking for. With the "find next" command I can decide case by case whether the parentheses are indexing parentheses and substitute them or not (skipping the sqrt function input, for instance).
Thank you very much for your help.
Since the \(([^()]*)\) (to replace with [$1]) worked for you, here is the explanation:
\(([^()]*)\)
Matches:
\( - an opening round bracket
([^()]*) - Capture group 1 matches zero or more characters other than ( and ) (with [^()]*)
\) - a closing round bracket
The regex above matches all innermost parentheses, i.e. those that do not have any parentheses inside them.
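The behaviour of this pattern is easy to check outside Notepad++. Here is a small sketch using Python's standard re module (the helper name brackets_innermost is just for illustration). Note that Python's re does not support the recursive (?R) construct, so only the innermost-parentheses pattern is shown.

```python
import re

# \( ... \) with no parentheses in between, i.e. an innermost pair.
INNERMOST = re.compile(r"\(([^()]*)\)")

def brackets_innermost(line):
    """Replace every innermost (...) on the line with [...]."""
    return INNERMOST.sub(r"[\1]", line)

print(brackets_innermost("natG_coord(1,1)= sqrt(1/3);"))
# natG_coord[1,1]= sqrt[1/3];
print(brackets_innermost("dNG(1,i)= (1+etaG(i))/4"))
# dNG[1,i]= (1+etaG[i])/4
```

As in the question's Notepad++ output, function arguments such as sqrt(1/3) are converted too, so a single pass still needs manual review.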
Answering Aaron's remark about replacing the parentheses inside the quoted strings, it is great that Notepad++ supports Boost conditional replacement patterns. We can match what we do not need to modify and replace with self, and use another replacement for the other matches.
(?<o1>"[^"\\]*(?:\\.[^"\\]*)*")|(?<o2>\(([^()]*)\))
And replace with (?{o1}$+{o1}:[$3]).
Note that "[^"\\]*(?:\\.[^"\\]*)*" matches C strings with escaped entities correctly and efficiently. The replacement pattern means to replace with the quoted string (if o1 group matched) or with [+Group 3 value+] (if the other group matched).
If you need to replace outer balanced parentheses, use
\(((?>[^()]|(?R))*)\)
And replace with [$1]. If you need to replace the overlapping parenthetical substrings, you will need to hit Replace All several times.
Regex explanation:
\( # an outer literal opening round bracket
( # start group 1
(?> # start of atomic group
[^()] # any character other than ( and )
| # OR
(?R) # recursively match the whole pattern
)* # end atomic group and repeat zero or more times
) # end of group 1
\) # match a literal closing round bracket
If the parentheses you need to replace must be preceded by word characters, use
(\w+)(\(((?>[^()]|(?2))*)\))
And replace with $1[$3].
This regex uses a (?2) subroutine call that just repeats the second capture group's subpattern.
Now, to avoid matching these inside quoted strings: assume we have var d = "r(string here)" and we do not want to turn the () into [] there. Instead of (\w+)(\(((?>[^()]|(?2))*)\)) (with the $1[$3] replacement), use
(?<o1>"[^"\\]*(?:\\.[^"\\]*)*")|(?<o2>(\w+)(\(((?>[^()]|(?4))*)\)))
And (?{o1}$+{o1}:$3[$5]) as the replacement. This will keep var d = "r(string here)" string intact, and will turn var f = a(fg()g) into var f = a[fg()g].

Regular expression properties (theory) [closed]

I'm reading Aho and Ullman's book The Theory of Parsing, Translation, and Compiling. In the section that introduces regular expressions in chapter 2, there is a list of properties of regular expressions. I do not understand properties 2 and 8. Here is the list of properties:
(1) 𝛼 + 𝛽 = 𝛽 + 𝛼
(2) ∅* = 𝑒
(3) 𝛼 + (𝛽 + 𝛾) = (𝛼 + 𝛽) + 𝛾
(4) 𝛼(𝛽𝛾) = (𝛼𝛽)𝛾
(5) 𝛼(𝛽 + 𝛾) = 𝛼𝛽 + 𝛼𝛾
(6) (𝛼 + 𝛽)𝛾 = 𝛼𝛾 + 𝛽𝛾
(7) 𝛼𝑒 = 𝑒𝛼 = 𝛼
(8) βˆ…π›Ό = π›Όβˆ… = βˆ…
(9) 𝛼* = 𝛼 + 𝛼*
(10) (𝛼* )* = 𝛼*
(11) 𝛼 + 𝛼 = 𝛼
(12) 𝛼 + ∅ = 𝛼
where ∅ is the regular expression denoting the regular set ∅, 𝛼, 𝛽, 𝛾 are arbitrary regular expressions, and 𝑒 is the empty string.
How are properties (2) and (8) justified?
Edit: To explain the notation of +, *, etc, here are some definitions given in the book (quoted):
DEFINITION Let 𝚺 be a finite alphabet. We define a regular set over
𝚺 recursively in the following manner:
(1) ∅ (the empty set) is a regular set over 𝚺.
(2) {𝑒} is a regular set over 𝚺.
(3) {𝑎} is a regular set over 𝚺 for all 𝑎 in 𝚺.
(4) If 𝑃 and 𝑄 are regular sets over 𝚺, then so are
(a) 𝑃 ∪ 𝑄.
(b) 𝑃𝑄.
(c) 𝑃*.
(5) Nothing else is a regular set.
Thus a subset of 𝚺* is regular if and only if it is ∅, {𝑒}, or {𝑎},
for some 𝑎 in 𝚺, or can be obtained from these by a finite number of
applications of the operations union, concatenation, and closure.
DEFINITION Regular expressions over 𝚺 and the regular sets
they denote are defined recursively, as follows:
(1) ∅ is a regular expression denoting the regular set ∅.
(2) 𝑒 is a regular expression denoting the regular set {𝑒}.
(3) 𝑎 in 𝚺 is a regular expression denoting the regular set {𝑎}.
(4) If 𝑝 and 𝑞 are regular expressions denoting the regular sets 𝑃
and 𝑄, respectively, then
(a) (𝑝+𝑞) is a regular expression denoting 𝑃 ∪ 𝑄.
(b) (𝑝𝑞) is a regular expression denoting 𝑃𝑄.
(c) (𝑝)* is a regular expression denoting 𝑃*.
(5) Nothing else is a regular expression.
My guess is that 2 & 8 properties might be just a simple math:
Property 2
∅ denotes the empty set of strings. The closure 𝑃* is the union of all powers 𝑃⁰, 𝑃¹, 𝑃², ..., where 𝑃⁰ = {𝑒} by definition. For 𝑃 = ∅ every power with at least one factor is empty, so only 𝑃⁰ contributes and ∅* = {𝑒}, which is the set denoted by the regular expression 𝑒.
In other words, zero repetitions are always allowed, and zero repetitions of anything give the empty string; that is the only string ∅* can contain.
Reference:
Why is the Kleene star of a null set is an empty string?
Property 8
βˆ…π›Ό = π›Όβˆ… = βˆ… is true, and so is βˆ…π›Όβˆ…π›Όβˆ…π›Ό = π›Όβˆ…π›Όβˆ…π›Όβˆ… = βˆ…, because an empty set combined with anything would result an empty set.
Reference:
Regular expressions with empty set/empty string
What is the difference between language of empty string and empty set language?
How can concatenating empty sets (languages) result in a set containing empty string?
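Both properties can be checked concretely by modelling languages as finite Python sets of strings. This is only an illustrative sketch: concat and star are names chosen for the example, and star approximates the closure by taking the union of powers up to a chosen bound, which is exact for ∅ and for {𝑒}.

```python
def concat(p, q):
    """Concatenation PQ = { x + y : x in P, y in Q }."""
    return {x + y for x in p for y in q}

def star(p, max_power):
    """Union of P^0 .. P^max_power, where P^0 = {''} (the set holding only the empty string)."""
    result = {""}
    power = {""}
    for _ in range(max_power):
        power = concat(power, p)
        result |= power
    return result

empty = set()            # the regular set denoted by the empty-set expression
alpha = {"a", "ab"}      # an arbitrary example language

print(star(empty, 5))                               # {''}  -> property (2)
print(concat(empty, alpha), concat(alpha, empty))   # set() set()  -> property (8)
print(concat(alpha, {""}) == alpha)                 # True  -> property (7)
```

The contrast between properties (7) and (8) is the contrast between {𝑒}, a set with one (empty) string, and ∅, a set with no strings at all.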

How to split expression containing brackets correctly

I am trying to write an expression handler that will correctly split brackets, until today it has worked very well, but I've now encountered a problem I hadn't thought of.
I try to split the expression by the content of brackets first, once these are evaluated I replace the original content with the results and process until there are no brackets remaining.
The expression may contain macros/variables. Macros are denoted by text wrapped in $macro$.
A typical expression:
($exampleA$ * 3) + ($exampleB$ / 2)
Macros are replaced before the expression is evaluated, the above works fine because the process is as follows:
Split expression by brackets, this results in two expressions:
$exampleA$ * 3
$exampleB$ / 2
Each expression is then evaluated, if exampleA = 3 and exampleB = 6:
$exampleA$ * 3 = 3 * 3 = 9
$exampleB$ / 2 = 6 / 2 = 3
The expression is then rebuilt using the results:
9 + 3
The final expression without any brackets is then evaluated to:
12
This works fine until an expression with nested brackets is used:
((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)
This breaks completely because the regular expression I'm using:
regex("(?<=\\()[^)]*(?=\\))");
Results in:
($exampleA$ * 3
$exampleB$ / 2
So how can I correctly decode this? I want the above to be broken down to:
$exampleA$ * 3
$exampleB$ / 2
I am not exactly sure what you are trying to do. If you want to match the innermost expressions, wouldn't this help?:
regex("(?<=\\()[^()]*(?=\\))");
By the way, are the parentheses in your example unbalanced on purpose?
Traditional regex cannot handle recursive structures like nested brackets.
Depending on which regex flavor you are using, you may be able to use regex recursion. Otherwise, you will probably need a new method for parsing the groups. I think the traditional way is to represent the expression as a stack: start with an empty stack, push when you find a '(', pop when you find a ')'.
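The stack idea can be sketched as follows. This is a minimal illustration in Python rather than the asker's language, with a made-up function name (innermost_groups); each stack entry remembers where its bracket opened and whether another bracket was opened inside it, so that only the innermost groups are reported.

```python
def innermost_groups(expression):
    """Return the contents of every bracket pair that contains no other brackets."""
    stack = []    # one [open_index, has_nested_bracket] entry per unclosed '('
    groups = []
    for i, ch in enumerate(expression):
        if ch == "(":
            if stack:
                stack[-1][1] = True       # the enclosing bracket is not innermost
            stack.append([i, False])
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            start, nested = stack.pop()
            if not nested:
                groups.append(expression[start + 1:i])
    if stack:
        raise ValueError("unmatched '('")
    return groups

print(innermost_groups("((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)"))
# ['$exampleA$ * 3', '$exampleB$ / 2']
```

After evaluating these groups and substituting the results back, the same function can be applied again until no brackets remain.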
You can't really do this with regex. You really need a recursive method, like this:
using System;
using System.Data;
using System.Xml;
public class Program
{
public static void Main() {
Console.WriteLine(EvaluateExpression("(1 + 2) * 7"));
}
public static int EvaluateExpression(string expression) {
// Recursively evaluate parentheses as sub expressions
var expr = expression.ToLower();
while (expr.Contains("(")) {
// Find first opening bracket
var count = 1;
var pStart = expr.IndexOf("(", StringComparison.InvariantCultureIgnoreCase);
var pos = pStart + 1;
// Find matching closing bracket
while (pos < expr.Length && count > 0) {
if (expr.Substring(pos, 1) == "(") count++;
if (expr.Substring(pos, 1) == ")") count--;
pos++;
}
// Error if no matching closing bracket
if (count > 0) throw new InvalidOperationException("Closing parentheses not found.");
// Divide expression into sub expression
var pre = expr.Substring(0, pStart);
var subexpr = expr.Substring(pStart + 1, pos - pStart - 2);
var post = expr.Substring(pos, expr.Length - pos);
// Recursively evaluate the sub expression
expr = string.Format("{0} {1} {2}", pre, EvaluateExpression(subexpr), post);
}
// Replace this line with your own logic to evaluate 'expr', a sub expression with any brackets removed.
return (int)new DataTable().Compute(expr, null);
}
}
I'm assuming you're using C# here... but you should get the idea and be able to translate it into whatever language you need.
If you use the following regex, you can capture them as group(1). group(0) will have parenthesis included.
"\\(((?:\"\\(|\\)\"|[^()])+)\\)"
Hope it helps!

Matching math expression with regular expression?

For example, these are valid math expressions:
a * b + c
-a * (b / 1.50)
(apple + (-0.5)) * (boy - 1)
And these are invalid math expressions:
--a *+ b # 1.5.0 // two consecutive signs, two consecutive operators, invalid operator, invalid number
-a * b + 1) // unmatched parentheses
a) * (b + c) / (d // unmatched parentheses
I have no problem with matching float numbers, but have difficulty with parentheses matching. Any idea? If there is better solution than regular expression, I'll accept as well. But regex is preferred.
========
Edit:
I want to make some comments on my choice of the “accepted answer”, hoping that people who have the same question and find this thread will not be misled.
There are several answers I consider “accepted”, but I have no idea which one is the best. So I chose the accepted answer (almost) randomly. I recommend reading Guillaume Malartre’s answer as well besides the accepted answer. All of them give practical solutions to my question. For a somewhat rigorous/theoretical answer, please read David Thornley’s comments under the accepted answer. As he mentioned, Perl’s extension to regular expression (originated from regular language) make it “irregular”. (I mentioned no language in my question, so most answerers assumed the Perl implementation of regular expression – probably the most popular implementation. So did I when I posted my question.)
Please correct me if I said something wrong above.
Use a pushdown automaton for matching parentheses http://en.wikipedia.org/wiki/Pushdown_automaton (or just a stack ;-) )
Details for the stack solution:
while (chr available)
    if chr == '(' then
        push '('
    else if chr == ')' then
        if stack.elements == 0 then
            print('too many or misplaced )')
            exit
        else
            pop // from stack
        end if
    end if
end while
if (stack.elements != 0)
    print('too many or misplaced (')
end if
Even simpler: just keep a counter instead of a stack.
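A minimal sketch of the counter variant in Python (the function name parens_balanced is chosen for the example). It only checks that parentheses are balanced and correctly ordered; it does not validate operators or numbers.

```python
def parens_balanced(text):
    """True if every ')' closes an earlier '(' and no '(' is left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ')' appeared before its '('
                return False
    return depth == 0

for expr in ["(apple + (-0.5)) * (boy - 1)",
             "-a * b + 1)",
             "a) * (b + c) / (d"]:
    print(expr, "->", parens_balanced(expr))
```

The early return matters: "a) * (b + c) / (d" has as many opening as closing parentheses, so comparing totals alone would accept it.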
Regular expressions can only be used to recognize regular languages. The language of mathematical expressions is not regular; you'll need to implement an actual parser (e.g. LR) in order to do this.
Matching parens with a regex is quite possible.
Here is a Perl script that will parse arbitrary deep matching parens. While it will throw out the non-matching parens outside, I did not design it specifically to validate parens. It will parse arbitrarily deep parens so long as they are balanced. This will get you started however.
The key is recursion both in the regex and in the use of it. Play with it, and I am sure that you can get this to also flag non-matching parens. I think if you capture what this regex throws away and count parens (i.e. test for leftover parens in the non-match text), you can detect invalid, unbalanced parens.
#!/usr/bin/perl
$re = qr /
( # start capture buffer 1
\( # match an opening paren
( # capture buffer 2
(?: # match one of:
(?> # don't backtrack over the inside of this group
[^()]+ # one or more
) # end non backtracking group
| # ... or ...
(?1) # recurse to opening 1 and try it again
)* # 0 or more times.
) # end of buffer 2
\) # match a closing paren
) # end capture buffer one
/x;
sub strip {
my ($str) = @_;
while ($str=~/$re/g) {
$match=$1; $striped=$2;
print "$match\n";
strip($striped) if $striped=~/\(/;
return $striped;
}
}
while(<DATA>) {
print "start pattern: $_";
while (/$re/g) {
strip($1) ;
}
}
__DATA__
"(apple + (-0.5)) * (boy - 1)"
"((((one)two)three)four)x(one(two(three(four))))"
"a) * (b + c) / (d"
"-a * (b / 1.50)"
Output:
start pattern: "(apple + (-0.5)) * (boy - 1)"
(apple + (-0.5))
(-0.5)
(boy - 1)
start pattern: "((((one)two)three)four)x(one(two(three(four))))"
((((one)two)three)four)
(((one)two)three)
((one)two)
(one)
(one(two(three(four))))
(two(three(four)))
(three(four))
(four)
start pattern: "a) * (b + c) / (d"
(b + c)
start pattern: "-a * (b / 1.50)"
(b / 1.50)
I believe you will be better off implementing a real parser to accomplish what you're after.
A parser for simple mathematical expressions is "Parsing 101", and there are several examples to be found online.
Some examples include:
ANTLR: Expression Evaluator Sample (ANTLR grammars can target several languages)
pyparsing: http://pyparsing.wikispaces.com/file/view/fourFn.py (pyparsing is a Python library)
Lex & Yacc: http://epaperpress.com/lexandyacc/ (contains a PDF tutorial and sample code for a calculator)
Note that the grammar you will need for validating expressions is simpler than the examples above, since the examples also implement evaluation of the expression.
You can't use regex to do things like balance parentheses.
This is tricky with one single regular expression, but quite easy using a mixed regexp/procedural approach. The idea is to construct a regexp for the simple expression (without parentheses) and then repeatedly replace ( simple-expression ) with some atomic string (e.g. an identifier). If the final reduced expression matches the same 'simple' pattern, the original expression is considered valid.
Illustration (in php).
function check_syntax($str) {
// define the grammar
$number = "\d+(\.\d+)?";
$ident = "[a-z]\w*";
$atom = "[+-]?($number|$ident)";
$op = "[+*/-]";
$sexpr = "$atom($op$atom)*"; // simple expression
// step1. remove whitespace
$str = preg_replace('~\s+~', '', $str);
// step2. repeatedly replace parenthetic expressions with 'x'
$par = "~\($sexpr\)~";
while(preg_match($par, $str))
$str = preg_replace($par, 'x', $str);
// step3. no more parens, the string must be simple expression
return preg_match("~^$sexpr$~", $str);
}
$tests = array(
"a * b + c",
"-a * (b / 1.50)",
"(apple + (-0.5)) * (boy - 1)",
"--a *+ b # 1.5.0",
"-a * b + 1)",
"a) * (b + c) / (d",
);
foreach($tests as $t)
echo $t, "=", check_syntax($t) ? "ok" : "nope", "\n";
The above only validates the syntax, but the same technique can be also used to construct a real parser.
For parenthesis matching, and implementing other expression validation rules, it is probably easiest to write your own little parser. Regular expressions are no good in this kind of situation.
OK, here's my version of parenthesis finding in ActionScript 3. This approach gives a lot of leverage for analysing the part before the parentheses, inside the parentheses, and after the parentheses. If some parentheses remain at the end, you can raise a warning or refuse to send the expression to a final eval function.
package {
import flash.display.Sprite;
import mx.utils.StringUtil;
public class Stackoverflow_As3RegexpExample extends Sprite
{
private var tokenChain:String = "2+(3-4*(4/6))-9(82+-21)"
//Constructor
public function Stackoverflow_As3RegexpExample() {
// remove the "\" that just escape the following "\" if you want to test outside of flash compiler.
var getGroup:RegExp = new RegExp("((?:[^\\(\\)]+)?) (?:\\() ( (?:[^\\(\\)]+)? ) (?:\\)) ((?:[^\\(\\)]+)?)", "ix") //removed g flag
while (true) {
tokenChain = replace(tokenChain,getGroup)
if (tokenChain.search(getGroup) == -1) break;
}
trace("cummulativeEvaluable="+cummulativeEvaluable)
}
private var cummulativeEvaluable:Array = new Array()
protected function analyseGrammar(matchedSubstring:String, capturedMatch1:String, capturedMatch2:String, capturedMatch3:String, index:int, str:String):String {
trace("\nanalyseGrammar str:\t\t\t\t'"+str+"'")
trace("analyseGrammar matchedSubstring:'"+matchedSubstring+"'")
trace("analyseGrammar capturedMatchs:\t'"+capturedMatch1+"' '("+capturedMatch2+")' '"+capturedMatch3+"'")
trace("analyseGrammar index:\t\t\t'"+index+"'")
var blank:String = buildBlank(matchedSubstring.length)
cummulativeEvaluable.push(StringUtil.trim(matchedSubstring))
// I could do soo much rigth here!
return str.substr(0,index)+blank+str.substr(index+matchedSubstring.length,str.length-1)
}
private function replace(str:String,regExp:RegExp):String {
var result:Object = regExp.exec(str)
if (result)
return analyseGrammar.apply(null,objectToArray(result))
return str
}
private function objectToArray(value:Object):Array {
var array:Array = new Array()
var i:int = 0
while (true) {
if (value.hasOwnProperty(i.toString())) {
array.push(value[i])
} else {
break;
}
i++
}
array.push(value.index)
array.push(value.input)
return array
}
protected function buildBlank(length:uint):String {
var blank:String = ""
while (blank.length != length)
blank = blank+" "
return blank
}
}
}
It should trace this:
analyseGrammar str: '2+(3-4*(4/6))-9(82+-21)'
analyseGrammar matchedSubstring:'3-4*(4/6)'
analyseGrammar capturedMatchs: '3-4*' '(4/6)' ''
analyseGrammar index: '3'
analyseGrammar str: '2+( )-9(82+-21)'
analyseGrammar matchedSubstring:'2+( )-9'
analyseGrammar capturedMatchs: '2+' '( )' '-9'
analyseGrammar index: '0'
analyseGrammar str: ' (82+-21)'
analyseGrammar matchedSubstring:' (82+-21)'
analyseGrammar capturedMatchs: ' ' '(82+-21)' ''
analyseGrammar index: '0'
cummulativeEvaluable=3-4*(4/6),2+( )-9,(82+-21)