How to recursively identify operators and arguments in an expression - regex

I have an expression like the one below:
abs(sum(Max(abs(ListInput)),Min(V1,V2)))
I want to extract the operators that take a single argument, where that argument is a symbol for a list of arguments stored somewhere else.
For example, in the case above,
Max(abs(ListInput))
is what I need.
I am able to do it iteratively by tokenizing the expression and converting it into reverse Polish notation.
But things break when the operator is ternary.
IF(10 > Max(Abs(ListInput)),5,MAX(ListInput)) // IF(<EXPRESSION>,TRUEVALUE,FALSEVALUE); the true value and false value can also be expressions.
I am trying to figure out a recursive way to do this, but I am unable to think through all the conditions.

I want to take out operators which take single argument which is a
symbol for a list of arguments … regex
/(abs|sum|max|min)\([^(),]*(\([^(),]*\)[^(),]*)*\)/gi
Online regex tester
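As a rough illustration, the regex above can be applied from Python as follows. The function name single_argument_calls is made up for this sketch, and the pattern only tolerates one level of nested parentheses inside the call, so deeper nesting would need a real recursive parser.

```python
import re

# The answer's pattern, compiled case-insensitively (the /i flag).
# It accepts a call whose argument list contains no commas and at most
# one level of nested parentheses.
SINGLE_ARG_CALL = re.compile(
    r"(abs|sum|max|min)\([^(),]*(\([^(),]*\)[^(),]*)*\)",
    re.IGNORECASE,
)

def single_argument_calls(expression):
    """Return every non-overlapping match of the pattern, left to right."""
    return [m.group(0) for m in SINGLE_ARG_CALL.finditer(expression)]

print(single_argument_calls("abs(sum(Max(abs(ListInput)),Min(V1,V2)))"))
print(single_argument_calls("IF(10 > Max(Abs(ListInput)),5,MAX(ListInput))"))
```

Because the pattern is keyed on function names rather than on arity, it also works for the ternary IF example: the IF itself is skipped and only the comma-free calls inside it are reported.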

Related

Regex for finding the index of Mathematical Operators

I am writing a program that takes a string containing a mathematical expression and converts it to postfix notation. That part is done, and I am now trying to figure out how to evaluate the expression. I have left it in a Queue, so my idea is to find the index of the first operator, then find the two numbers that come before it, and send those to a new function which will evaluate them based on which operator was found (so one function for adding, one for subtracting, etc.). I'm having trouble figuring out how to grab the index, though.
I'm trying to use the Queue method indexOf and passing it a regex for those operators, using \\W.
y is a Queue. I've never used this type of character code before.
var z = y.indexOf("[\\W]")
I would like it to return the index of the first operator; in the case I currently have, it is a "+".
Currently that doesn't find anything. I've also tried dropping the brackets. Example queues are:
Queue(-1, 2, 3, *, +, 10, +)
Queue(1, 2, +)
which does mean I need a way to tell whether a - stands alone or is tied to a number. These are all Strings inside the Queue.
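indexOf compares elements for equality, so it looks for the literal string "[\\W]" rather than applying a regex. You have to use indexWhere with a predicate:
y.indexWhere(_.matches("\\W"))
Because matches tests the whole string, a negative number such as "-1" is not mistaken for an operator.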
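The evaluation strategy described in the question (find the first operator, combine the two values before it, repeat) can be sketched in Python. The names first_operator_index, evaluate_postfix, and OPS are invented for this sketch, and it assumes only the four binary operators + - * / appear.

```python
import operator
import re

# Hypothetical operator table; only binary + - * / are assumed.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def first_operator_index(tokens):
    """Index of the first token that is a single non-word character, or -1."""
    for i, tok in enumerate(tokens):
        # fullmatch tests the whole token, so "-1" is a number, not an operator.
        if re.fullmatch(r"\W", tok):
            return i
    return -1

def evaluate_postfix(tokens):
    """Repeatedly reduce 'a b op' to its value until one token remains."""
    work = list(tokens)
    while len(work) > 1:
        i = first_operator_index(work)
        if i < 2:
            raise ValueError("malformed postfix expression")
        left, right = float(work[i - 2]), float(work[i - 1])
        # Replace the three tokens 'left right op' with the result.
        work[i - 2:i + 1] = [str(OPS[work[i]](left, right))]
    return float(work[0])

print(evaluate_postfix(["-1", "2", "3", "*", "+", "10", "+"]))
```

A stack-based evaluator is the more common way to evaluate postfix; this version follows the question's own plan so the role of the operator index is visible.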

Regular expression in C++ for mathematical expressions

I have this problem: I must verify the correctness of many mathematical expressions, and in particular check for consecutive operators + - * /.
For example:
6+(69-9)+3
is ok while
6++8-(52--*3)
is not.
I am not using the <regex> library since it is only available from C++11 onwards.
Is there an alternative method to solve this problem? Thanks.
You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.
So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:
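// Returns true if every `close` has an earlier unmatched `open`
// and no `open` is left unclosed at the end.
bool check_balanced(const char* expr, char open, char close) {
    int parens = 0;  // number of currently open parentheses
    for (const char* p = expr; *p; ++p) {
        if (*p == open) ++parens;
        // A close with nothing open fails; otherwise it consumes one open.
        else if (*p == close && parens-- == 0) return false;
    }
    return parens == 0;
}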
To get the regular expression, note that mathematical expressions without function calls can be summarized as:
BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
where:
BEFORE is a sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).
AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.
BETWEEN is a sub-regex which matches a binary operator.
VALUE is a sub-regex which matches a value.
For example, for ordinary four-operator arithmetic on integers you would have:
BEFORE: [-+(]
AFTER: [)]
BETWEEN: [-+*/]
VALUE: [[:digit:]]+
and putting all that together you might end up with the regex:
^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$
If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.
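For readers who want to try the two checks together without a POSIX C library, here is a minimal Python sketch. Python's re module does not support POSIX classes such as [[:digit:]], so [0-9] is substituted; the names EXPR, balanced, and is_valid are invented for this example.

```python
import re

# BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
# with [0-9] standing in for [[:digit:]].
EXPR = re.compile(r"[-+(]*[0-9]+[)]*(?:[-+*/][-+(]*[0-9]+[)]*)*")

def balanced(expr, open_ch="(", close_ch=")"):
    """Same logic as check_balanced: count opens, reject early closes."""
    depth = 0
    for ch in expr:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

def is_valid(expr):
    """Valid only if the whole string fits the regex and parens balance."""
    return EXPR.fullmatch(expr) is not None and balanced(expr)

print(is_valid("6+(69-9)+3"))
print(is_valid("6++8-(52--*3)"))
```

Because BEFORE admits + and -, an input such as 6+-8 is accepted as a binary plus followed by a unary minus. Drop those two characters from BEFORE if prefix operators should be rejected.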
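You can loop over each char in your expression.
If you encounter a +, you can check whether it is followed by another +, /, *...
Additionally, you can group operators together to prevent code duplication.
// Returns true if any operator is directly followed by another operator.
bool is_op(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }
bool has_consecutive_operators(const char* expression) {
    for (int i = 0; expression[i] != '\0'; ++i) {
        switch (expression[i]) {
            case '+':
            case '-':
            case '*':
            case '/':  // grouped cases share one syntax check
                if (is_op(expression[i + 1])) return true;
                break;
        }
    }
    return false;
}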
Well, in the general case, you can't solve this with a regex. The "language" of arithmetic expressions can't be described with a regular grammar; it needs a context-free grammar. So if what you want is to check the correctness of an arbitrary mathematical expression, then you'll have to write a parser.
However, if you only need to make sure that your string doesn't have consecutive +-*/ operators, then a regex is enough. You can write something like this: [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from the +-*/ set.
Or something like this: ([-+*/]\s*){2,} if you also want to handle situations with spaces, like 5+ - * 123
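A quick way to try the whitespace-tolerant pattern is the following Python sketch. The helper name has_consecutive_operators is made up here, and the capturing group is written as a non-capturing one since no captures are needed.

```python
import re

# Two or more operators in a row, each optionally followed by whitespace.
CONSECUTIVE = re.compile(r"(?:[-+*/]\s*){2,}")

def has_consecutive_operators(expr):
    """True if the expression contains a run of two or more operators."""
    return CONSECUTIVE.search(expr) is not None

print(has_consecutive_operators("6+(69-9)+3"))
print(has_consecutive_operators("6++8-(52--*3)"))
print(has_consecutive_operators("5+ - * 123"))
```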
Well, you will have to define some rules if possible. It's not possible to completely parse mathematical language with a regex, but given some leniency it may work.
The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:
5--3 can be 5-(-3)
So in this case, you have two choices:
Ensure that the input is parenthesized well enough that no two operators meet
If you find something like --, treat it as a special case and investigate it further
If the formulas are in fact in your favor (have well-defined parentheses), then you can just check for repeats. For instance:
--
+-
+*
-+
etc.
If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).
You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:
[+\-\*\\/][+\-\*\\/]
which will work for the following examples (and more):
6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)
An alternative, probably a better one, is to just write a parser. It will process the equation step by step to check its validity. A parser will, if well written, be 100% accurate. A regex approach leaves you with a lot of constraints.
There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.
A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:
Write a grammar for a mathematical expression. (These can be found online)
Tokenize the input into lexemes. (This will be done with a regex, typically).
Match the expressions based on the next lexeme you see.
Recurse based on your grammar
A quick Google search can provide many example recursive-descent parsers written in C++.
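The four steps above can be sketched in Python (the thread is about C++, but the structure carries over directly). The grammar, the class name Parser, and the function names are assumptions chosen for this illustration; only integers, the four binary operators, and parentheses are handled.

```python
import re

# Assumed grammar:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into lexemes with a regex."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.take()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()  # recurse for the parenthesized part
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Evaluate the expression, raising SyntaxError if it is malformed."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(parse("6+(69-9)+3"))
```

With this grammar there is no unary operator, so 6++8-(52--*3) raises SyntaxError at the second +, which is the rejection the question asks for.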

Is there a way to negate a regular expression?

Given a regular expression R that describes a regular language (no fancy backreferences), is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible, as Wikipedia says:
The regular languages are closed under the various operations, that is, if the languages K and L are regular, so is the result of the following operations: […] the complement ¬L
For example, given the alphabet {a,b,c}, the inverse of the language (abc*)+ is (a|(ac|b|c).*)?
As DPenner has already pointed out in the comments, the inverse of a regular expression can be exponentially larger than the original expression. This makes inverting regular expressions unsuitable for implementing negative partial expression syntax for searching purposes. Is there an algorithm that preserves the O(n*m) runtime characteristic (where n is the size of the regex and m is the length of the input) of regular expression matching and allows for negated subexpressions?
Unfortunately, the answer given by nhahtdh in the comments is as good as we can do (so far). Whether a given regular expression generates all strings is PSPACE-complete. Since NP is contained in PSPACE, an efficient solution to the universality problem would imply that P=NP.
If there were an efficient solution to your problem, would you be able to resolve the universality problem? Sure you would.
Use your efficient algorithm to generate a regular expression for the negation;
Determine whether the resulting regular expression generates the empty set.
Note that the problem "given a regular expression, does it generate the empty set" is fairly straightforward:
The regular expression {} generates the empty set.
(r + s) generates the empty set iff both r and s generate the empty set.
(rs) generates the empty set iff either r or s generates the empty set.
Nothing else generates the empty set.
Basically, it's pretty easy to tell whether a regular expression generates the empty set: just start evaluating the regular expression.
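The emptiness rules above translate almost line for line into a recursive function. The small syntax-tree classes below (Empty, Sym, Alt, Cat, Star) are invented for this sketch and are not part of any regex library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:
    """The regular expression {} (matches nothing)."""

@dataclass(frozen=True)
class Sym:
    char: str

@dataclass(frozen=True)
class Alt:  # (r + s)
    left: object
    right: object

@dataclass(frozen=True)
class Cat:  # (rs)
    left: object
    right: object

@dataclass(frozen=True)
class Star:  # r*
    inner: object

def generates_empty_set(r):
    """Apply the rules from the text by structural recursion."""
    if isinstance(r, Empty):
        return True
    if isinstance(r, Alt):
        return generates_empty_set(r.left) and generates_empty_set(r.right)
    if isinstance(r, Cat):
        return generates_empty_set(r.left) or generates_empty_set(r.right)
    # Symbols and stars never generate the empty set
    # (a star always contains the empty word).
    return False

print(generates_empty_set(Cat(Sym("a"), Empty())))
print(generates_empty_set(Alt(Sym("a"), Empty())))
```

Each node is visited once, so the check runs in time linear in the size of the expression.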
(Note that while the above procedure is efficient in terms of the output length, it might not be efficient in terms of the input length, if the output length grows more than polynomially in the input length. However, if that were the case, we'd have the same result anyway, i.e., that your algorithm isn't really efficient, since it would take exponentially many steps to generate an exponentially longer output from a given input.)
Wikipedia says: ... if there exists at least one regex that matches a particular set then there exist an infinite number of such expressions. We can deduce from this statement that there is an infinite number of expressions that describe the language of all words except those described by R.
Again (as @nhahtdh also tried to explain), the simplest algorithm to address this question is to extend the scope of evaluation outside the context of the regular expression language itself. That is: match the strings you want to exclude (which represent a finite subset to work with) by using the original regular expression and then treat any failure to match as an actual match (out of an infinite set of other possibilities). So, if the result of the match is negative, your candidate strings are a subset of the valid solutions.
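In code, negating outside the regex language amounts to inverting the result of a whole-string match. The function name matches_complement is made up for this sketch, and Python's re module is a backtracking engine, so this illustrates the idea rather than the O(n*m) bound asked about.

```python
import re

def matches_complement(pattern, word):
    """True if `word` is NOT in the language described by `pattern`."""
    return re.fullmatch(pattern, word) is None

# Using the language (abc*)+ from the question:
print(matches_complement(r"(abc*)+", "abcc"))
print(matches_complement(r"(abc*)+", "aa"))
```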

What's the formal name for this Syntax?

Sometimes in Scheme, I have functions that take arguments like this
add 3 4
What do you call this kind of "list" where its elements are like a1 a2 a3? I don't think you can call it a list, because lists are contained in parentheses and their elements are comma-separated.
The (add 3 4) statement is "function application" from the lambda calculus. The 3 4 from the expression are bindings for the parameters; alternatively, it is the parameter list for the function.
s-expression?
Lisp uses prefix or Polish notation syntax.
Polish notation, also known as prefix notation, is a form of notation for logic, arithmetic, and algebra. Its distinguishing feature is that it places operators to the left of their operands. If the arity of the operators is fixed, the result is a syntax lacking parentheses or other brackets, that can still be parsed without ambiguity.
add is the operator and the right part are the operands.
The arity of the operators isn't fixed, so Lisp uses parens in its syntax to group the expressions.
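To see why fixed arity removes the need for parentheses, here is a small Python sketch of a prefix evaluator. The operator names in ARITY and FUNCS (add, mul, neg) are chosen for illustration and are not Scheme built-ins.

```python
# Hypothetical operators with a fixed number of operands each.
ARITY = {"add": 2, "mul": 2, "neg": 1}
FUNCS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "neg": lambda a: -a,
}

def eval_prefix(tokens):
    """Evaluate a parenthesis-free prefix expression given as a token list."""
    def walk(pos):
        tok = tokens[pos]
        if tok in ARITY:
            args = []
            pos += 1
            # The arity says exactly how many operands to read next.
            for _ in range(ARITY[tok]):
                value, pos = walk(pos)
                args.append(value)
            return FUNCS[tok](*args), pos
        return int(tok), pos + 1

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value

print(eval_prefix("add 3 4".split()))
print(eval_prefix("add mul 2 3 neg 4".split()))
```

If add could take any number of operands, the evaluator could not tell where its operand list ends, which is the job the parentheses do in Lisp.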

C++ infix to prefix conversion for logical conditions

I want to evaluate one expression in C++. To evaluate it, I want the expression to be converted to prefix format.
Here is an example
wstring expression = L"Feature1 And Feature2";
Here are other possible forms.
expression = L"Feature1 And (Feature2 Or Feature3)";
expression = L"Not Feature1 Or Feature3";
Here And, Or, Not are reserved words, and parentheses ("(" and ")") are used for scope.
Not has the highest precedence
And has the next precedence after Not
Or has the next precedence after And
White space is used as the delimiter. The expression has no other elements like TAB or NEWLINE
I don't need arithmetic expressions. I can do the evaluation but can somebody help me to convert the strings to prefix notation?
You will need to construct the grammar up front, so why do all the parsing by hand?
Instead use a parser builder library like Boost-Spirit. Or lex/yacc or flex/bison.
Then use the AST generated by the parser builder to output the data in any way you see fit. Such as infix to prefix or postfix, ...etc.
I guess your intention is to evaluate a condition, hence you don't need a full-fledged parser.
First of all, you don't need to work with strings here.
1. Convert "Feature1" to, say, an Id (an integer which represents a feature).
So, the statement "Feature1 And (Feature2 Or Feature3)" becomes, say, (1 & (2 | 3)).
From here on, you can use the standard infix to prefix conversion and evaluate the prefix notation.
Here is the algorithm to convert infix to prefix
http://www.c4swimmers.esmartguy.com/in2pre.htm
http://www.programmersheaven.com/2/Art_Expressions_p1
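As a rough sketch of the conversion for the And/Or/Not language in the question, the Python below builds a small tree by recursive descent and emits it in pre-order. This is not the algorithm from the linked pages, and the function name to_prefix is made up; it assumes the precedence Not > And > Or stated in the question.

```python
RESERVED = {"And", "Or", "Not", "(", ")"}

def to_prefix(expression):
    """Convert an infix logical expression to a prefix token string."""
    tokens = expression.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "Or":
            take()
            node = ["Or", node, parse_and()]
        return node

    def parse_and():
        node = parse_not()
        while peek() == "And":
            take()
            node = ["And", node, parse_not()]
        return node

    def parse_not():  # highest precedence, prefix unary
        if peek() == "Not":
            take()
            return ["Not", parse_not()]
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok is None or tok in RESERVED:
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    def flatten(node):  # pre-order walk: operator first, then operands
        if isinstance(node, str):
            return [node]
        out = [node[0]]
        for child in node[1:]:
            out += flatten(child)
        return out

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return " ".join(flatten(tree))

print(to_prefix("Feature1 And (Feature2 Or Feature3)"))
print(to_prefix("Not Feature1 Or Feature3"))
```

Since And and Or are always binary and Not is always unary, the prefix output needs no parentheses.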
Use a parser generator like the Lex/Yacc pair.