C++ infix to prefix conversion for logical conditions - c++

I want to evaluate one expression in C++. To evaluate it, I want the expression to be converted to prefix format.
Here is an example
wstring expression = "Feature1 And Feature2";
Here are possible ways.
expression = "Feature1 And (Feature2 Or Feature3)";
expression = "Not Feature1 Or Feature3";
Here And, Or, Not are reserved words and parentheses ("(", )) are used for scope
Not has higher precedence
And is set next precedence to Not
Or is set to next precedence to And
WHITE SPACE used for delimiter. Expression has no other elements like TAB, NEWLINE
I don't need arithmetic expressions. I can do the evaluation but can somebody help me to convert the strings to prefix notation?

You will need to construct the grammar up front. So why do all the parsing by hand.
Instead use a parser builder library like Boost-Spirit. Or lex/yacc or flex/bison.
Then use the AST generated by the parser builder to output the data in any way you see fit. Such as infix to prefix or postfix, ...etc.

I guess your intention is to evaluate condition. hence you dont need a full fledged parser.
First of all you dont need to work with strings here.
1. Convert "Feature 1" to say an Id (An integer which represents a feature)
So, the statement "Feature1 And (Feature2 Or Feature3)"; to say (1 & (2 | 3)
From here on...you can use the standard Infix to prefix conversion and evaluate th prefix notation.
Here is the algorithm to convert infix to prefix
http://www.c4swimmers.esmartguy.com/in2pre.htm
http://www.programmersheaven.com/2/Art_Expressions_p1

Use a parser generator like the Lex/Yacc pair.

Related

Regular expression in C++ for mathematical expressions

I have this trouble: I must verify the correctness of many mathematical expressions especially check for consecutive operators + - * /.
For example:
6+(69-9)+3
is ok while
6++8-(52--*3)
no.
I am not using the library <regex> since it is only compatible with C++11.
Is there a alternative method to solve this problem? Thanks.
You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.
So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:
bool check_balanced(const char* expr, char open, char close) {
int parens = 0;
for (const char* p = expr; *p; ++p) {
if (*p == open) ++parens;
else if (*p == close && parens-- == 0) return false;
}
return parens == 0;
}
To get the regular expression, note that mathematical expressions without function calls can be summarized as:
BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
where:
BEFORE is sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).
AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.
BETWEEN is a sub-regex which matches a binary operator.
VALUE is a sub-regex which matches a value.
For example, for ordinary four-operator arithmetic on integers you would have:
BEFORE: [-+(]
AFTER: [)]
BETWEEN: [-+*/]
VALUE: [[:digit:]]+
and putting all that together you might end up with the regex:
^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$
If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.
You can loop over each charin your expression.
If you encounter a + you can check whether it is follow by another +, /, *...
Additionally you can group operators together to prevent code duplication.
int i = 0
while(!EOF) {
switch(expression[i]) {
case '+':
case '*': //Do your syntax checks here
}
i++;
}
Well, in general case, you can't solve this with regex. Arithmethic expressions "language" can't be described with regular grammar. It's context-free grammar. So if what you want is to check correctness of an arbitrary mathemathical expression then you'll have to write a parser.
However, if you only need to make sure that your string doesn't have consecutive +-*/ operators then regex is enough. You can write something like this [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from +-*/ set.
Or something like this ([-+*/]\s*){2,} if you also want to handle situations with spaces like 5+ - * 123
Well, you will have to define some rules if possible. It's not possible to completely parse mathamatical language with Regex, but given some lenience it may work.
The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:
5--3 can be 5-(-3)
So in this case, you have two choices:
Ensure that the input is parenthesized well enough that no two operators meet
If you find something like --, treat it as a special case and investigate it further
If the formulas are in fact in your favor (have well defined parenthesis), then you can just check for repeats. For instance:
--
+-
+*
-+
etc.
If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).
You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:
[+\-\*\\/][+\-\*\\/]
which will work for the following examples (and more):
6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)
An alternative, probably a better one, is just write a parser. it will step by step process the equation to check it's validity. A parser will, if well written, 100% accurate. A Regex approach leaves you to a lot of constraints.
There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.
A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:
Write a grammar for a mathematical expression. (These can be found online)
Tokenize the input into lexemes. (This will be done with a regex, typically).
Match the expressions based on the next lexeme you see.
Recurse based on your grammar
A quick Google search can provide many example recursive-descent parsers written in C++.

How to recursively identify operators and arguments in an expression

I have an expression like below;
abs(sum(Max(abs(ListInput)),Min(V1,V2)))
I want to take out operators which take single argument which is a symbol for a list of arguments , stored somewhere else.
For example in above case;
Max(abs(ListInput))
is what I need.
I am able to do it iteratively by tokenizing and converting the expression in to reverse polish notation.
But things break when operator is ternary.
IF(10 > Max(Abs(ListInput)),5,MAX(ListInput)) // (IF<EXPRESSION>,TRUEVALUE,FALSEVALUE) ; True value and Valse value can also be an expression.
I am trying to figure out to a recursive way to do this , but unable to think through all conditions.
I want to take out operators which take single argument which is a
symbol for a list of arguments … regex
/(abs|sum|max|min)\([^(),]*(\([^(),]*\)[^(),]*)*\)/gi
Online regex tester

Why literals are considered expressions in C++?

I'm studying the C++ programming language using Programming priciples and practice using C++.
I'm in chapter 4 now and in this chapter the book introduces the concept of expression, but I can't understand it at all :
The most basic building block in a program is an expression. An espression compute a value from a number of operands. The simplest expression in C++ is simply a literal value such as 11, 'c', "hello". Names of variables are also expressions. A variable represent the object which it is the name.
Why a literal is considered an expression ? Why the name of a variable is considered an expression ?
Expressions -in programming languages, in math, in linguistics- are defined compositionally (or inductively). So expressions are often made of subexpressions like x*2+y*4 is made of two sub-expressions x*2 and y*4 joined by the addition operator +.
But you need a base case (the most atomic and simple expressions). These are literals (2) and variables (x) - if either of them was not an expression 2*x could not be an expression (since both operands of the binary multiplication * are sub-expressions).
Notice that in C and C++ assignments and function calls are expressions
Think of it like this: An expression is a sequence of steps that produce a value. Thus, 4+3 is a two-step expression, because you (1) start with the number 4, and (2) add 3 to it.
Therefore, 7 can be regarded as a single-step sequence, because there is only one "action" performed: (1) start with the number 7.
Thus, both a = 4+3; and a = 7; can be generalised to a = <expression>;.
An expression is "a sequence of operators and operands that specifies a computation" (http://en.cppreference.com/w/cpp/language/expressions).
Let see a simple expression: 3 + 3. When you evaluate this expression, you will get the result 6.
So let see another expression: 3. When you evaluate this expression, you will get the result 3.
A literal is considered an expression because a literal is a type of constant and constants are expressions with a fixed value.
A variable is also considered as an expression because it can be used as an operand within another expression or as an expression by itself.
In software design, composite pattern can be used as a representation of the expression.

Is there a regular language to represent regular expressions?

Specifically, I noticed that the language of regular expressions itself isn't regular. So, I can't use a regular expression to parse a given regular expression. I need to use a parser since the language of the regular expression itself is context free.
Is there any way regular expressions can be represented in a way that the resulting string can be parsed using a regular expression?
Note: My question isn't about whether there is a regexp to match the current syntax of regexes, but whether there exists a "representation" for regular expressions as we know it today (maybe not a neat as what we know them as today) that can be parsed using regular expressions. Also, please could someone remove the dup since it isn't a dup. I'm asking something completely different. I already know that the current language of regular expressions isn't regular (it is how I started my original question).
Depending on what you mean by "represent", the answer is "yes" or "no":
If you want a language that (homomorphically) maps 1:1 to the usual basic regular expression language, the answer is no, because a regular language cannot be isomorphic to a non-regular language, and the standard regular expression language is non-regular. This is because the syntax requires matching opening and closing parentheses of arbitrary depth.
If "represent" only means another method of specifying regular languages, the answer is yes, and right now I can think of at least three ways to achieve this:
The "dumbest" and easiest way is to define some surjective mapping f : ℕ -> RegEx from the natural numbers onto the set of all valid standard regular expressions. You can define the natural numbers using the regular expression 0|1[01]*, and the regular language denoted by a (string representing the) natural number n is the regular language denoted by f(n).
Of course, the meaning attached to a natural number would not be obvious to a human reader at all, so this "regular expression language" would be utterly useless.
As parentheses are the only non-regular part in simple regular expressions, the easiest human-interpretable method would be to extend the standard simple regular expression syntax to allow dangling parentheses and defining semantics for dangling parentheses.
The obvious choice would be to ignore non-matching opening parentheses and interpreting non-matching closing parentheses as matching the beginning of the regex. This essentially amounts to implicitly inserting as many opening parentheses at the beginning and as many closing parentheses at the end of the regex as necessary. Additionally, (* would have to be interpreted as repetition of the empty string. If I didn't miss anything, this definition should turn any string into a "regular expression" with a specified meaning, so .* defines this "regular expression language".
This variant even has the same abstract syntax as standard regular expressions.
Another variant would be to specify the NFA that recognizes the language directly using a regular language, e.g.: ([a-z]+,([^,]|\\,|\\\\)+,[a-z]+\$?;)*.
The idea is that [a-z]+ is used as a label for states, and the expression is a list of transition triples (s, c, t) from source state s to target state t consuming character c, and a $ indicating accepting transitions (cf. note below). In c, backslashes are used to escape commas or backslashes - I assumed that you use the same alphabet for standard regular expressions, but of course you can replace the middle component with any other regular language of symbols denotating characters of any alphabet you wish.
The first source state mentioned is the (single) initial state. An empty expression defines the empty language.
Above, I wrote "accepting transition", not "accepting state" because that would make the regex above a bit more complex. You can interpret a triple containing a $ as two transitions, namely one transition consuming c from s to a new, unique state, and an ε-transition from that state to t. This should allow any NFA to be represented, by replacing each transition to an accepting state with a $ triple and each transition to a non-accepting state with a non-$ triple.
One note that might make the "yes" part look more intuitive: Assembly languages are regular, and those are even Turing-complete, so it would be unexpected if it wasn't possible to specify "mere" regular languages using a regular language.
The answer is probably NO.
As you have pointed out, set of all possible regular expressions itself is not a regular set. Any TRUE regular expression (not those extended) can be converted into finite automata (FA). If regular expression can be represented in a form that can be parsed by itself, then FA can be parsed by regular expression as well.
But that's not possible as far as I know. RE itself can be reduced into three basic operation(According to the Dragon Book):
concatenation: e.g. ab
alternation: e.g. a|b
kleen closure: e.g. a*
The kleen closure can match infinite number of characters, but it cannot know how many characters to match.
Just think such case: you want to match 3 consecutive as. Then the corresponding regular expression is /aaa/. But what if you want match 4, 5, 6... as? Parser with only one RE cannot know the exact number of as. So it fails to give the right matching to arbitrary expressions. However, the RE parser has to match infinite different forms of REs. According to your expression, a regular expression cannot match all the possibilities.
Well, the only difference of a RE parser is that it does not need a tokenizer.(probably that's why RE is used in lexical analysis) Every character in RE is a token (excluding those escape charcters). But to parse RE, whatever it is converted,one has to face up with NFA/DFA/TREE... all equivalent structures that cannot be parsed by RE itself.

What would be the Regular Expression to retrieve number in front of % symbol in string?

I'm using a math parser that uses % as the mod symbol. I'd like to use it for the percent symbol instead to allow users to type "25%*10" or "25% of 10" and receive the answer "2.5" (they could type anything).
I'd then use the regex asked for to get the "25%" (could be in any part of the string) and do a simple calculation (25 / 100) and then replace the 25% in the string with the calculated value to pass to the math parser. Basically I'm doing some pre calculations on the number in front of the percent symbol before passing the full string to the math parser.
I've always struggled with regex and was hoping someone could help me.
Thanks
Easiest way would be to let the math parser do the dividing:
s/([0-9.]+)%/($1\/100)/g
Which just replaces "25%" with "(25/100)" ... so if the user typed "25%*10" they'll now have "(25/100)*10" which when evaluated gives them the right answer.
Alternatively, in Perl, you could do:
s{([0-9.]+)%}{$1/100}eg
which would have Perl calculate the division by a hundred and pass that to the math parser
The regular expression to find a number before a '%' sign is /(\d+)%/.
However, mathematical expressions are not a regular language, so a regular expression is likely not the right tool. You should instead tell your parser to interpret '%' as "take whatever comes before this and divide it by 100". "Whatever comes before this" can then be any mathematical expression, and you won't run into operator precedence problems, resp. you could sort these out in the syntax tree parser.
Mixing text substitution with real language features often violates the principle of least astonishment. When a user sees that '%' divides whatever comes before by 100, he might try to use expressions like (23+42)%, but that will just produce a syntax error. Also, you need a more elaborate regex if you have something like 1.34e-14%, but these things would just sort themselves out when you use the tree parser.