Read inner Parentheses of a String (VB.NET) [duplicate] - regex

This question already has answers here:
Evaluate mathematical expression from a string using VB
(3 answers)
Closed 4 years ago.
I am developping a calculator function in Visual Basic. I think there is no basic one in the default .NET libraries.
I use System.Data.DataTable.Compute() to calculate the normal Math expression. But I want my function to solve functions like Sinus() or Round() as well.
Currently I am using Regex for this. For example for functions with one argument I use
Dim expr As String = Regex.Replace(Term, "(?<func>[A-Za-z]*?)\((?<arg1>[0-9,.]*?)\)", AddressOf FunctionLibrary.Funcs1)
the Sinus() function would look like this: sin(Number).
But this only allows to write Doubles or Integers into the argument. I can not write inner functions or even another math expression between the parentheses.
If I would write more functions as an argument, Regex would detect the first ")" inside the function, which is the closing parenthese of the inner function, as the end of the outer function.
Is there any way to make Regex recognize that theres an inner function as well?
If anyone knows an Evaluate() function for Visual Basic which is in the default .NET libraries, this might help me as well

Is there any way to make Regex recognize that theres an inner function as well?
No, there is not. Regular expressions, as the name implies, solve Regular grammars and simple mathematical expressions are Context-Free. Regular expressions do not have a stack to match arbitrary expressions. For example, distinguishing between (()) and ()() require at least one character of lookahead (or backtracking). Yes, PCRE-style regular expressions can let you create a fixed number of lookahead characters, but so far as I know you have to specify the number of characters, and anyway this is not going to solve your problem.
Evaluating arithmetic expressions require handling precedence, subexpressions, possibly variables and types. Regular expressions cannot do this, attempting to do it with regular expressions will lead you into a pit of failure.
Nor are they even necessary. Evaluating mathematical expressions is a solved problem and there are dozens of parsers and evaluators written, tested, and ready for you to drop into your application. You have not given us enough information to decide which one would be best for you, and anyway Stack Overflow is not a tool advocacy site. You could start by going through the list at Gary Beene's Equation Parser review.

Related

POSIX Regular Expressions: Excluding a word in an expression?

I am trying to create a regular expression using POSIX (Extended) Regular Expressions that I can use in my C program code.
Specifically, I have come up with the following, however, I want to exclude the word "http" within the matched expressions. Upon some searching, it doesn't look like POSIX makes it obvious for catching specific strings. I am using something called a "negative look-a-head" in the below example (i.e. the (?!http:) ). However, I fear that this may only be something available to regular expressions defined in dialects other than POSIX.
Is negative lookahead allowed? Is the logical NOT operator allowed in POSIX (i.e. ! )?
Working regular expression example:
href|HREF|src[[:space:]]=[[:space:]]\"(?!http:)[^\"]+\"[/]
If I cannot use negative-lookahead like in other dialects, what can I do to the above regular expression to filter out the specific word "http:"? Ideally, is there any way without inverse logic and ultimately creating a ridiculously long regular expression in the process? (the one I have above is quite long already, I'd rather it not look more confusing if possible)
[NOTE: I have consulted other related threads in Stack Overflow, but the most relevant ones seem to only ask this question "generically", which means answers given didn't necessarily mean they were POSIX-flavored ==> in another thread or two, I've seen the above (?!insertWordToExcludeHere) negative lookahead, but I fear it's only for PHP.)
[NOTE 2: I will take any POSIX regular expression phrasings as well, any help would be appreciated. Does anyone have a suggestion on how whatever regular expression that would filter out "http:" would look like and how it could be fit into my current regular expression, replacing the (?!http:)?]
According to http://www.regular-expressions.info/refflavors.html lookaheads and lookbehinds are not in the POSIX flavour.
You may consider thinking in terms of lexing (tokenization) and parsing if your problem is too complex to be represented cleanly as a regex.

Regular Expression Vs. String Parsing

At the risk of open a can of worms and getting negative votes I find myself needing to ask,
When should I use Regular Expressions and when is it more appropriate to use String Parsing?
And I'm going to need examples and reasoning as to your stance. I'd like you to address things like readability, maintainability, scaling, and probably most of all performance in your answer.
I found another question Here that only had 1 answer that even bothered giving an example. I need more to understand this.
I'm currently playing around in C++ but Regular Expressions are in almost every Higher Level language and I'd like to know how different languages use/ handle regular expressions also but that's more an after thought.
Thanks for the help in understanding it!
Edit: I'm still looking for more examples and talk on this but the response so far has been great. :)
It depends on how complex the language you're dealing with is.
Splitting
This is great when it works, but only works when there are no escaping conventions.
It does not work for CSV for example because commas inside quoted strings are not proper split points.
foo,bar,baz
can be split, but
foo,"bar,baz"
cannot.
Regular
Regular expressions are great for simple languages that have a "regular grammar". Perl 5 regular expressions are a little more powerful due to back-references but the general rule of thumb is this:
If you need to match brackets ((...), [...]) or other nesting like HTML tags, then regular expressions by themselves are not sufficient.
You can use regular expressions to break a string into a known number of chunks -- for example, pulling out the month/day/year from a date. They are the wrong job for parsing complicated arithmetic expressions though.
Obviously, if you write a regular expression, walk away for a cup of coffee, come back, and can't easily understand what you just wrote, then you should look for a clearer way to express what you're doing. Email addresses are probably at the limit of what one can correctly & readably handle using regular expressions.
Context free
Parser generators and hand-coded pushdown/PEG parsers are great for dealing with more complicated input where you need to handle nesting so you can build a tree or deal with operator precedence or associativity.
Context free parsers often use regular expressions to first break the input into chunks (spaces, identifiers, punctuation, quoted strings) and then use a grammar to turn that stream of chunks into a tree form.
The rule of thumb for CF grammars is
If regular expressions are insufficient but all words in the language have the same meaning regardless of prior declarations then CF works.
Non context free
If words in your language change meaning depending on context, then you need a more complicated solution. These are almost always hand-coded solutions.
For example, in C,
#ifdef X
typedef int foo
#endif
foo * bar
If foo is a type, then foo * bar is the declaration of a foo pointer named bar. Otherwise it is a multiplication of a variable named foo by a variable named bar.
It should be Regular Expression AND String Parsing..
You can use both of them to your advantage!Many a times programmers try to make a SINGLE regular expression for parsing a text and then find it very difficult to maintain..You should use both as and when required.
The REGEX engine is FAST.A simple match takes less than a microsecond.But its not recommended for parsing HTML.

Lua string.match uses irregular regular expressions?

I'm curious why this doesn't work, and need to know why/how to work around it; I'm trying to detect whether some input is a question, I'm pretty sure string.match is what I need, but:
print(string.match("how much wood?", "(how|who|what|where|why|when).*\\?"))
returns nil. I'm pretty sure Lua's string.match uses regular expressions to find matches in a string, as I've used wildcards (.) before with success, but maybe I don't understand all the mechanics? Does Lua require special delimiters in its string functions? I've tested my regular expression here, so if Lua used regular regular expressions, it seems like the above code would return "how much wood?".
Can any of you tell me what I'm doing wrong, what I mean to do, or point me to a good reference where I can get comprehensive information about how Lua's string manipulation functions utilize regular expressions?
Lua doesn't use regex. Lua uses Patterns, which look similar but match different input.
.* will also consume the last ? of the input, so it fails on \\?. The question mark should be excluded. Special characters are escaped with %.
"how[^?]*%?"
As Omri Barel said, there's no alternation operator. You probably need to use multiple patterns, one for each alternative word at the beginning of the sentence. Or you could use a library that supports regex like expressions.
According to the manual, patterns don't support alternation.
So while "how.*" works, "(how|what).*" doesnt.
And kapep is right about the question mark being swallowed by the .*.
There's a related question: Lua pattern matching vs. regular expressions.
As they have already answered before, it is because the patterns in lua are different from the Regex in other languages, but if you have not yet managed to get a good pattern that does all the work, you can try this simple function:
local function capture_answer(text)
local text = text:lower()
local pattern = '([how]?[who]?[what]?[where]?[why]?[when]?[would]?.+%?)'
for capture in string.gmatch(text, pattern) do
return capture
end
end
print(capture_answer("how much wood?"))
Output: how much wood?
That function will also help you if you want to find a question in a larger text string
Ex.
print(capture_answer("Who is the best football player in the world?\nWho are your best friends?\nWho is that strange guy over there?\nWhy do we need a nanny?\nWhy are they always late?\nWhy does he complain all the time?\nHow do you cook lasagna?\nHow does he know the answer?\nHow can I learn English quickly?"))
Output:
who is the best football player in the world?
who are your best friends?
who is that strange guy over there?
why do we need a nanny?
why are they always late?
why does he complain all the time?
how do you cook lasagna?
how does he know the answer?
how can i learn english quickly?

How to write the regex for this expression

I want to match strings like this: !! so I suppose the input have the right elements but whether the they are evaluable, that is left for the evaluator!
1+(2-3)*(4/5)
what is the regex for matching this, something like this: ([0-9\+-\*/\(\)]+)? but this seems not working.
If you are only asking for a character validation, you can use
^[0-9+*/()-]*$
You don't need to escape characters in a character class (inside square brackets). And if you must include an hyphen, you HAVE to put it at the end, otherwise it would be considered as the character range operator.
That said, keep in mind this will only guarantee you that you have no other characters. It will NOT validate the structure (regexes are not the right tool for that). However, since you stated an evaluator will then process the input, that might be right for you.
You can't, this is not a regular language. Though some regexp implementations may provide additional features to match balanced parenthesis.
Regular expressions can not match arbitrary arithmetic formulas. Regexps only describe regular languages, while arithmetic formulas use a recursive grammar. See http://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory
A regex may be possible if you limit nesting depth, but if you want it all the way, with matching bracket detection, it will probably be very, very complicated.
If you want to match "1+(2-3)*(4/5)", then you can use this regular expression.
/1+\(2-3\)\*\(4\/5)/
What's that? That doesn't tell you what you want to know? Well, then what do you want to know? What information are you trying to extract from the string?
You can't just say "strings like this". Your question is not nearly enough clear.
If your question is to evaluate if a equation is valid then you will need a parser to Tokenize the expression than a grammar to evaluate if the expression is right.
You cant check if the equation as balanced parenthesis using regex. This is because a regular expression is equivalent to a Deterministic Finite Automata. Since the automata is finite, you will never have a automata big enough to check parenthesis.

Regular expressions Lexical Analysis

Why repeated strings such as
[wcw|w is a string of a's and b's]
cannot be denoted by regular expressions?
pls. give me detailed answer as i m new to lexical analysis.
Thanks ...
Regular expressions in their original form describe regular languages/grammars. Those cannot contain nested structures as those languages can be described by a simple finite state machine. Simplified you can picture that as if each word of the language grows strictly from left to right (or right to left), where repeating structures have to be explicitly defined and are static.
What this means is, that no information whatsoever from previous states can be carried over to later states (a few characters further in the input). So if you have your symbol w you can't specify that the input must have exactly the same string w later in the sequence. Similarly you can't ensure that each opening paranthesis needs a closin paren as well (so regular expressions themselves are not even a regular language and thus cannot be described by regular expressions :-)).
In theoretical computer science we worked with a very restricted set of regex operators, basically only consisting of sequence, alternative (|) and repetition (*), everything else can be described with those operations.
However, usually regex engines allow grouping of certain sub-patterns into matches which can then be referenced or extracted later. Some engines even allow to use such a backreference in the search expression string itself, thereby allowing the expression to describe more than just a regular language. If I remember correctly such use of backreferences can even yield languages that are not context-free.
Additional pointers:
This StackOverflow question
Wikipedia
It can be, you just can't assure that it's the same string of "a"s and "b"s because there's no way to retain the information acquired in traversing the first half for use in traversing the second.