Techniques for simplifying a regular expression **by hand** [closed] - regex

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I'm trying to prove that the RE
s(bs)*[(abb*a*)*ab]*aa(a∪b) where s=b*a*ab
can be simplified to
(a∪b)*abaa(a∪b),
but all the general transformations I know such as (ab)*a=a(ba)* or (a*b)*a*=(a∪b)* seem to have no effect.
So the questions are:
Is a step-by-step transformation process here viable? If so, can you show me how it is done?
If other types of technique are needed for the proof, what are they?
Is there a clear & comprehensive reference explaining techniques used in simplifying formal RE's?
Thanks.

There are lots of ways to do this.
One simple way to do it is to convert the regular expressions to NFAs. To see if two NFAs recognize the same language:
Consider the starting state of both NFAs.
Take the epsilon closure of the states you are considering for each NFA.
If the set of states for one NFA contains at least one accepting state but the set for the other NFA contains no accepting states, then the NFAs are not equal and you are done.
Otherwise, for each symbol, follow that symbol to get a new set of states for each NFA, and go back to step 1.
This will take a finite number of steps because there are a finite number of sets of states to consider.
You could also convert both regular expressions to minimal DFAs and show that they are isormorphic, but this technique is really just the same thing as the above technique but with the steps in a different order.
Illustration
Consider the REs aa* and a*a. It's obvious that they recognize the same language, but we'll use them to illustrate the algorithm.
First we consider the starting states for each NFA. These are {1} for the left RE and {1} for the right RE, which we will write as {1},{1}.
Then we take the epsilon closure, which is {1},{1,2}.
Neither set contains an ending state, so we continue.
The outgoing symbols are only a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}.
Both contain ending states, so we continue.
Again, the outgoing symbols are a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}. This is the same as step #5, so we don't do it again.
Since there is nothing else to examine, we are done, and the two have been proven identical.

Related

how to parse mathematical functions in C++ [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Can you give me some ideas about how can I make a simple mathematical expression parser in C?
User enters a mathematical function in a string and from the string I want to create the function in C.
eg. x + sin(2*x)
-> return x + sin(2x);
Thanks in advance.
You can parse the expression based "Shunting-Yard Algorithm" http://en.wikipedia.org/wiki/Shunting-yard_algorithm. You will need to extend to handle the function calls such as sin, cos etc...
This is not a simple thing to do at all, in face, it's a hard thing. You need a full grammar parser, combined with pre-defined constants/functions (sin, log, pi, etc).
If you have no extensive previous experience with C I would disrecommend doing this, but if you really want to do this look at recursive descent parsing which is arguably the easiest way to do this (without putting a burden on the user, like reverse polish notation).
Last but not least you say you want to create a C function from the user-generated input. This is almost always a wrong thing to do - generating code from user input, instead the easiest approach is pre-processing to create a intermediate representation that can be efficiently executed.
Writing an expression parser and evaluator is one of the usual examples used when discussions parser writing techniques.
For example you could look the documentation for flex/bison or lex/yacc. That will have examples of constructing parsers/expression evaluators.
One way to do it is to use reverse polish notation for the expressions and a stack for the operands.
Some quick pseudo-code:
if element is operand
push in stack
else if element is operation
pop last 2 elements
perform operation
push result in stack
Repeat till end of expression. Final result is the only element in stack.

Right sequence of brackets [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
Please help with writing program on C++. We have a sequence of brackets. It consists from 4 kinds - (), [], {}, <>. Required to find the shortest sequence with the right placement of brackets, for which the initial sequence would be a subsequence, i. e. would be obtained from the resulting correct sequence by deleting some (possibly zero) number of brackets.
Example:
initial sequence <]}} {([])
the answer: <[] {} {} ([]) <>>
Your proposed answer doesn't seem to fit the requirements. For example, it doesn't look (at least to me) like you can generate the }}{ sequence by deleting elements from <[] {} {} ([]) <>>. You also seem to have a completely unnecessary pair of angle brackets. Presumably, your intent is also that the brackets in the generated sequence are balanced--otherwise, the correct answer is to simply leave the original sequence unchanged. With no other requirements, that's clearly the shortest sequence from which you can generate that sequence by deleting (zero) items.
If the requirement for balancing is correct, it looks like your original input has four possible correct results:
<[]{}{}{([])}>
<[]{}{}{}([])>
<>[]{}{}{}([])
<>[]{}{}{([])}
All these are the same length, so I don't see a particular reason to prefer one over the other. This looks enough like homework that I'm not going just give a direct solution to the problem, but I think the simplest code you could write for the job would probably produce the first of these four solutions (and that may provide at least some guidance about how I'd solve the problem).
I'm reasonably certain this can be done entirely using counters--shouldn't need any sort of "context stacks" (though a stack-based solution is certainly possible as well).

Regular Expression - At most One Repeated Digit [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Improve this question
I am struggling with a homework problem. I have tried this problem for hours literally. I found a similar question here, but it is not exactly my problem.
The homework problem says 1. (20 points) Construct regular expressions for the following languages.
a) All strings of digits with at most one repeated digit.
The only way I see how this is possible would be to exhaustively somehow take care of every possible case. There are 10 different digits, so it's like A LOT of different cases. I think the max length string can be 11, because after 11, you have to have a second repeated digit. So the number of possible combinations is 10^11. I thought even about writing a DFA and just converting it to a regex, but even that seems like it's impossible.
Does anyone have any advice? We aren't allowed to use non-standard regex features, like groups, lookahead, etc. This is just a plain old regex kind of problem.
Response to comment:
It is not binary. I already asked the teacher.
"Commenters, “regular expression” has one well-defined meaning in computer science. Since this is homework, it’s almost certainly that which is meant (and even more so as it talks of “languages”), and not some specific library. There’s no ambiguity here, and no clarification needed." This is basically what we want. The standard regex stuff often used in theoretical CS classes. As far as what we learned in class, I go to USC if anyone is familiar with that and we only barely talked about this at all. We're onto a completely different topic now.

Library to check if two regular expressions are equal/isomorphic [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I need a library which will take in two regular expressions and determine whether they are isomorphic (i.e. match exactly the same set of strings or not)
For example a|b is isomorphic to [ab]
As I understand it, a regular expression can be converted to an NFA which in some cases can be efficiently converted to a DFA. The DFA can then be converted to a minimal DFA, which, if I understand it correctly, is unique and so these minimal DFA's can then be compared for equality. I realize that not all regular expression NFA's can be efficently transformed into DFA's (especially when they were generate from Perl Regexps which are not truly "regular") in which case ideally the library would just return an error or some other indication that the conversion is not possible.
I see tons of articles and academic papers on-line about doing this (and even some programming assignments for classes asking students to do this) but I can't seem to find a library which implements this functionality. I would prefer a Python and/or C/C++ library, but a library in any language will do. Does anyone know if such a library? If not, does someone know of a library that gets close that I can use as a starting point?
Haven't tried it, but Regexp:Compare for Perl looks promising: two regex's are equivalent if the language of the first is a subset of the second, and vice verse.
The brics automaton library for Java supports this.
It can be used to convert regular expressions to minimal Deterministic Finite State Automata, and check if these are equivalent:
public static void isIsomorphic(String regexA, String regexB) {
Automaton a = new RegExp(regexA).toAutomaton();
Automaton b = new RegExp(regexB).toAutomaton();
return a.equals(b);
}
Note that this library only works for regular expressions that describe a regular language: it does not support some more advanced features, such as backreferences.

Regex Performance Optimization Tips and Tricks [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
After reading a pretty good article on regex optimization in java I was wondering what are the other good tips for creating fast and efficient regular expressions?
Use the non-capturing group (?:pattern) when you need to repeat a grouping but don't need to use the captured value that comes from a traditional (capturing) group.
Use the atomic group (or non-backtracking subexpression) when applicable (?>pattern).
Avoid catastrophic backtracking like the plague by designing your regular expressions to terminate early for non-matches.
I created a video demonstrating these techniques. I started with the very poorly written regular expression in the catastrophic backtracking article (x+x+)+y. And then I made it 3 million times faster after a series of optimizations, benchmarking after every change. The video is specific to .NET but many of these things apply to most other regex flavors as well:
.NET Regex Lesson: #5: Optimization
Use the any (dot) operator sparingly, if you can do it any other way, do it, dot will always bite you...
i'm not sure whether PCRE is NFA and i'm only familiar with PCRE but + and * are usually greedy by default, they will match as much as possible to turn this around use +? and *? to match the least possible, bear these two clauses in mind while writing your regexp.
Know when not to use a regular expression -- sometimes a hand coded solution is more efficient and more understandable.
Example: suppose you want to match an integer that's evenly divisible by 3. It's trivial to design a finite state machine to accomplish this, and therefore a corresponding regex must exist, but writing it out is not so trivial -- and I'd sure hate to have to debug it!