Representing regular expressions with DFA state diagrams - regex

I have these regular expressions:
a) 0(10)*
b) a(b+c+d)
c) b*(aa*b*+ε)
I created these state diagrams for them:
Are they correct?

If I deciphered your writing correctly, this is what you wrote:
a) 0(10)*
Correct.
But two remarks:
Clearly indicate which state is the starting state. It is common to use an incoming arrow for that, like you did in answer to (c).
The rightmost two states are not really needed as they are indistinguishable from the first two states (at the top).
You could simplify to this:
b) a(b+c+d)
Not correct. There are these issues:
a is not a valid input, so the second state should not be an accepting state.
This first sink state lacks a transition for a. Once that is corrected, it becomes a state that is indistinguishable from the second sink, and so they could be merged
And you should indicate which is the starting state
Correction:
c) b*(aa*b*+ε)
Correct!

Related

Given Two Regex, Determine if One is a Complement of Other

I'd like to know how you can tell if some regular expression is the complement of another regular expression. Let's say I have 2 regular expressions r_1 and r_2. I can certainly create a DFA out of each of them and then check to make sure that L(r_1) != L(r_2). But that doesn't necessarily mean that r_1 is the complement of r_2 and vice versa. Also, it seems to be that many different regular expressions that could be the same complement of a single regular expression.
So I'm wondering how, given two regular expressions, I can determine if one is the complement of another. This is also new to me, so perhaps I'm missing something that should be apparent.
Edit: I should point out that I am not simply trying to find the complement of a regular expression. I am given two regular expressions, and I am to determine if they are the complement of each other.
Here is one approach that is conceptually simple, if not terribly efficient (not that there is necessarily a more efficient solution...):
Construct NFAs M and N for regular expressions r and s, respectively. You can do this using the construction introduced in the proof that finite automata describe the same languages.
Determinize M and N to get M' and N'. We might as well go ahead and minimize them at this point... giving M'' and N''.
Construct a machine C using the Cartesian product machine construction on machines M'' and N''. Acceptance will be determined by the symmetric difference, or XOR, criterion: accepting states in the product machine correspond to pairs of states (m, n) where exactly one of the two states is accepting in its automaton.
Minimize C and call the result C'
If L(r) = L(s)', then the initial state of C' will be accepting and C' will have all transitions originating in the initial state also terminating in the initial state. If this is the case,
Why should this work? The symmetric difference of two sets is the set of everything in exactly one (not both, not neither). If L(s) and L(r) are complementary, then it is not difficult to see that the symmetric difference includes all strings (by definition, the complement of a set contains everything not in the set). Suppose now there were non-complementary sets whose symmetric difference were the universe of all strings. The sets are not complementary, so either (1) their union is non-empty or (2) their union is not the universe of all strings. In case (1), the symmetric difference will not include the shared element; in case (2), the symmetric difference will not include the missing strings. So, only complementary sets have the symmetric difference equal to the universe of all strings; and a minimal DFA for the set of all strings will always have an accepting initial state with self-loops.
For complement: L(r_1) == !L(r_2)

How to derive the RegEx from the state diagram?

I found a state diagram of a DFA (deterministic finite automaton) with its RegEx in a script, but this diagram is just a sample without any explanations. So I tried to derive the RegEx from the DFA state diagram by myself and got the expression: ab+a+b(a*b)*. I dont understand how I get the original RegEx (ab+a*)+ab+ mentioned in the script. Here my derivation:
I am grateful for any help, links, references and hints!
You've derived the regex correctly here. The expression you have ab+a+b(a*b)* is equivalent to (ab+a*)+ab+ - once you've finished the DFA state elimination (you have a single transition from the start state to an accepting state), there aren't any more derivations to do. You may however get different final regexes depending on the order you eliminate the states in, and they should all be valid assuming you did the eliminations correctly. The state elimination method is also not guaranteed to be able to produce all equivalent regular expressions for a particular DFA, so it's okay that you didn't arrive at exactly the original regex. You can also check the equivalence of two regular expressions here.
For your particular example though to show that this DFA is equivalent to this original regex (ab+a*)+ab+, take a look at the DFA at this state of the elimination (somewhere in between the second and third steps you've shown above):
Let's expand our expression (ab+a*)+ab+ to (ab+a*)(ab+a*)*ab+. So in the DFA, the first (ab+a*) gets us from state 0 to partway between states 2 and 3 (the a* in a*a).
Then the next part (ab+a*)* means we're allowed to have 0 or more copies of (ab+a*). If there are 0 copies, we'll just finish with an ab+, reading an a from the second half of the a*a transition from 2 to 3 and a b from the 3 to 4 transition, landing us in state 4 which is accepting and where we can take the self loop and read as many b's as we want.
Otherwise we have 1 or more copies of (ab+a*), again reading an a from the second half of the a*a transition from 2 to 3 and a b from the 3 to 4 transition. The a* comes from the first half of the a*ab self loop on state 4 and the second half ab is either the final ab+ of the regex or the start of another copy of (ab+a*). I'm not sure if there's a state elimination that arrives at exactly the expression (ab+a*)+ab+ but for what it's worth, I think the regex you derived more clearly captures the structure of this DFA.

Do not understand empty string in finite automata from regular expression

I am reading my textbook ("Introduction to the theory of computation" by Michael Sipser) and there is a part that is trying to convert the regular expression (ab U b)* into a non-deterministic finite automata. The book is telling me to break the expression apart and write a NFA for each piece. When it reaches ab it shows:
An a to go from the first state to the second
An empty string to go from the second state to the third
A b to go from the third state to the final accepting state.
I am really confused trying to understand the second step. What's the purpose of the empty string. Would it be incorrect if it wasn't there?
You're probably making reference to example 1.56, right? First of all, remember he is describing a process - think of it as an algorithm, that will be executed always in the same matter. What he's saying is that, if you have a regexp concatenation as P.Q, no matter what P and Q are, you can join their AFNs by connecting the ending states of P with an empty string to the starting state of Q.
So, in his example, he's creating an AFN to represent a.b. AFNs to represent a and b isolated are trivial, as:
So, in order to join both, you just follow the algorithm, putting an empty string connection from the final state of the first AFN to the starting state of the second algorithm, as:
In this particular case, since it's very easy to create a a.b AFD, we look at it and perceive this empty string is not necessary. But the process works with every single concatenation - and, in fact, being self evident that this is equivalent of a trivially correct AFD just reinforces his statement: the process works.
Hope that helps.

How would I convert regular expression to finite automata?

How would I change the following regular expression to finite automata?
(abUb)(bUaaa)b*b((a*b)*Ub)*
note: U means union in this case
There are five top-level concatenated components of this regex. According to the algorithm recoverable from a part of Kleene's theorem, you can make NFA-Lambdas for these, then form the concatenation by connecting final states of one to initial states of the next.
When you see a union, that means you make two machines and combine them by making a new start state with two lambda transitions.
Kleene closure is a little more involved, but basically make the machine for the thing being repeated, then transform it by adding a new accepting start state and a loop to it from the old final states.
The base case is the machine for a single letter, which is two states, initial and final, with the appropriately labelled transition.
Work recursively from the simplest machines (innermost subexpressions) up to the whole thing, combining as necessary. Simplify the result as much as you like, possibly converting to a minimal DFA.
There is an application called Automatic Java Code Generator for Regular Expressions and Finite Automata, that automatically generates the NFA, DFA (including the transition table), and Java Code for a given regular expression or finite automata. It can be downloaded from this link: www.s-solutions.info You can always check if your solution is correct or not.

Regular expressions Equivalence

Is there a way to find out if two arbitrary regular expressions are equivalent? Looks like complex problem to me, but there might be some DFA simplification mechanism or something?
To test equivalence you can compute the minimal DFAs for the expressions and compare them.
Testability of equality is one of the classical properties of regular expressions. (N.B. This doesn't hold if you're really talking about Perl regular expressions or some other technically nonregular superlanguage.)
Turn your REs to generalised finite automata A and B, then construct a new automaton A-B such that the accepting states of A have null transitions to the start states of B, and that the accepting states of B are inverted. This gives you an automaton that accepts all those strings accepted by A, except for all those accepted by B.
Do the same for B-A, and reduce both to pure FAs. If an FA has no accepting states accessible from a start state then it accepts the empty language. If you can show that both A-B and B-A are empty, you've shown that A = B.
Edit Heh, I can't believe no one noticed the gigantic error there -- an intentional one, of course :-p
The automata A-B as described will accept those strings whose first half is accepted by A and whose second half is not accepted by B. Building the desired A-B is a slightly trickier process. I can't think of it off the top of my head, but I do know it's well-defined (and likely involves creating states to the represent the products of accepting states in A and non-accepting states in B).
This really depends on what you mean by regular expressions. As the other posters pointed out, reducing both expressions to their minimal DFA should work, but it only works for the pure regular expressions.
Some of the constructs used in the real world regex libs (backreferences in particular) give them power to express languages that aren't regular, so the DFA algorithm won't work for them. For example the regex : ([a-z]*) \1 matches a double occurence of the same word separated by a space (a a and b b but not b a nor a b). This cannot be recognized by a finite automaton at all.
These two Perlmonks threads discuss this question (specifically, read blokhead's responses):
Comparative satisfiability of regexps
Testing regex equivalence