Converting Epsilon-NFA to NFA - state

I'm having trouble understanding the process of converting an epsilon-NFA to an NFA, so I wondered if anybody could help me with it:
And the answer says:
The 0 in the new NFA has an A going to 1,2 and to 2. I figured this is because the 0 in the Epsilon NFA leads to 1 and 2 with an A (combined with an Epsilon). So why doesn't the 1,2 have an A-step going to 2, because in the Epsilon NFA the 1 has an A-step to 1 and 2?

When you remove an ε-transition from an NFA, pay attention to the direction of that transition.
In your case, the ε-transition goes from state 1 to state 2, which is an accept state, so you need to consider all the incoming transitions to state 1.
Also, because state 1 moves to state 2 on ε, state 1 can be replaced by {1,2}, which is an accept state. Check this question to know why this happens.
So, to remove the ε-transition, check every incoming transition to state 1, replace {1} with the accept state {1,2}, and convert them as follows:
State 0 moves to state 1 when it reads a, and state 1 then moves to state 2 automatically on ε.
So you should drop the ε-path from 1 to 2 and say instead that state 0, on reading a, moves to both {1} and {2}. Only one transition is added to the existing NFA:
{0} -> {2} (on reading a) // should be drawn, not given
{0} -> {1} (on reading a) // this is already given
State 2 moves to state 1 when it reads a, and state 1 then moves to state 2 automatically on ε.
So you should drop the ε-path from 1 to 2 and say instead that state 2, on reading a, moves to both {1} and {2} (itself). Only one transition is added to the existing NFA:
{2} -> {2} (on reading a) // a self-loop, should be drawn, not given
{2} -> {1} (on reading a) // this is already given
Take special care to replace the state {1} with the accept state {1,2}, for the reason explained above.
There are no more incoming arrows to state 1, so all the dependencies are resolved. The new NFA matches the NFA given as your answer.
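The same result can be checked mechanically. The sketch below uses the standard ε-closure formulation (new transitions are closure, then symbol, then closure) rather than the incoming-arrow reasoning above, and the automaton it runs on is an assumption reconstructed only from the transitions this answer mentions (0 to 1 on a, 2 to 1 on a, 1 to 2 on ε, accept state 2). The names `epsilon_closure` and `remove_epsilon` are illustrative.

```python
def epsilon_closure(state, eps):
    """Return every state reachable from `state` using only ε-moves."""
    seen, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for nxt in eps.get(q, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def remove_epsilon(states, alphabet, delta, eps, accepting):
    """Build an ε-free NFA: new_delta[(q, a)] = closure(delta(closure(q), a))."""
    new_delta = {}
    for q in states:
        for a in alphabet:
            targets = set()
            for p in epsilon_closure(q, eps):
                for r in delta.get((p, a), ()):
                    targets |= epsilon_closure(r, eps)
            if targets:
                new_delta[(q, a)] = targets
    # A state accepts if its ε-closure contains an original accept state.
    new_accepting = {q for q in states if epsilon_closure(q, eps) & accepting}
    return new_delta, new_accepting


# Assumed automaton (the original figure is not available).
states = {0, 1, 2}
delta = {(0, "a"): {1}, (2, "a"): {1}}
eps = {1: {2}}
new_delta, new_accepting = remove_epsilon(states, {"a"}, delta, eps, accepting={2})
```

Under these assumptions, states 0 and 2 each gain an a-transition to 2 in addition to the existing one to 1, and state 1 becomes accepting, which agrees with the steps above.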

Related

Formulation of language and regular expressions

I can't figure out the formal language and the regular expression of this automaton:
DFA automaton
I know that the number of occurrences of 'b' or 'a' has to be even.
At first I thought the language was:
L = {(a^i)(b^j) | i(mod2) = j(mod2) = 0, i,j>=0}
But the automaton can start with 'b', so that language is incorrect.
Also, the regular expression I found, ((aa)* + (bb)), doesn't match either: it can't produce abab, for example.
The regex I got by progressively ripping out nodes (order: 3,1,2,0) is:
(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*
As far as I can tell, that's the simplest it goes. (I'd love to know if anyone has a simpler reduction—I'm actually taking a test on this stuff this week!)
Step-by-step process
We start off by adding a new start and accept state. Every old accept state (in this case, there's only one) gets linked to the new accept state with an ε transition:
Next, we rip out state 3. We need to preserve all paths that run through state 3. In this case we've added a path from state 0 back to itself, paths from state 0 to state 2, and state 2 back to itself:
We do the same with state 1:
We can simplify this a bit: we'll concatenate the looping-back transitions with commas. (At the end, this will turn into the union operator: | or ⋃ etc., depending on your notation.)
We'll remove state 2 next, and get everything smooshed onto one big loop:
Loops become stars; we remove the last state so we just have a transition from the start state to the end state connected with one big regular expression:
And that's our regular expression!
Language definition
You're pretty close with the language definition. If you can allow something a little looser, it would be this:
L = { w | w contains an even number of 'a's and 'b's }
The problem with your definition is that you start the string w off with a every time, whereas the only restriction is on the parity of the number of a's and b's.
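One way to gain confidence that the regular expression and the language definition agree is to compare them by brute force on all short strings. This is only a sanity check up to a chosen length, not a proof; the helper names are made up for the example.

```python
import re
from itertools import product

# The regex from this answer, written in Python syntax.
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*")


def has_even_counts(word):
    """The language definition: an even number of a's and of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0


def find_mismatches(max_len):
    """Return every word over {a, b} up to max_len where the two disagree."""
    mismatches = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if bool(EVEN_EVEN.fullmatch(word)) != has_even_counts(word):
                mismatches.append(word)
    return mismatches
```

Calling `find_mismatches(8)` should return an empty list if the two descriptions agree on every string of length 8 or less.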

RE to NFA Thompson's construction steps ((c|a)b*)*

I tried to convert ((c|a)b*)* to an NFA using Thompson's construction, but I must have misunderstood something, because the outcome isn't what it is supposed to be. I would be really glad if you could point out my mistake.
Thompson's construction rules:
1) Every NFA has a start state and an accepting state.
2) No transition, except the starting one, is allowed to enter the start state.
3) No transition exits from an accepting state.
4) An ε-transition always connects 2 states that used to be start or accepting states for some REs.
5) A state can have at most 2 incoming and 2 exiting ε-transitions.
6) A state can have at most 1 incoming and 1 exiting transition for a specific character of the alphanumerics used.
Step 1: I created NFAs for each character
Step 2: parentheses have priority, so I created c|a
Step 3: then I created b*
Step 4: then I combined c|a and b* to create (c|a)b*
Step 5: and at last I created ((c|a)b*)*
The difference from the correct solution is that in the last NFA (the example doesn't show the steps, and the states got renumbered in the end) there is no S9. So S8 has an ε-transition to S5, and S5 has an ε-transition to S10. That would make sense to me if b* didn't have the S9 state, but it needs it because of rule number 2. So I guess I made a mistake during the connection. Thank you in advance.
Rule 2 says that nothing can enter S11, which isn't relevant here. When concatenating (step 4), S8 and S9 should have been combined.
From Wikipedia: the concatenation expression st is converted to an NFA in which the final state of N(s) becomes the initial state of N(t).
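The merge at the concatenation step can be sketched in code. This is a minimal illustration of Thompson's construction, not the asker's exact diagram: the class name `Thompson`, the integer state ids, and the helper `accepts` are all assumptions made for the example. The point to look at is `concat`, which merges the start state of the right fragment into the accept state of the left fragment instead of linking them with an extra ε-transition.

```python
from itertools import count


class Thompson:
    """Collects edges (src, label, dst); a label of None means ε."""

    def __init__(self):
        self._ids = count()
        self.edges = []

    def _pair(self):
        return next(self._ids), next(self._ids)

    def symbol(self, ch):
        start, accept = self._pair()
        self.edges.append((start, ch, accept))
        return start, accept

    def union(self, left, right):
        start, accept = self._pair()
        for sub_start, sub_accept in (left, right):
            self.edges.append((start, None, sub_start))
            self.edges.append((sub_accept, None, accept))
        return start, accept

    def star(self, inner):
        start, accept = self._pair()
        sub_start, sub_accept = inner
        self.edges += [(start, None, sub_start), (start, None, accept),
                       (sub_accept, None, sub_start), (sub_accept, None, accept)]
        return start, accept

    def concat(self, left, right):
        # Merge right's start into left's accept. Nothing enters a fragment's
        # start state (rule 2), so only edge sources need renaming.
        (l_start, l_accept), (r_start, r_accept) = left, right
        self.edges = [(l_accept if src == r_start else src, label, dst)
                      for src, label, dst in self.edges]
        return l_start, r_accept


def accepts(edges, start, accept, text):
    """Simulate the ε-NFA on text using sets of current states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return accept in current


t = Thompson()
c_or_a = t.union(t.symbol("c"), t.symbol("a"))
b_star = t.star(t.symbol("b"))
start, accept = t.star(t.concat(c_or_a, b_star))
used_states = {s for src, _, dst in t.edges for s in (src, dst)}
```

With the merge in place, the NFA built this way for ((c|a)b*)* uses one state fewer than it would if the two fragments were joined by an ε-transition.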

Steps to draw a DFA (or NFA) from a simple statement?

I am given a simple statement: construct a DFA over the alphabet {0, 1} that accepts all the strings that end in 101.
My question is: what are the steps to design it? Alternatively, how do I design an NFA? I know the steps to convert an NFA to a DFA, so I could then convert the NFA to the DFA.
Note: it is just a minor course for me, so I have never studied anything like regular expressions or the algorithms that are probably used to construct DFAs.
If you want more of an explanation on how I derived this, I'd be happy to explain, but for now I just drew the DFA and explained each state.
Sorry about the screenshot...I didn't know how to convert it straight to an image.
On input 0 at q0, it loops back to itself. On 1, it moves to q1 and prepares itself to end, because this could possibly be the start of '101'.
q1 loops to itself on input 1 because it's still preparing to end on '101'. Input '0' on q1 means it is preparing for input '10', so it goes to q2.
Input '0' on q2 breaks the whole cycle and goes back to q0. Input '1' results in moving to q3, the accepting state.
Any input on q3 results in going back to whatever point in the cycle the input corresponds with.
That is, on '1' it goes back to q1, or the state where the first '1' was encountered in '101', preparing to end.
On '0', it goes to q2 because in order to get to q3, there must have been an input of '1' from q2, so no matter what, the last two input symbols are '10' now.
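The state descriptions above translate directly into a transition table. The sketch below encodes exactly those eight transitions and runs a string through them; the names `TRANSITIONS` and `ends_in_101` are just illustrative.

```python
# One row per state, one column per input symbol, as described above.
TRANSITIONS = {
    "q0": {"0": "q0", "1": "q1"},
    "q1": {"0": "q2", "1": "q1"},
    "q2": {"0": "q0", "1": "q3"},
    "q3": {"0": "q2", "1": "q1"},
}


def ends_in_101(bits):
    """Run the DFA from q0 and accept if it stops in q3."""
    state = "q0"
    for bit in bits:
        state = TRANSITIONS[state][bit]
    return state == "q3"
```

Comparing `ends_in_101(w)` with `w.endswith("101")` on a batch of binary strings is a quick way to check the table.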
TikZ DFA examples.
Here, the string should end with 101, so we draw an NFA for it and later convert it into a DFA.
The states are A, B, C, and D.
I will upload an image here. In it I have drawn the NFA and its transition table.
Then I have drawn the transition table for the conversion of the NFA to a DFA.
I have also drawn the DFA for you.
In an NFA, when a specific input is given to the current state, the machine can go to multiple states: it can have zero, one, or more than one move on a given input symbol. In a DFA, on the other hand, when a specific input is given to the current state, the machine goes to exactly one state: a DFA has only one move on a given input symbol.
THE STEPS FOR CONVERTING NFA TO DFA:
Step 1: Initially Q' = ϕ
Step 2: Add q0 of NFA to Q'. Then find the transitions from this start state.
Step 3: In Q', find the possible set of states for each input symbol. If this set of states is not in Q', then add it to Q'.
Step 4: In the DFA, the final states are all the states of Q' that contain at least one final state of the NFA (a member of F).
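The four steps can be sketched as a worklist loop. The NFA used here is an assumption, since the image is not available: states A, B, C, D as named above, with A looping on 0 and 1, A to B on 1, B to C on 0, C to D on 1, and D as the only final state. The function name `nfa_to_dfa` is illustrative.

```python
from collections import deque


def nfa_to_dfa(nfa_delta, start, nfa_accepting, alphabet):
    """Subset construction for an NFA without ε-transitions."""
    start_set = frozenset([start])
    dfa_states = {start_set}              # steps 1 and 2: Q' starts with {q0}
    dfa_delta = {}
    queue = deque([start_set])
    while queue:                          # step 3: expand until nothing new appears
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(r for q in current
                               for r in nfa_delta.get((q, symbol), ()))
            dfa_delta[(current, symbol)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    # Step 4: a DFA state is final if it contains any NFA final state.
    dfa_accepting = {s for s in dfa_states if s & nfa_accepting}
    return dfa_states, dfa_delta, dfa_accepting


# Assumed NFA for "ends in 101".
NFA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
       ("B", "0"): {"C"}, ("C", "1"): {"D"}}
dfa_states, dfa_delta, dfa_accepting = nfa_to_dfa(NFA, "A", {"D"}, "01")
```

For this assumed NFA the construction reaches four subset states, {A}, {A,B}, {A,C}, and {A,B,D}, with {A,B,D} as the only final one.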
View the image here

How to implement regular expression NFA with character ranges?

When you read such posts as Regex: NFA and Thompson's algorithm everything looks rather straightforward until you realize in real life you need not only direct characters like "7" or "b", but also:
[A-Z]
[^_]
.
namely character classes (or ranges). And thus my question -- how to build NFA using character ranges? Using meta-characters like "not A", "anything else" and then computing overlapping ranges? This would lead to using tree-like structure when using final automaton, instead of just a table.
Update: please assume non-trivial in size (>>256) alphabet.
I am asking about NFA, but later I would like to convert NFA to DFA as well.
The simplest approach would be:
Use segments as labels for transitions in both the NFA and the DFA. For example, the range [a-z] would be represented as the segment [97, 122]; the single character 'a' would become [97, 97]; and any character '.' would become [minCode, maxCode].
Each negated range such as [^a-z] would result in two transitions from the starting state to the next state. In this example the two transitions [minCode, 96] and [123, maxCode] should be created.
When a range is represented by enumerating all possible characters, as in [abcz], either one transition per character should be created, or the code might first group the characters into ranges to reduce the number of transitions. So [abcz] would become [a-c]|z: two transitions instead of four.
This should be enough for the NFA. However, the classical power set construction to transform an NFA to a DFA will not work when there are transitions with intersecting character ranges.
To solve this issue, only one additional generalization step is required. Once the set of all input symbols is created (in our case, a set of segments), it should be transformed into a set of non-intersecting segments. This can be done in O(n log n) time, where n is the number of segments in the set, using a priority queue (PQ) in which segments are ordered by their left endpoint. Example:
Procedure DISJOIN:
Input <- [97, 99] [97, 100] [98, 108]
Output -> [97, 97] [98, 99] [100, 100] [101, 108]
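A possible implementation of DISJOIN is sketched below. Note that it swaps the priority queue for a sweep over sorted boundary points, which is a different technique with the same O(n log n) cost; the function name `disjoin` and the tuple representation of segments are assumptions made for the example.

```python
def disjoin(segments):
    """Split inclusive (lo, hi) segments into non-intersecting pieces."""
    # +1 where a segment opens, -1 just past where it closes.
    events = {}
    for lo, hi in segments:
        events[lo] = events.get(lo, 0) + 1
        events[hi + 1] = events.get(hi + 1, 0) - 1

    points = sorted(events)
    result = []
    depth = 0  # how many input segments cover the current piece
    for left, right in zip(points, points[1:]):
        depth += events[left]
        if depth > 0:
            result.append((left, right - 1))
    return result
```

On the example input, `disjoin([(97, 99), (97, 100), (98, 108)])` yields the four pieces listed in the output line above.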
Step 2. To create new transitions from a "set state", the algorithm should be modified as follows:
for each symbol in DISJOIN(input symbols)
    S <- empty set of symbols
    T <- empty "set state"
    for each state in "set state"
        for each transition in state.transitions
            I <- intersection(symbol, transition.label)
            if (I is not empty)
            {
                Add I to the set S
                Add transition.To to the T
            }
    for each segment from DISJOIN(S)
        Create transition from "set state" to T, labeled with that segment
To speed up matching when searching for a transition on an input symbol C, the transitions of each state can be sorted by segment and looked up with binary search.
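A minimal sketch of that lookup, assuming each DFA state stores its outgoing transitions as (lo, hi, target) triples with non-overlapping segments. The class name `StateTransitions` and the method `target_for` are hypothetical; `bisect_right` comes from Python's standard `bisect` module.

```python
from bisect import bisect_right


class StateTransitions:
    """Outgoing transitions of one DFA state, keyed by disjoint segments."""

    def __init__(self, transitions):
        # Sort once by left endpoint so every lookup is a binary search.
        self._transitions = sorted(transitions, key=lambda t: t[0])
        self._starts = [lo for lo, _, _ in self._transitions]

    def target_for(self, code):
        """Return the target state for character code `code`, or None."""
        i = bisect_right(self._starts, code) - 1
        if i >= 0:
            lo, hi, target = self._transitions[i]
            if lo <= code <= hi:
                return target
        return None
```

For example, a state built from `[(123, 200, "s2"), (97, 122, "s1")]` maps `ord("a")` to `"s1"` and returns `None` for code 96.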

2 digits only allowed once (Regex)

I'm trying to check if a level is valid or not.
The level is of the form: (but there are 998 more of these)
bbbbbbb
b41111b
b81400b
b81010b
b01121b
b08001b
bbbbbbb
The level must follow a few rules. I have written a regex that enforces all of the rules but one:
The level must contain exactly one 2 and exactly one 4.
(Notice that the level above has two 4's and one 2, so it is not valid.)
This is a school project so please guide me through to the answer.
Thanks in advance.
EDIT:
My current regex is:
r'^b{' + str(length) + r'}\n(b{1}[0-8]{' + str(length - 2) + r'}b{1}\n)+b{' + str(length) + '}$'
For the level above, length = 7
Note that it doesn't even try to filter this wrong level above.
The other rules are:
1) The level must be surrounded by 'b'.
2) The level can only contain the char 'b' and numbers smaller than 9.
3) There can only be one 2.
4) There can only be one 4.
My regex above does take rules 1 and 2 into account, but I still need to figure out rules 3 and 4.
I have tried lookarounds and such, couldn't figure it out.
This is the regex that will meet all of your conditions:
^b(?!(?:[^2]*2){2,})(?!(?:[^4]*4){2,})[b0-8]*b$
It starts and ends with b, using ^b and b$
It consists only of the letter b and the digits 0-8, using [b0-8]*
It won't allow more than one digit 2, using (?!(?:[^2]*2){2,})
It won't allow more than one digit 4, using (?!(?:[^4]*4){2,})
Well, a regex for exactly 1 a would be [^a]*a[^a]* (that is, a possibly empty sequence of non-a's, followed by an a, followed by a possibly empty sequence of non-a's). I'll leave it as an exercise how to handle multiple lines and make sure this covers the whole level.
For exactly 1 a and 1 b: [^ab]*((a[^ab]*b)|(b[^ab]*a))[^ab]*, with the same caveats. Explanation: a sequence of non-a-or-b's, followed by EITHER 1) an a, a run of non-a-or-b's, and a b, or 2) a b, a run of non-a-or-b's, and an a, with THAT followed by a run of non-a-or-b's.
The answer is to use negative lookaheads anchored to start of input.
It's unclear what you're trying to match, so I will just use a placeholder <your-regex> for your current regex:
^(?!.*?2.*?2)(?!.*?4.*?4)<your-regex>
See a live demo of this correctly rejecting more than one "2".
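Putting the pieces together in Python, here is one possible sketch. It differs from the answers above in two ways that should be read as assumptions: it enforces "exactly one" (as the question states) rather than "at most one", and it uses positive lookaheads built from negated character classes, because in Python `.` does not match a newline unless `re.DOTALL` is set, so a `.*?`-based lookahead would only inspect the first line of a multi-line level. The function name `level_pattern` is hypothetical, and the layout part is the question's own regex with `b{1}` shortened to `b`.

```python
import re


def level_pattern(length):
    """Compile a regex for a bordered level with exactly one 2 and one 4."""
    border = "b{" + str(length) + "}"
    layout = border + r"\n(?:b[0-8]{" + str(length - 2) + r"}b\n)+" + border
    # [^2] also matches newlines, so each lookahead scans the whole level.
    exactly_one_2 = r"(?=[^2]*2[^2]*\Z)"
    exactly_one_4 = r"(?=[^4]*4[^4]*\Z)"
    return re.compile(r"\A" + exactly_one_2 + exactly_one_4 + layout + r"\Z")


# The level from the question: two 4's and one 2, so it should be rejected.
invalid_level = "\n".join([
    "bbbbbbb", "b41111b", "b81400b", "b81010b", "b01121b", "b08001b", "bbbbbbb",
])
# The same level with the second 4 replaced, leaving exactly one 4 and one 2.
valid_level = invalid_level.replace("b81400b", "b81000b")
```

`level_pattern(7).match(valid_level)` should succeed while `level_pattern(7).match(invalid_level)` should return `None`. The sketch assumes the level string has no trailing newline.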