NFA to an RE Kleene's Theorem - regex

Here is my NFA:
Here is my attempt.
Create new start and final nodes
Next eliminate the 2nd node from the left which gives me ab
Next eliminate the 2nd node from the right which gives me ab*a
Next eliminate the 2nd node from the left which gives me abb*b
Next eliminate the 2nd node from the right which gives me b+ab*a
Which leads to abbb (b+aba)*
Is this the correct answer?

No, that is not correct :(
You do not need to create a new start state; the first state, marked with the - sign, is the start state. Also, the label a,b means a or b, not ab.
There is a theorem called Arden's theorem that is quite helpful for converting an NFA into an RE.
What is the regular expression for this NFA?
In your NFA, the initial part is:
step-1:
(-) --a,b-->(1)
means (a+b)
step-2: next, from state 1 to state 2; note that state 2 is the accepting (final) state, marked with the + sign.
(1) --b--->(2+)
So you need (a+b)b to reach to final state.
step-3: Once you are at final state 2, any number of b's are accepted (any number means zero or more). This is because of the self-loop on state 2 with label b.
So, b* accepted on state-2.
step-4:
Actually there are two loops on state-2.
The first is the self-loop with label b that I described in step-3. Its expression is b*
The second loop on state-2 goes via state-3.
The expression for the second loop on state-2 is aa*b
Why the expression aa*b?
because:
            .-a-.
            |   v
(2+) --a--> (3) --b--> (2+)      ====> aa*b
So, in step-3 and step-4, because of the loops on state-2, a run can loop back either via b or via aa*b ===> (b + aa*b)*
So the regular expression for your NFA is:
(a+b) b (b + aa*b)*
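One way to sanity-check an expression like this is to simulate the automaton and compare it with the regex on every short string. The sketch below is only an illustration: the transition table is reconstructed from the steps above (the original picture is not available), and the state numbers 0 to 3 and the names DELTA, nfa_accepts and agree_up_to are assumptions made for this example.

```python
import re
from itertools import product

# Transitions reconstructed from the description: state 0 is the start (-),
# state 2 is the final state (+). This table is an assumption, not the
# original figure.
DELTA = {
    (0, "a"): {1}, (0, "b"): {1},
    (1, "b"): {2},
    (2, "b"): {2}, (2, "a"): {3},
    (3, "a"): {3}, (3, "b"): {2},
}
START, FINALS = 0, {2}

def nfa_accepts(word):
    """Track the set of states the NFA could be in after each symbol."""
    current = {START}
    for ch in word:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return bool(current & FINALS)

# (a+b) b (b + aa*b)* written in Python regex syntax.
PATTERN = re.compile(r"(a|b)b(b|aa*b)*")

def agree_up_to(max_len):
    """Compare automaton and regex on all strings over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if nfa_accepts(word) != bool(PATTERN.fullmatch(word)):
                return False
    return True
```

Calling agree_up_to(8) checks all 511 strings of length at most 8; agreement on short strings is evidence, not a proof, that the expression is right.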

Related

Regular expression - Kleene star of a union expression

I'm trying to code something that returns randomly a possible result after going through a regular expression.
I was sort of confused on how to tackle this when you have kleene star of a union expression.
If you have (a + b)* then does this mean that you indefinitely choose between a or b and repeat it a definite number of times, or do you just randomly choose between a or b twice.
If it is the former, then would it logically make sense to first generate a random number to determine how many times I'm going to randomly choose between a or b, and then for each time I randomly choose the element I generate another random number that then repeats the element that many times?
If you're asking what kind of things match (a | b)*, you might as well think of it in terms of a grammar:
<expression> := <empty> | <parens><expression>
<parens> := a | b
That's what a * operator really means: for any expression x, x* matches either the empty string or (x)(x*) (this is a recursive definition).
If you want to randomly generate a string that matches the expression, then that's a much more complicated matter. You now have to think in terms of which distribution you want to use, because the length of the string is unbounded, and it's impossible to have a uniform distribution over an unbounded range. (In other words, you can't pick a random length between 0 and infinity uniformly, so you'd have to decide how you're going to pick that in the first place.) Once you have your length problem resolved, expand (a | b)* into (a | b) repeated N times (where N is your randomly-chosen length) and resolve each parenthesized subexpression separately — for instance, if you choose to expand the subexpression 3 times, that would become (a | b)(a | b)(a | b), which will match all of aaa, baa, aba, bba, aab, bab, abb and bbb.
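A minimal sketch of that idea, assuming a geometric length distribution (after each step, stop with a fixed probability). The function name random_star and the parameter stop_probability are made up for this example; any other distribution over lengths would work just as well.

```python
import random

def random_star(alternatives=("a", "b"), stop_probability=0.3, rng=random):
    """Return a random string matching (a | b)*.

    The length is geometric: before each symbol we stop with probability
    stop_probability, so every length (including 0) has nonzero probability.
    stop_probability must be greater than 0, otherwise the loop never ends.
    """
    parts = []
    while rng.random() >= stop_probability:
        # Each repetition of the starred group is resolved independently.
        parts.append(rng.choice(alternatives))
    return "".join(parts)
```

Passing rng=random.Random(seed) makes the output reproducible, which is handy when testing.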
If you want to test if the string is a member of a Kleene star applied set, such as:
{"a", "b"}* = {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", "aab", ...}
then the regex ^[ab]*$ will work, and it also matches the empty string.
If you want to limit the length of the string, say to 10, then try ^[ab]{0,10}$ (some engines also accept the shorthand {,10}, but not all of them do).
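As a small illustration in Python, the membership test can be written with re.fullmatch, which anchors at both ends so the ^ and $ are not needed. The helper names in_star and in_bounded are hypothetical.

```python
import re

STAR = re.compile(r"[ab]*")          # {"a", "b"}*
BOUNDED = re.compile(r"[ab]{0,10}")  # same set, limited to length 10

def in_star(text):
    """True if text consists only of a's and b's (the empty string counts)."""
    return STAR.fullmatch(text) is not None

def in_bounded(text):
    """True if text is in {"a", "b"}* and has at most 10 characters."""
    return BOUNDED.fullmatch(text) is not None
```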

Walkthrough of regex match

Could someone help me understand how a regex engine matches the following:
a(bc)*
Against the text: abc.
For example, how many steps does it take? What happens at each step? For example, something like:
The first step is to match the letter "a" from the regex against the "a" in the text "abc". Because this is not optional/repeated there is no backtrack stored at this position.
Ideally, the regular expression (if it is a true regular expression) is first converted to a graph representation of an NFA (non-deterministic finite automaton), perhaps something like this:
a(bc)*:
(0) -- a --> (1) -- b --> (2)
              | ^          |
              ε `--- c ----'
              v
            ((3))
0 is the start state; ((3)) is the acceptance state. ε is an empty transition that consumes no input.
An NFA can be executed directly by the NFA simulation algorithm.
It can also be compiled to a DFA (deterministic finite automaton) using the "subset construction". The states of the DFA correspond to sets of the original NFA states. We end up with something like this:
DFA state    NFA states   Input   Next state
--------------------------------------------
0            { 0 }        a       1
1 (accept)   { 1, 3 }     b       2
2            { 2 }        c       1
State 1 of the DFA corresponds to two states of the NFA: when the DFA is in state 1, the corresponding NFA simulator has to be in states 1 and 3 simultaneously, because 3 is reachable from 1 via an epsilon transition (no input symbol consumed). DFA state 1 is an acceptance state because the NFA set it corresponds to, { 1, 3 }, contains an acceptance state.
The DFA requires very few steps; basically we just read characters and dispatch to the next state in the table based on the current state and the input character. If we are not able to dispatch, then there is a mismatch; we can stop reading more input. If we process the entire input, and are left in an acceptance state, then there is a match.
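The dispatch loop just described can be sketched as follows for a(bc)*. The table here is written out by hand for this example: state 0 is the start, state 1 is reached after the a and again after each complete bc (so it is the accepting state), and state 2 is reached after a b. The names TRANSITIONS, ACCEPTING and dfa_match are assumptions.

```python
# Hand-written DFA for a(bc)*; missing entries mean "no transition".
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 1,
}
ACCEPTING = {1}

def dfa_match(text):
    """Return True if the whole text is matched by a(bc)*."""
    state = 0
    for ch in text:
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:
            # Cannot dispatch: mismatch, stop reading input.
            return False
        state = nxt
    # All input consumed: match only if we ended in an acceptance state.
    return state in ACCEPTING
```

For the text abc this takes exactly three table lookups, one per character, with no backtracking.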

Regular Expression to DFA

Can someone tell me if the attached DFA is correct?
I am supposed to give a DFA for a language over the alphabet Σ = {a, b}.
I need a DFA for this ----> A = {ε, b, ab}
No, for multiple reasons:
Your automaton accepts bab
Your automaton does not accept ab
Your automaton is not a DFA, at least by some strict definitions
Regarding the first point: starting at q1, we see b, go to q2, see a, go to q3, see b, and go to q4, which is accepting. We saw bab and accepted it.
Regarding the second point: starting at q1, we see a but have no defined transition. The automaton "crashes" and fails to accept. So no string starting with a is accepted, including ab.
Regarding the third point: DFAs are often required to show all states and transitions, including dead states and transitions that will never lead back to any accepting state. You don't show all transitions and don't show all states in your automaton.
You can use the Myhill-Nerode theorem to determine how many states a minimal DFA for your language has. We note that the empty string can have appended either the empty string, b or ab to get a string in the language; a can have b appended; and b can have the empty string appended. Nothing can be appended to aa, bb, or ba to get a string in the language (so these are indistinguishable); but ab can have the empty string appended (and so is indistinguishable from b).
Equivalence classes so determined correspond to states in a minimal DFA. Our equivalence classes are:
(1) Strings like the empty string
(2) Strings like b
(3) Strings like a
(4) Strings like aa
We note that the empty string and b are both in the language, so the first and second classes will correspond to accepting states. We notice nothing can be appended to aa to get a string in the language, so this class corresponds to a dead state in the DFA. We write the transitions between these states by seeing which new equivalence class the appending of a new symbol puts us in:
From (1): Appending a puts us in (3) since appending a to the empty string gives a, which is in (3). Appending b puts us in (2) since appending b to the empty string gives b, which is in (2).
From (2): Appending a puts us in (4) since appending a to b gives ba, which is like aa in that it isn't a prefix of any string in the language. Appending b, we arrive in (4) by a similar argument.
From (3): Appending a we get aa and are in (4). Appending b we get ab, which is like b, so we are in (2).
From (4): All transitions from a dead state return to a dead state; both a and b lead back to (4).
You end up with something like:
q1 ---a---> q3
 |         / |
 b       b   a
 |     /     |
 v   v       v
q2 --a,b--> q4 --.
             ^   | a,b
             `---'
Or in tabular form:
q s q'
== = ==
q1 a q3
q1 b q2
q2 a q4
q2 b q4
q3 a q4
q3 b q2
q4 a q4
q4 b q4
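The table can be checked mechanically. The sketch below copies the eight transitions and assumes q1 is the start state and that q1 and q2 are the accepting states (q1 because ε is in A, q2 because b and ab are); the names TABLE, ACCEPTING, accepts and accepted_words are made up for this example.

```python
from itertools import product

# (state, symbol) -> next state, copied from the tabular form.
TABLE = {
    ("q1", "a"): "q3", ("q1", "b"): "q2",
    ("q2", "a"): "q4", ("q2", "b"): "q4",
    ("q3", "a"): "q4", ("q3", "b"): "q2",
    ("q4", "a"): "q4", ("q4", "b"): "q4",
}
ACCEPTING = {"q1", "q2"}  # assumption: q1 accepts the empty string

def accepts(word):
    """Run the DFA from q1; every (state, symbol) pair has a transition."""
    state = "q1"
    for ch in word:
        state = TABLE[(state, ch)]
    return state in ACCEPTING

def accepted_words(max_len):
    """All strings over {a, b} of length <= max_len that the DFA accepts."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product("ab", repeat=n))
    return {w for w in words if accepts(w)}
```

Enumerating up to length 5 should give back exactly the three strings of A, since every longer string ends in the dead state q4.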
I think this DFA is correct for that language.
Your attached DFA is wrong.
Your DFA accepts only ε, b and bab; it cannot accept ab.
To make your DFA accept ab as well, add a new state that is reached from q0 on input a, and send that new state to a final state on input b.
Since it is a DFA, send every input you do not need to a new state (a dead state).

simulate a deterministic pushdown automaton (PDA) in c++

I was reading a UVa exercise in which I need to simulate a deterministic pushdown automaton to see whether certain strings are accepted by the PDA, given input in the following format:
The first line of input will be an integer C, which indicates the number of test cases. The first line of each test case contains five integers E, T, F, S and C, where E represents the number of states in the automaton, T the number of transitions, F the number of final states, S the initial state and C the number of test strings. The next line will contain F integers, which represent the final states of the automaton. Then come T lines, each with 2 integers I and J and 3 strings L, T and A, where I and J (0 ≤ I, J < E) represent the origin and destination states of a transition. L represents the character read from the tape in the transition, T the symbol found at the top of the stack, and A the action to perform on the top of the stack at the end of the transition. The character used to represent the bottom of the stack is always Z. The character £ (<alt+156>) is used to represent the end of the string, the pop (unstack) action, or a transition that does not take the top of the stack into account. The stack alphabet consists of capital letters. For the string A, the symbols are pushed from right to left (the same way as in the program JFLAP, i.e. the new top of the stack will be the leftmost character). Then come C lines, each with an input string. The input strings may contain lowercase letters and digits (not necessarily present in any transition).
The output in the first line of each test case must display the following string "Case G:", where G represents the number of test case (starting at 1). Then C lines on which to print the word "OK" if the automaton accepts the string or "Reject" otherwise.
For example:
Input:
2
3 5 1 0 5
2
0 0 1 Z XZ
0 0 1 X XX
0 1 0 X X
1 1 1 X £
1 2 £ Z Z
111101111
110111
011111
1010101
11011
4 6 1 0 5
3
1 2 b A £
0 0 a Z AZ
0 1 a A AAA
1 0 a A AA
2 3 £ Z Z
2 2 b A £
aabbb
aaaabbbbbb
c1bbb
abbb
aaaaaabbbbbbbbb
this is the output:
Output:
Case 1:
Accepted
Rejected
Rejected
Rejected
Accepted
Case 2:
Accepted
Accepted
Rejected
Rejected
Accepted
I need some help or an idea of how I can simulate this PDA. I am not asking for code that solves the problem, because I want to write my own code (the idea is to learn, right?), but I need some help (an idea or pseudocode) to begin the implementation.
You first need a data structure to keep the transitions. You can use a vector of transition structs, each holding a transition quintuple. Alternatively, you can use the fact that states are integers and create a vector that keeps the transitions from state 0 at index 0, the transitions from state 1 at index 1, and so on. This way you reduce the time spent searching for the correct transition.
You can simply use the stack from the STL for the stack. You also need a search function, which could change depending on your implementation. If you use the first method, you can use a function like:
int findIndex(vector<quintuple> v)//which finds the index of correct transition otherwise returns -1
then use the return value to get the new state and the new stack symbol.
Or you can use a for loop over the vector and a bool flag that records whether a transition was found.
With the second method you can use a function that takes references to the new state and the new stack symbol and sets them if it finds an appropriate transition.
For the inputs you can use some kind of vector; the element type depends on personal taste. You can implement your main method with for loops, but if you want an extra challenge you can implement a recursive function. Good luck.
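To make the "one bucket of transitions per state" idea concrete, here is a rough sketch in Python rather than C++. It rests on several assumptions about the problem statement that you should verify yourself: a £ in the read position fires only when the input is exhausted, a £ in the top-of-stack position matches any top and leaves it in place, otherwise the top is popped and the action string is pushed right to left, and a £ action pushes nothing. All function names are made up for this example.

```python
EPS = "£"  # placeholder symbol from the problem statement

def build_table(num_states, transitions):
    """Bucket transitions by origin state so a lookup scans one short list."""
    table = [[] for _ in range(num_states)]
    for src, dst, read, top, action in transitions:
        table[src].append((dst, read, top, action))
    return table

def find_transition(table, state, symbol, top):
    """Return (dst, required_top, action) for the first match, else None."""
    for dst, read, need_top, action in table[state]:
        if read == symbol and need_top in (top, EPS):
            return dst, need_top, action
    return None

def accepts(table, start, finals, text, max_end_moves=1000):
    state, stack, pos = start, ["Z"], 0
    # One step per input character plus a bounded number of end-of-input moves.
    for _ in range(len(text) + max_end_moves):
        symbol = text[pos] if pos < len(text) else EPS
        top = stack[-1] if stack else None
        move = find_transition(table, state, symbol, top)
        if move is None:
            break
        state, need_top, action = move
        if need_top != EPS:
            stack.pop()                      # consume the matched top symbol
        if action != EPS:
            stack.extend(reversed(action))   # leftmost char becomes the new top
        if symbol != EPS:
            pos += 1
    return pos == len(text) and state in finals
```

With the first sample automaton (states 0 to 2, final state 2) this sketch accepts 111101111 and 11011 and rejects the other three strings, which matches the sample output; that agreement is the only evidence for the assumptions above.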

The Art of Computer Programming exercise question: Chapter 1, Question 8

I'm doing the exercises to TAOCP Volume 1 Edition 3 and have trouble understanding the syntax used in the answer to the following exercise.
Chapter 1 Exercise 8
Computing the greatest common divisor of positive integers m and n by specifying Tj, sj, aj, bj.
Let your input be represented by the string a^m b^n (m a's followed by n b's).
Answer:
Let A = {a,b,c}, N = 5. The algorithm will terminate with the string a^gcd(m,n).
j   Tj       sj       bj  aj
0   ab       (empty)  1   2    Remove one a and one b, or go to 2.
1   (empty)  c        0   0    Add c at extreme left, go back to 0.
2   a        b        2   3    Change all a's to b's.
3   c        a        3   4    Change all c's to a's.
4   b        b        0   5    If b's remain, repeat.
The part that I have trouble understanding is simply how to interpret this table.
Also, when Knuth says this will terminate with the string a^gcd(m,n) -- why the superscript for gcd(m,n)?
Thanks for any help!
Edited with more questions:
What is Tj -- note that T = Theta
What is sj -- note that s = phi
How do you interpret columns bj and aj?
Why does Knuth switch a new notation in the solution to an example that he doesn't explain in the text? Just frustrating. Thanks!!!
Here's an implementation of that exercise answer. Perhaps it helps.
By the way, the table seems to describe a Markov algorithm.
As far as I understand it, you start with the first command line, j = 0. Replace the first (leftmost) occurrence of Tj with sj and jump to the next command line depending on whether you replaced anything: if you did, jump to bj; if nothing was replaced, jump to aj.
EDIT: New answers:
A = {a,b,c} seems to be the character set you can operate with. c comes in during the algorithm (added to the left and later replaced by a's again).
Theta and phi could be Greek letters commonly used for something like "original" and "replacement", although I don't know whether they are.
bj and aj are the table lines to be next executed. This matches with the human-readable descriptions in the last column.
The only thing I can't answer is why Knuth uses this notation without any explanations. I browsed the first chapters and the solutions in the book again and he doesn't mention it anywhere.
EDIT2: Example for gcd(2,2) = 2
Input string: aabb
Line 0: Remove one a and one b, or go to 2.
=> ab => go to 1
Line 1: Add c at extreme left, go back to 0.
=> cab => go to 0
Line 0: Remove one a and one b, or go to 2.
=> c => go to 1
Line 1: Add c at extreme left, go back to 0.
=> cc => go to 0
Line 0: Remove one a and one b, or go to 2.
No ab found, so go to 2
Line 2: Change all a's to b's
No a's found, so go to 3
Line 3: Change all c's to a's
=> aa
Line 4: if b's remain, repeat
No b's found, so go to 5 (end).
=> Answer is "aa" => gcd(2,2) = 2
By the way, I think the description for line 0 should be "Remove one "ab", or go to 2." This makes things a bit clearer.
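The table can also be run mechanically. This is a small sketch of an interpreter for it, assuming (as in the trace above) that each step replaces only the leftmost occurrence of Tj; the names RULES, N and run are made up for this example, and the rule tuples are ordered (Tj, sj, bj, aj) like the table columns.

```python
# j -> (T_j, s_j, b_j, a_j): pattern, replacement,
# next line if replaced, next line if the pattern was not found.
RULES = {
    0: ("ab", "", 1, 2),   # remove one "ab", or go to 2
    1: ("", "c", 0, 0),    # add c at extreme left, go back to 0
    2: ("a", "b", 2, 3),   # change all a's to b's
    3: ("c", "a", 3, 4),   # change all c's to a's
    4: ("b", "b", 0, 5),   # if b's remain, repeat
}
N = 5  # reaching line N means the algorithm has terminated

def run(m, n):
    """Start from a^m b^n (m, n positive) and return the final string."""
    text, j = "a" * m + "b" * n, 0
    while j != N:
        pattern, replacement, if_replaced, if_missing = RULES[j]
        index = text.find(pattern)  # the empty pattern is found at index 0
        if index < 0:
            j = if_missing
        else:
            text = text[:index] + replacement + text[index + len(pattern):]
            j = if_replaced
    return text
```

For example, run(2, 2) goes through the same strings as the trace above and ends with aa, and the length of the result is gcd(m, n).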
The superscript for gcd(m,n) is due to how numbers are being represented in this table.
For example: m => a^m
n => b^n
gcd(m,n) => a^gcd(m,n)
It looks like Euclid's algorithm is being implemented.
i.e.
def gcd(m, n):
    if n == 0:
        return m
    return gcd(n, m % n)
The numbers are represented as powers so as to be able to do the modulo operation m%n.
For example, 4 % 3, will be computed as follows:
4 'a's (a^4) mod 3 'b's (b^3), which will leave 1 'a' (a^1).
The notation a^m is probably a notation for an input string in the state-machine context.
Such notation is used to refer to m consecutive instances of a, i.e.:
a^4 = aaaa
b^7 = bbbbbbb
a^4 b^7 a^3 = aaaabbbbbbbaaa
And what a^gcd(m,n) means is that after running the (solution) state machine, the resulting string should be gcd(m,n) instances of a.
In other words, the number of a's in the result should be equal to gcd(m,n).
And I agree with @schnaader that it is probably a table describing a Markov algorithm.