How do I concatenate adjacent Kleene star symbols from an alphabet? - regex

I came across a situation where I need to convert regular expressions over the alphabet {1,0} to NFA diagrams. Within the regex, I found two concatenated symbols with Kleene stars, 1*0*. Basically this means that the string has any number of 1's followed by any number of 0's.
Whilst converting into an NFA, I got confused mainly because there are two transitions pointing out of the first symbol's (1*) accept state: an epsilon transition back to the initial state (because it has a Kleene star), and an epsilon transition to the initial state of 0*.
I am not sure whether 1) I can have two transitions leaving the same state when converting to an NFA and, if so, 2) how to simplify this transition.
Any help here would be appreciated!

You can definitely have multiple epsilon transitions from the same state.
Using Thompson's construction (https://en.wikipedia.org/wiki/Thompson%27s_construction):
Concatenation of s and t: the initial state of s is the new initial state, the accepting state of t is the new accepting state. The accepting state of s becomes the initial state of t.
Kleene closure of s: introduce a new initial state and a new accepting state. Add an epsilon transition from the new initial state to the new accepting state. Add an epsilon transition from the new initial to the original initial, an epsilon transition from original accepting to new accepting, and an epsilon transition from original accepting to original initial.
So, our expression 1*0* breaks into: 1* concatenated with 0*.
1 on its own is just q --1--> f. Going through the Kleene conversion to NFA yields
/--------------e--------------\
|                             V
q --e--> q1 --1--> q1f --e--> f
         ^         |
         \----e----/
With a similar construction for 0*. To concatenate them, take the accepting state from the first, and define it to be the starting state of the second:
/---------------e-------------\ /-------------e---------------\
|                             V |                             V
q --e--> q1a --1--> q1f --e--> q0 --e--> q0a --0--> q0f --e--> f
         ^          |                    ^          |
         \-----e----/                    \-----e----/
To simplify, you can convert this epsilon-NFA to an NFA without epsilon transitions, or to a DFA, using the corresponding conversion algorithms.
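The two construction rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`symbol`, `star`, `concat`, `closure`, `accepts`), the integer state ids, and the use of `None` as the epsilon label are all made up for this sketch and are not part of any library.

```python
from itertools import count

EPS = None       # label used for epsilon transitions (an assumption of this sketch)
_ids = count()   # source of fresh integer state ids

def symbol(ch, trans):
    """NFA fragment for a single symbol: q --ch--> f."""
    q, f = next(_ids), next(_ids)
    trans.append((q, ch, f))
    return q, f

def star(frag, trans):
    """Kleene closure: new start/accept states plus four epsilon transitions."""
    q0, f0 = frag
    q, f = next(_ids), next(_ids)
    trans += [(q, EPS, f), (q, EPS, q0), (f0, EPS, f), (f0, EPS, q0)]
    return q, f

def concat(a, b, trans):
    """Concatenation: the accepting state of a becomes the start state of b."""
    (qa, fa), (qb, fb) = a, b
    # Rename b's start state to a's accepting state in every transition.
    trans[:] = [(fa if s == qb else s, c, fa if t == qb else t)
                for s, c, t in trans]
    return qa, fb

def closure(states, trans):
    """All states reachable from `states` through epsilon transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, c, dst in trans:
            if src == s and c is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, trans, text):
    """Simulate the epsilon-NFA on `text`, tracking the set of live states."""
    start, accept = nfa
    current = closure({start}, trans)
    for ch in text:
        moved = {dst for src, c, dst in trans if src in current and c == ch}
        current = closure(moved, trans)
    return accept in current

trans = []
ones = star(symbol("1", trans), trans)    # 1*
zeros = star(symbol("0", trans), trans)   # 0*
nfa = concat(ones, zeros, trans)          # 1*0*
```

With this sketch, `accepts(nfa, trans, "1100")` holds while `accepts(nfa, trans, "01")` does not. Several states end up with two outgoing epsilon transitions, which the simulation handles by following all of them in `closure`.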

How to check valid Indian pancard using regex

I am trying to validate an Indian PAN card number using a regex.
The Indian PAN card format is 5 alphabetic characters, then 4 numeric characters (0001-9999), then 1 alphabetic character.
I tried the following regex: [A-Z]{5}[0-9]{4}[A-Z]{1}
It mostly works, but it is not exactly correct for the second condition: it accepts 0000-9999, but I need 0001-9999. How should I modify this regex?
You can use a negative lookahead to filter out the rules you don't want.
^(?!0+$)[0-9]{4}$
(?!0+$) means if the rest of following text is only made of 0s, fail the test.
Edit
If it's not just the numbers, you can remove the ^ and $, and insert the negative lookahead (?!0{4}) before [0-9]{4}:
[A-Z]{5}(?!0{4})[0-9]{4}[A-Z]{1}
I found the answer thanks to @TonyR:
^[A-Z]{5}(?!0000)\d{4}[A-Z]{1}$
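The negative-lookahead pattern can be checked quickly with Python's `re` module. This is a minimal sketch: the function name `is_valid_pan` and the sample strings are made up for illustration, and `re.ASCII` is added here (an assumption, not part of the answers) so that `\d` matches only 0-9.

```python
import re

# Pattern from the answer; (?!0000) rejects the all-zero number block.
PAN = re.compile(r"^[A-Z]{5}(?!0000)\d{4}[A-Z]{1}$", re.ASCII)

def is_valid_pan(text):
    """True if `text` has the 5-letter, 4-digit (not 0000), 1-letter shape."""
    return PAN.fullmatch(text) is not None

# Made-up sample values, not real PAN numbers.
for sample in ["ABCDE1234F", "ABCDE0001F", "ABCDE0000F", "abcde1234f"]:
    print(sample, is_valid_pan(sample))
```

Only the number block 0000 is excluded; values such as 0100 still pass because the lookahead fails only when all four digits are zero.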
Simple check
This only checks the position of the characters:
([A-Z]){5}([0-9]){4}([A-Z]){1}
More comprehensive
Checks the 4th letter
[a-zA-Z]{3}[PCHFATBLJG]{1}[a-zA-Z]{1}[0-9]{4}[a-zA-Z]{1}
The fourth character of the PAN must be one of the following, depending on the type of assessee:
C — Company
P — Person
H — Hindu Undivided Family (HUF)
F — Firm
A — Association of Persons (AOP)
T — AOP (Trust)
B — Body of Individuals (BOI)
L — Local Authority
J — Artificial Juridical Person
G — Government
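The fourth-character rule can also be used to report the holder type. The following is a sketch only: the dictionary `HOLDER_TYPES` simply restates the list above, and the function name `holder_type` and the sample strings are hypothetical.

```python
import re

# Mapping restated from the list above.
HOLDER_TYPES = {
    "C": "Company", "P": "Person", "H": "Hindu Undivided Family (HUF)",
    "F": "Firm", "A": "Association of Persons (AOP)", "T": "AOP (Trust)",
    "B": "Body of Individuals (BOI)", "L": "Local Authority",
    "J": "Artificial Juridical Person", "G": "Government",
}

# The "more comprehensive" pattern from the answer ({1} quantifiers dropped).
PAN_STRICT = re.compile(r"[a-zA-Z]{3}[PCHFATBLJG][a-zA-Z][0-9]{4}[a-zA-Z]")

def holder_type(pan):
    """Return the holder type for a well-formed PAN, or None otherwise."""
    if PAN_STRICT.fullmatch(pan) is None:
        return None
    return HOLDER_TYPES[pan[3]]
```

For example, `holder_type("ABCPE1234F")` gives `"Person"`, while `holder_type("ABCDE1234F")` gives `None` because D is not an allowed fourth character. This pattern does not exclude 0000; the lookahead from the earlier answers would have to be added for that.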

Walkthrough of regex match

Could someone help me understand how a regex engine matches the following:
a(bc)*
Against the text: abc.
For example, how many steps does it take? What happens at each step? For example, something like:
The first step is to match the letter "a" from the regex against the "a" in the text "abc". Because this is not optional/repeated there is no backtrack stored at this position.
Ideally, the regular expression (if it is a true regular expression) is first converted to a graph representation of an NFA (non-deterministic finite automaton), perhaps something like this:
a(bc)*:
              ,------- c -------.
              v                 |
(0) -- a --> (1) -- b --> (2) --'
              |
              ε
              v
            ((3))
0 is the start state; ((3)) is the acceptance state. ε is an empty transition that consumes no input. Acceptance is reachable from state 1, that is, after the initial "a" and after each complete "bc".
An NFA can be executed directly by the NFA simulation algorithm.
It can also be compiled to a DFA (deterministic F. A.) using the "subset construction". The states of the DFA correspond to sets of the original NFA states. We end up with something like this:
DFA state    NFA states   Input   Next state
--------------------------------------------
0            { 0 }        a       1
1 (accept)   { 1, 3 }     b       2
2            { 2 }        c       1
State 1 of the DFA corresponds to two states of the NFA: when the DFA is in state 1, the corresponding NFA simulator has to be in states 1 and 3 simultaneously, because 3 is reachable via an epsilon transition (no input symbol consumed). The DFA state 1 is an acceptance state because the NFA set it corresponds to, { 1, 3 }, contains an acceptance state.
The DFA requires very few steps; basically we just read characters and dispatch to the next state in the table based on the current state and the input character. If we are not able to dispatch, then there is a mismatch; we can stop reading more input. If we process the entire input, and are left in an acceptance state, then there is a match.
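The table-driven matching described above can be sketched as follows. This is an illustrative sketch, not how any particular regex engine is implemented: the transition table is written out by hand for a(bc)* with states 0, 1, 2, where state 1 (reached after "a" and after each "bc") is the accepting state, and the names `DFA`, `ACCEPTING` and `run` are made up.

```python
# Hand-written transition table for a(bc)*: (state, input) -> next state.
DFA = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 1}
ACCEPTING = {1}

def run(text):
    """Return (matched, steps); steps counts the characters dispatched."""
    state, steps = 0, 0
    for ch in text:
        nxt = DFA.get((state, ch))
        if nxt is None:          # no transition: mismatch, stop reading
            return False, steps
        state = nxt
        steps += 1
    return state in ACCEPTING, steps

print(run("abc"))   # one table lookup per input character
```

For the text abc this takes three dispatches (0 to 1 on a, 1 to 2 on b, 2 to 1 on c) and ends in the accepting state. The sketch tests whether the whole input matches; a backtracking engine would count its steps differently.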

Regular Expression to DFA

Can someone tell me if the attached DFA is correct?
I am supposed to give a DFA for a language over the alphabet Σ = {a, b}.
I need a DFA for this language: A = {ε, b, ab}
No, for multiple reasons:
Your automaton accepts bab
Your automaton does not accept ab
Your automaton is not a DFA, at least by some strict definitions
Regarding the first point: starting at q1, we see b, go to q2, see a, go to q3, see b, and go to q4, which is accepting. We saw bab and accepted it.
Regarding the second point: starting at q1, we see a but have no defined transition. The automaton "crashes" and fails to accept. So no string starting with a is accepted, including ab.
Regarding the third point: DFAs are often required to show all states and transitions, including dead states and transitions that will never lead back to any accepting state. You don't show all transitions and don't show all states in your automaton.
You can use the Myhill-Nerode theorem to determine how many states a minimal DFA for your language has. We note that the empty string can have appended either the empty string, b, or ab to get a string in the language; a can have b appended; and b can have the empty string appended. Nothing can be appended to aa, bb, or ba to get a string in the language (so these are indistinguishable); but ab can have the empty string appended (and so is indistinguishable from b).
Equivalence classes so determined correspond to states in a minimal DFA. Our equivalence classes are:
(1) Strings like the empty string
(2) Strings like b
(3) Strings like a
(4) Strings like aa
We note that b is in the language, so the second class will correspond to an accepting state. The empty string is also in the language, so the first class corresponds to an accepting state as well. We notice nothing can be appended to aa to get a string in the language, so this class corresponds to a dead state in the DFA. We write the transitions between these states by seeing which new equivalence class the appending of a new symbol puts us in:
From (1): appending a puts us in (3), since appending a to the empty string gives a, which is in (3). Appending b puts us in (2), since appending b to the empty string gives b, which is in (2).
From (2): appending a puts us in (4), since appending a to b gives ba, which is like aa in that it isn't a prefix of any string in the language. Appending b, we arrive in (4) by a similar argument.
From (3): appending a we get aa and are in (4). Appending b we get ab, which is like b, so we are in (2).
From (4): all transitions from a dead state return to a dead state; both a and b lead back to (4).
You end up with something like:
q1 --a--> q3
|        /|
b  --b--< a
| /       |
vv        v
q2 -a,b-> q4 \
           ^  a,b
           \_/
Or in tabular form:
q s q'
== = ==
q1 a q3
q1 b q2
q2 a q4
q2 b q4
q3 a q4
q3 b q2
q4 a q4
q4 b q4
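The table can be checked mechanically by running it over every short string. This is a minimal sketch: the names `TABLE`, `ACCEPTING` and `accepts` are made up, and marking q1 as accepting in addition to q2 is an assumption that follows from ε being in the language.

```python
from itertools import product

# Transition table copied from the tabular form above.
TABLE = {
    ("q1", "a"): "q3", ("q1", "b"): "q2",
    ("q2", "a"): "q4", ("q2", "b"): "q4",
    ("q3", "a"): "q4", ("q3", "b"): "q2",
    ("q4", "a"): "q4", ("q4", "b"): "q4",
}
ACCEPTING = {"q1", "q2"}   # q1 accepts the empty string; q2 accepts b and ab

def accepts(text):
    """Run the DFA from q1 and report whether it ends in an accepting state."""
    state = "q1"
    for ch in text:
        state = TABLE[(state, ch)]
    return state in ACCEPTING

# Every string over {a, b} of length 0 to 4 that the DFA accepts.
accepted = sorted(
    "".join(p)
    for n in range(5)
    for p in product("ab", repeat=n)
    if accepts("".join(p))
)
print(accepted)
```

Because q4 is a dead state, no longer string can be accepted either, so the enumeration up to length 4 is enough to see that only the empty string, b and ab are accepted.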
I think this DFA is correct for that language.
Your attached DFA is wrong.
It accepts only ε, b and bab, but it cannot accept ab.
To make your DFA accept ab as well, add a new state that is reached from q0 on input a, and send that new state to a final state on input b.
Because it is a DFA, every input that is not needed in a state should be sent to a new state (a DEAD STATE).

NFA to an RE Kleene's Theorem

Here is my NFA:
Here is my attempt.
Create new start and final nodes
Next eliminate the 2nd node from the left which gives me ab
Next eliminate the 2nd node from the right which gives me ab*a
Next eliminate the 2nd node from the left which gives me abb*b
Next eliminate the 2nd node from the right which gives me b+ab*a
Which leads to abbb (b+aba)*
Is this the correct answer?
No, that is not correct :(
You do not need to create a new start state: the first state, marked with a - sign, is the start state. Also, a label a,b means a or b, not ab.
There is a theorem called Arden's theorem that will be quite helpful for converting an NFA into an RE.
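As a rough illustration of how Arden's theorem is applied, here is a short derivation. It assumes an NFA with states $q_0$ (start), $q_1$, $q_2$ (final) and $q_3$ and the transitions $q_0 \xrightarrow{a,b} q_1$, $q_1 \xrightarrow{b} q_2$, $q_2 \xrightarrow{b} q_2$, $q_2 \xrightarrow{a} q_3$, $q_3 \xrightarrow{a} q_3$, $q_3 \xrightarrow{b} q_2$; the state names are chosen for this sketch. Arden's theorem says that if $P$ does not contain $\varepsilon$, the equation $R = Q + RP$ has the unique solution $R = QP^*$.

```latex
\begin{align*}
q_0 &= \varepsilon \\
q_1 &= q_0 (a + b) = (a + b) \\
q_3 &= q_2 a + q_3 a
      \;\Longrightarrow\; q_3 = q_2 a a^* \\
q_2 &= q_1 b + q_2 b + q_3 b
     = q_1 b + q_2 (b + a a^* b) \\
    &\Longrightarrow\; q_2 = q_1 b \, (b + a a^* b)^*
     = (a + b)\, b \, (b + a a^* b)^*
\end{align*}
```

Each equation lists the ways of arriving at a state; the expression for the final state $q_2$ is the regular expression of the automaton.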
What is the Regular Expression for this NFA?
In your NFA, the initial part is:
step-1:
(-) --a,b-->(1)
means (a+b)
step-2: next, from state 1 to state 2; note that state 2 is the accepting (final) state, marked with a + sign.
(1) --b--->(2+)
So you need (a+b)b to reach to final state.
step-3: Once you are at final state 2, any number of b's are accepted (any number means zero or more). This is because of the self-loop on state 2 with label b.
So, b* accepted on state-2.
step-4:
Actually there are two loops on state-2.
One is the self-loop with label b, as I described in step-3. Its expression is b*.
The second loop on state-2 goes via state-3.
The expression for the second loop on state-2 is aa*b.
Why the expression aa*b?
Because:
           a-
           ||       ====> aa*b
           ▼|
(2+)--a-->(3) --b-->(2+)
So, in step-3 and step-4, because of the loops on state-2, a run can loop back either via the b-labeled self-loop or via aa*b ===> (b + aa*b)*
So regular expression for your NFA is:
(a+b) b (b + aa*b)*
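The result can be sanity-checked by comparing it against a direct simulation of the automaton on all short strings. This is a sketch under assumptions: the transition table is reconstructed from the steps above (the start state is named `"s"` here), the names `NFA`, `PATTERN` and `nfa_accepts` are made up, and the expression is rewritten in Python `re` syntax, where `+` (union) becomes `|`.

```python
import re
from itertools import product

# Transitions as described in steps 1 to 4 (reconstructed, not from the image).
NFA = {
    ("s", "a"): {"1"}, ("s", "b"): {"1"},
    ("1", "b"): {"2"},
    ("2", "b"): {"2"}, ("2", "a"): {"3"},
    ("3", "a"): {"3"}, ("3", "b"): {"2"},
}
START, FINAL = "s", {"2"}

# (a+b) b (b + aa*b)* written in Python regex syntax.
PATTERN = re.compile(r"[ab]b(?:b|aa*b)*")

def nfa_accepts(text):
    """Track the set of states the NFA can be in after each character."""
    current = {START}
    for ch in text:
        current = set().union(*(NFA.get((s, ch), set()) for s in current))
    return bool(current & FINAL)

# Strings of length 0 to 7 on which the regex and the NFA disagree.
mismatches = [
    w
    for n in range(8)
    for w in ("".join(p) for p in product("ab", repeat=n))
    if nfa_accepts(w) != (PATTERN.fullmatch(w) is not None)
]
print(mismatches)
```

An empty `mismatches` list means the expression and the automaton agree on every string up to that length. That is evidence rather than proof, but it catches mistakes such as reading the label a,b as ab.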

simulate a deterministic pushdown automaton (PDA) in c++

I was reading a UVA exercise in which I need to simulate a deterministic pushdown automaton, to see whether certain strings are accepted by the PDA, given input in the following format:
The first line of input is an integer C, the number of test cases. The first line of each test case contains five integers E, T, F, S and C, where E is the number of states in the automaton, T the number of transitions, F the number of final states, S the initial state and C the number of test strings. The next line contains F integers, which are the final states of the automaton. Then come T lines, each with 2 integers I and J and 3 strings L, T and A, where I and J (0 ≤ I, J < E) are the origin and destination states of a transition. L is the character read from the tape for the transition, T is the symbol found at the top of the stack, and A is the action to perform on the top of the stack at the end of the transition. The character that represents the bottom of the stack is always Z. The character £ (<alt+156>) is used to represent the end of the string, the pop action, or a transition that does not take the top of the stack into account. The stack alphabet consists of capital letters. For the string A, the symbols are pushed from right to left (the same way as in the program JFlap, i.e. the new top of the stack is the leftmost character). Then come C lines, each with an input string. The input strings may contain lowercase letters and digits (not necessarily present in any transition).
The output in the first line of each test case must display the following string "Case G:", where G represents the number of test case (starting at 1). Then C lines on which to print the word "OK" if the automaton accepts the string or "Reject" otherwise.
For example:
Input:
2
3 5 1 0 5
2
0 0 1 Z XZ
0 0 1 X XX
0 1 0 X X
1 1 1 X £
1 2 £ Z Z
111101111
110111
011111
1010101
11011
4 6 1 0 5
3
1 2 b A £
0 0 a Z AZ
0 1 a A AAA
1 0 a A AA
2 3 £ Z Z
2 2 b A £
aabbb
aaaabbbbbb
c1bbb
abbb
aaaaaabbbbbbbbb
this is the output:
Output:
Case 1:
Accepted
Rejected
Rejected
Rejected
Accepted
Case 2:
Accepted
Accepted
Rejected
Rejected
Accepted
I need some help, or any idea of how I can simulate this PDA. I am not asking for code that solves the problem, because I want to write my own code (the idea is to learn, right?), but I need some help (an idea or pseudocode) to begin the implementation.
You first need a data structure to keep the transitions. You can use a vector of transition structs, each holding a transition quintuple. Alternatively, you can use the fact that states are integers and create a vector that keeps the transitions from state 0 at index 0, the transitions from state 1 at index 1, and so on. This way you can reduce the time spent searching for the correct transition.
You can easily use the stack from the STL for the stack. You also need a search function; it could change depending on your implementation. If you use the first method, you can use a function like:
int findIndex(const vector<quintuple>& v); // finds the index of the correct transition, otherwise returns -1
Then use the return value to get the new state and the new stack symbol.
Or you can use a for loop over the vector and a bool flag that records whether a transition was found.
With the second method, you can use a function that takes references to the new state and the new stack symbol and sets them if it finds an appropriate transition.
For the inputs you can use some kind of vector; the element type depends on personal taste. You can implement your main method with for loops, but if you want an extra challenge you can implement a recursive function. Good luck.