RE to NFA Thompson's construction steps ((c|a)b*)* - regex

I tried to convert ((c|a)b*)* to an NFA using Thompson's construction, but I must have misunderstood something because the outcome isn't the one it is supposed to be. I would be really glad if you could point out my mistake.
Thompson's construction rules:
1) Every NFA has a start state and an accepting state.
2) No transition, except the starting one, is allowed to enter the start state.
3) No transition exits from an accepting state.
4) An ε-transition always connects 2 states that used to be start or accepting states for some REs.
5) A state can have at most 2 incoming and 2 exiting ε-transitions.
6) A state can have at most 1 incoming and 1 exiting transition for a specific character of the alphabet used.
Step 1: I created NFAs for each character
Step 2: parenthesis have priority so I created c|a
Step 3: then I created b*
Step 4: then I combined c|a and b* to create (c|a)b*
Step 5: and at last I created ((c|a)b*)*
The difference from the correct solution is that in the final NFA (the example doesn't show the steps, and the states were renumbered at the end) there is no S9. So S8 has an ε-transition to S5, and S5 has an ε-transition to S10. That would make sense to me if b* didn't have the S9 state, but it needs it because of rule number 2. So I guess I made a mistake when connecting the pieces. Thank you in advance.

Rule 2 says that nothing can enter S11, which isn't relevant here. When concatenating (step 4), S8 and S9 should have been combined.
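From Wikipedia:
The concatenation expression st is converted to an NFA in which the initial state of N(s) is the initial state of the whole NFA, the final state of N(s) becomes the initial state of N(t), and the final state of N(t) is the final state of the whole NFA.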
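In Thompson's construction, concatenation merges the accepting state of N(s) with the start state of N(t) rather than linking them with a new ε-transition. Here is a minimal Python sketch of the five steps from the question under that rule. The class and function names (Thompson, accepts) are made up for illustration, and the state numbers it produces will not match the ones in the question's drawings.

```python
from itertools import count

EPS = None  # label used for ε-transitions


class Thompson:
    """Builds NFA fragments as (start, accept) pairs over a shared edge list."""

    def __init__(self):
        self._ids = count()
        self.edges = []  # (source, label, target)

    def _new(self):
        return next(self._ids)

    def symbol(self, ch):
        s, f = self._new(), self._new()
        self.edges.append((s, ch, f))
        return s, f

    def union(self, a, b):
        s, f = self._new(), self._new()
        for start, accept in (a, b):
            self.edges += [(s, EPS, start), (accept, EPS, f)]
        return s, f

    def star(self, a):
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, a[0]), (s, EPS, f),
                       (a[1], EPS, a[0]), (a[1], EPS, f)]
        return s, f

    def concat(self, a, b):
        # Merge b's start state into a's accept state: no new state, no new edge.
        keep, drop = a[1], b[0]
        self.edges = [(keep if u == drop else u, lab, keep if v == drop else v)
                      for u, lab, v in self.edges]
        return a[0], b[1]


def accepts(edges, start, accept, text):
    """Simulate the ε-NFA on text by tracking the set of reachable states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            u = stack.pop()
            for src, lab, dst in edges:
                if src == u and lab is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, lab, dst in edges
                           if src in current and lab == ch})
    return accept in current


t = Thompson()
c_or_a = t.union(t.symbol("c"), t.symbol("a"))  # step 2: c|a
b_star = t.star(t.symbol("b"))                  # step 3: b*
inner = t.concat(c_or_a, b_star)                # step 4: (c|a)b*, states merged
start, accept = t.star(inner)                   # step 5: ((c|a)b*)*
states = {s for u, _, v in t.edges for s in (u, v)}
print(len(states))  # 11
```

Without the merge in concat there would be 12 states; with it there are 11, which is consistent with the reference solution having one state fewer than the attempt in the question.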


Formulation of language and regular expressions

I can't figure out the formal language and the regular expression of this automaton:
DFA automaton
I know that the number of 'a's and the number of 'b's each have to be even.
At first I thought the language was:
L = {(a^i)(b^j) | i(mod2) = j(mod2) = 0, i,j>=0}
But the automaton can start from 'b', so the language is incorrect.
Also, the regular expression I found, ((aa)* + (bb)), doesn't match either: it can't produce abab, for example.
The regex I got by progressively ripping out nodes (order: 3,1,2,0) is:
(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*
As far as I can tell, that's the simplest it goes. (I'd love to know if anyone has a simpler reduction—I'm actually taking a test on this stuff this week!)
Step-by-step process
We start off by adding a new start and accept state. Every old accept state (in this case, there's only one) gets linked to the new accept state with an ε transition:
Next, we rip out state 3. We need to preserve all paths that run through state 3. In this case we've added a path from state 0 back to itself, paths from state 0 to state 2, and state 2 back to itself:
We do the same with state 1:
We can simplify this a bit: we'll join the looping-back transitions with commas. (At the end, these will turn into the union operator: | or ∪, depending on your notation.)
We'll remove state 2 next, and get everything smooshed onto one big loop:
Loops become stars; we remove the last state so we just have a transition from the start state to the end state connected with one big regular expression:
And that's our regular expression!
Language definition
You're pretty close with the language definition. If you can allow something a little looser, it would be this:
L = { w | w contains an even number of 'a's and 'b's }
The problem with your definition is that it forces every string to start with its a's, followed by its b's, whereas the only restriction is on the parity of the number of a's and b's; their order is free.
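If you want to convince yourself that the regex and the language definition agree, a brute-force check over short strings is cheap. This is only a sketch, not a proof: it compares Python's re engine against a direct parity test for every string over {a, b} up to a chosen length. The helper names in_language and mismatches are made up for this example.

```python
import re
from itertools import product

# The regex obtained by state elimination above.
PATTERN = re.compile(r"(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*")


def in_language(word):
    """Direct test of the definition: even number of a's and of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0


def mismatches(max_len):
    """Return every string up to max_len on which regex and definition differ."""
    bad = []
    for n in range(max_len + 1):
        for word in map("".join, product("ab", repeat=n)):
            if bool(PATTERN.fullmatch(word)) != in_language(word):
                bad.append(word)
    return bad


print(mismatches(8))  # []
```

An empty list means no counterexample exists among strings of length 8 or less, including abab, which the regex from the question could not produce.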

Converting DFA to RE

I am using JFLAP to convert a DFA to RE for the language
"Even a and Odd b"
The last step is not clear to me: how does it arrive at the final RE shown in the figure?
final RE
((ab(bb)*ba+aa)*(ab(bb)*a+b)(a(bb)*a)*(a(bb)*ba+b))*(ab(bb)*ba+aa)*(ab(bb)*a+b)(a(bb)*a)*
My confusion is about the term a(bb)*ba+b (q1 to q0): why does it end up under a star in the final expression?
I have relabeled the transitions of the NFA in your diagram so the explanation is simpler.
Doing so leaves you with the regex:
(R1* R2 R3* R4)* R1* R2 R3*
The first parenthesized section is essentially describing a sequence of steps that get you from q0 back to q0. The regex says, do this as much as you want and when you're done messing around, you can follow R1 as many times as you want to still stay in state q0 and when you're really done messing around, follow R2 to get to the final state where you can loop on R3 as much as you want.
This is not the neatest or most intuitive way to state-eliminate the NFA into a regex, but I think it is correct. Hopefully the explanation makes sense. If not, ask in the comments!
As a reference, I've written the regex I came up with. Note I use | instead of + as you have.
(aa|ab(bb)*ba)* (ab(bb)*a|b) ((a(bb)*a)* ((a(bb)*ba|b)(aa|ab(bb)*ba)*(ab(bb)*a|b))*)*
Edit:
You want your regex to capture all possible patterns that will eventually lead you to the final state, starting from state q0. Now imagine you are standing at state q0. What actions could you make? You could split up your set of actions into those that will keep you in state q0 and those that will get you in q1.
Actions that will keep you in q0:
Follow R1
Follow R2, do whatever messing around you can in q1, and then come back to q0 by following R4. Let us call this regex R2_R4 where that blank needs to be filled with all possible things we can do in q1 except coming back via R4. Well the only thing in q1 we can do is follow R3 a bunch of times so we replace the blank with R2R3*R4.
By enumerating all the ways you can stay in q0, you're essentially getting rid of the transition from q1 to q0 (R4). In other words, you're saying after this portion of my regex, if you go to state q1, there should be no way of coming back to q0 (if there were it would be captured by the first part of the regex). So your NFA now kind of looks like this:
So your final regex would be: follow the transitions that stay in q0, then go to q1 via R2, and stay in q1 as long as you want by following R3. So your regex could look like:
(R1 + R2R3*R4)* R1* R2 R3*
which actually is equivalent to the one you have:
(R1* R2 R3* R4)* R1* R2 R3*
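because (R1 + R2 R3* R4)* is equivalent to (R1* R2 R3* R4)* R1*: any sequence of "R1 or a round trip through q1" can be regrouped as blocks of R1's, each ended by a round trip, followed by some trailing R1's. The extra R1* in the first form is then simply redundant. I actually think the version with the or (+) is clearer, but it doesn't really matter as long as it works.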
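The equivalence of the two forms can also be checked mechanically on short strings. The sketch below plugs the four sub-expressions from the JFLAP result into both shapes (with + written as |, since Python's re uses | for union) and compares them with each other and with a direct "even a's, odd b's" test. The function names are made up, and the assignment of R1 to R4 is read off from the final RE in the question.

```python
import re
from itertools import product

# Sub-expressions taken from the final RE in the question, + rewritten as |.
R1 = "(ab(bb)*ba|aa)"   # q0 back to q0
R2 = "(ab(bb)*a|b)"     # q0 to q1
R3 = "(a(bb)*a)"        # q1 back to q1
R4 = "(a(bb)*ba|b)"     # q1 to q0

with_or = re.compile(f"({R1}|{R2}{R3}*{R4})*{R1}*{R2}{R3}*")
with_star = re.compile(f"({R1}*{R2}{R3}*{R4})*{R1}*{R2}{R3}*")


def even_a_odd_b(word):
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 1


def disagreements(max_len):
    """Strings up to max_len on which the three tests do not all agree."""
    bad = []
    for n in range(max_len + 1):
        for word in map("".join, product("ab", repeat=n)):
            verdicts = {bool(with_or.fullmatch(word)),
                        bool(with_star.fullmatch(word)),
                        even_a_odd_b(word)}
            if len(verdicts) != 1:
                bad.append(word)
    return bad


print(disagreements(8))  # []
```

This only tests strings up to the chosen length, so treat it as a sanity check rather than a proof of equivalence.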

Converting Epsilon-NFA to NFA

I'm having trouble understanding the process of converting an epsilon-NFA to a NFA, so I wondered if anybody could help me with it:
And the answer says:
The 0 in the new NFA has an A going to 1,2 and to 2. I figured this is because the 0 in the Epsilon NFA leads to 1 and 2 with an A (combined with an Epsilon). So why doesn't the 1,2 have an A-step going to 2, because in the Epsilon NFA the 1 has an A-step to 1 and 2?
Whenever you remove an ε-transition from an NFA, pay attention to the direction of that transition.
In your case, the ε-transition goes from state 1 to state 2, which is an accept state. So you need to consider all the incoming transitions to state 1.
Also, since {1} moves to {2} on an ε-transition, state 1 can be replaced by {1,2}, and it becomes an accept state. Check this question to know why this happens.
So, to remove the ε-transition, check all the incoming transitions to state 1, replace {1} with the accept state {1,2}, and convert them:
State 0 transits to state 1 when it reads a, and state 1 will automatically transit to state 2 as it reads ε.
So, you should omit the ε-transition from 1 to 2 and say that state 0, on reading a, transits to both {1} and {2}. So only one transition is added to the existing NFA:
{0} -> {2} (on reading a) // should be drawn, not given
{0} -> {1} (on reading a) // this is already given
State 2 transits to state 1 when it reads a, and state 1 will automatically transit to state 2 as it reads ε.
So, you should omit the ε-transition from 1 to 2 and say that state 2, on reading a, transits to both {1} and {2} (itself). So only one transition is added to the existing NFA:
{2} -> {2} (on reading a) // a self-loop, should be drawn, not given
{2} -> {1} (on reading a) // this is already given
Please take special care to replace the state {1} with the accept state {1,2}, for the reason explained above.
There are no more incoming arrows directed to state 1 and hence all the dependencies are resolved. The new NFA matches your given NFA as the answer.
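The same procedure can be written generically: for every state and symbol, take the ε-closure of the state, follow the symbol, then take the ε-closure of the result. The Python sketch below applies it to a small automaton reconstructed only from the transitions named in this answer (0 to 1 on a, 2 to 1 on a, ε from 1 to 2, accept state 2). That automaton is an assumption, since the original figure is not reproduced here and may contain more transitions; the function names are made up as well.

```python
def epsilon_closure(state, eps):
    """All states reachable from state using only ε-transitions."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def remove_epsilon(states, alphabet, delta, eps, accepting):
    """Return (new_delta, new_accepting) for an equivalent NFA without ε."""
    new_delta = {}
    for q in states:
        for sym in alphabet:
            targets = set()
            for p in epsilon_closure(q, eps):
                for r in delta.get((p, sym), ()):
                    targets |= epsilon_closure(r, eps)
            if targets:
                new_delta[(q, sym)] = targets
    # A state becomes accepting if it can reach an accept state via ε alone.
    new_accepting = {q for q in states if epsilon_closure(q, eps) & accepting}
    return new_delta, new_accepting


# Assumed automaton: only the transitions mentioned in the answer above.
delta = {(0, "a"): {1}, (2, "a"): {1}}
eps = {1: {2}}
new_delta, new_accepting = remove_epsilon({0, 1, 2}, {"a"}, delta, eps, {2})

print(new_delta[(0, "a")])  # {1, 2}
print(new_delta[(2, "a")])  # {1, 2}
print(new_accepting)        # {1, 2}
```

The output matches the two added arrows described above (0 to 2 and the self-loop on 2), and state 1 becomes accepting because its ε-closure contains state 2.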

Steps to draw a DFA (or NFA) from a simple statement?

I am given a simple statement: Construct a DFA over alphabet {0, 1} that accepts all the strings that end in 101?
My question is: what are the steps to design it? Alternatively, what are the steps to design an NFA? I know the steps to convert an NFA to a DFA, so I could then convert the NFA to the DFA.
Note:- It is just a minor course for me, so I have never studied anything like regular expressions, or any algorithms probably used to construct DFA's.
If you want more of an explanation on how I derived this, I'd be happy to explain, but for now I just drew the DFA and explained each state.
Sorry about the screenshot...I didn't know how to convert it straight to an image.
On input 0 at state q0, it loops back to itself. On 1, it moves to q1 and prepares
itself to end because it could possibly be '101'.
q1 loops to itself on input 1 because it's still preparing to end on
'101'. Input '0' on q1 means it is preparing for input '10', so it goes to q2.
Input '0' on q2 breaks the whole cycle and goes back to q0. Input '1'
results in moving to q3, the accepting state.
Any input on q3 results in going back to whatever point in the cycle
the input corresponds with.
That is, on '1' it goes back to q1, or the state where the first '1'
was encountered in '101', preparing to end.
On '0', it goes to q2 because in order to get to q3, there must have
been an input of '1' from q2, so no matter what, the last two input
symbols are '10' now.
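The transitions described above fit in a small table, and the DFA can be checked against the plain definition "ends in 101" by brute force. This is a minimal Python sketch; the names DELTA, ends_in_101 and check are made up for the example, while the state names follow the explanation.

```python
from itertools import product

# Transition table exactly as described: (state, input) -> next state.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}


def ends_in_101(word):
    """Run the DFA from q0 and accept if it stops in q3."""
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "q3"


def check(max_len):
    """True if the DFA agrees with str.endswith on all strings up to max_len."""
    return all(
        ends_in_101(word) == word.endswith("101")
        for n in range(max_len + 1)
        for word in map("".join, product("01", repeat=n))
    )


print(ends_in_101("0010101"))  # True
print(check(10))               # True
```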
TikZ DFA examples.
Here, the string should end with 101. So we need to draw an NFA for it and later convert it into a DFA.
The states are A, B, C and D.
I will upload an image here. In it I have drawn the NFA and then its transition table.
Then I have drawn the transition table for the conversion of the NFA to a DFA.
I have also drawn the DFA for you.
In an NFA, when a specific input is given to the current state, the machine can go to multiple states: it can have zero, one or more than one move on a given input symbol. In a DFA, on the other hand, when a specific input is given to the current state, the machine goes to exactly one state. A DFA has only one move on a given input symbol.
THE STEPS FOR CONVERTING NFA TO DFA:
Step 1: Initially Q' = ϕ
Step 2: Add q0 of NFA to Q'. Then find the transitions from this start state.
Step 3: In Q', find the possible set of states for each input symbol. If this set of states is not in Q', then add it to Q'.
Step 4: In the DFA, the final states are all the states of Q' that contain at least one final state of the NFA (a state in F).
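The four steps above can be sketched in a few lines of Python. The NFA used here is an assumption, because the image is not reproduced: states A, B, C, D, where A loops on 0 and 1, guesses the start of 101 by also moving to B on 1, B moves to C on 0, C moves to D on 1, and D is final. The function name subset_construction is also made up.

```python
def subset_construction(alphabet, delta, start, finals):
    """Convert an NFA (without ε-moves) into a DFA whose states are frozensets."""
    start_set = frozenset({start})          # Step 2: add q0 to Q'
    seen, todo = {start_set}, [start_set]   # seen plays the role of Q'
    dfa_delta = {}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            # Step 3: the set of NFA states reachable on sym from any member.
            target = frozenset(r for q in current
                               for r in delta.get((q, sym), ()))
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # Step 4: a DFA state is final if it contains a final NFA state.
    dfa_finals = {s for s in seen if s & finals}
    return seen, dfa_delta, start_set, dfa_finals


# Assumed NFA for "ends in 101".
NFA = {
    ("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
    ("B", "0"): {"C"},
    ("C", "1"): {"D"},
}
dfa_states, dfa_delta, dfa_start, dfa_finals = subset_construction(
    "01", NFA, "A", {"D"})

print(len(dfa_states))  # 4
print(sorted("".join(sorted(s)) for s in dfa_states))  # ['A', 'AB', 'ABD', 'AC']
```

Under that assumption the construction yields four DFA states, {A}, {A,B}, {A,C} and {A,B,D}, with {A,B,D} as the only final state.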

Avoiding Comments w/ C++ getline()

I'm using getline() to open a .cpp file.
getline(theFile, fileData);
I'm wondering if there is any way to have getline() avoid grabbing c++ comments (/*, */ and //)?
So far, trying something like this doesn't quite work.
if (fileData[i] == '/*')
I think it's unavoidable for you to read the comments, but you can dispose of them by reading through the file one character at a time.
To do this, you can load the file into a string and build a state machine with the following states:
1) This is actual code
2) The previous character was /
3) The previous character was *
4) I am a single-line comment
5) I am a multi-line comment
The state machine starts in State 1.
If the machine is in State 1 and hits a / character, transition to State 2.
If the machine is in State 2 and hits a / character, transition to State 4. If it hits a * character, transition to State 5. Otherwise, transition to State 1.
If the machine is in State 4 and hits a newline character, transition to State 1.
If the machine is in State 5 and hits a * character, transition to State 3.
If the machine is in State 3 and hits a / character, transition to State 1 (the multi-line comment ends). If it hits another * character, stay in State 3. Otherwise, transition to State 5.
If you mark the positions of the characters where the machine enters and exits the comment states, you can then strip these characters from the string.
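To make the transitions concrete, here is the machine as a short sketch. It is written in Python only to keep it compact; the same character loop ports directly to C++ over a std::string. The function name strip_comments is made up, the state numbers follow the list above, and the sketch keeps the machine in State 3 on a repeated * so that a comment ending in **/ is closed properly. It deliberately ignores string and character literals, so a // inside a quoted string would be treated as a comment.

```python
def strip_comments(source):
    """Remove // and /* */ comments using the five-state machine."""
    CODE, SLASH, STAR, LINE, BLOCK = 1, 2, 3, 4, 5
    out, state = [], CODE
    for ch in source:
        if state == CODE:
            if ch == "/":
                state = SLASH          # might be the start of a comment
            else:
                out.append(ch)
        elif state == SLASH:
            if ch == "/":
                state = LINE
            elif ch == "*":
                state = BLOCK
            else:
                out.append("/")        # it was an ordinary slash after all
                out.append(ch)
                state = CODE
        elif state == LINE:
            if ch == "\n":
                out.append(ch)         # keep the line break itself
                state = CODE
        elif state == BLOCK:
            if ch == "*":
                state = STAR
        elif state == STAR:
            if ch == "/":
                state = CODE           # multi-line comment ends
            elif ch != "*":
                state = BLOCK
    if state == SLASH:
        out.append("/")                # input ended right after a lone slash
    return "".join(out)


sample = "int a; // x\nint b; /* y **/ int c = a / b;"
print(repr(strip_comments(sample)))
# 'int a; \nint b;  int c = a / b;'
```

Instead of recording positions and stripping afterwards, this version simply copies characters to the output while it is outside a comment, which amounts to the same thing.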
Alternatively, you could explore regular expressions, which provide ways of describing this kind of state machine very succinctly.
So, one problem is that if(fileData[i] == '/*') is testing if the char fileData[i] is equal to '/*' which is... Not a char.
To find if a line contains a comment, you will probably want to look into one of the following:
<regex> in C++11 (Boost has a regular expression library as well, if that's more your thing.)
strstr in vanilla C/C++.
For multi-line comments, you'll probably want to store a flag indicating whether the previous line ended "in comment" or not, and then search for /* or */ according to that flag, updating it as you go.
Single quotation marks designate a char, and the char data type represents a SINGLE character. '/*' doesn't make sense, because it's two characters while fileData[i] refers to a single char.
Your if statement needs to be far more robust.