I am using JFLAP to convert a DFA to a regular expression (RE) for the language
"Even a and Odd b"
This last step, shown in the figure, is not clear to me: how does JFLAP arrive at this final RE?
Final RE:
((ab(bb)*ba+aa)*(ab(bb)*a+b)(a(bb)*a)*(a(bb)*ba+b))*(ab(bb)*ba+aa)*(ab(bb)*a+b)(a(bb)*a)*
My confusion is about the term a(bb)*ba+b (q1 to q0): why is it inside a starred group in the final expression?
I have relabeled the transitions of the NFA in your diagram so the explanation is simpler.
Doing so leaves you with the regex:
(R1* R2 R3* R4)* R1* R2 R3*
The first parenthesized section is essentially describing a sequence of steps that get you from q0 back to q0. The regex says, do this as much as you want and when you're done messing around, you can follow R1 as many times as you want to still stay in state q0 and when you're really done messing around, follow R2 to get to the final state where you can loop on R3 as much as you want.
This is not the neatest or most intuitive way to state-eliminate the NFA into a regex, but I think it is correct. Hopefully the explanation makes sense. If not, ask in the comments!
As a reference, I've written the regex I came up with. Note I use | instead of + as you have.
(aa|ab(bb)*ba)* (ab(bb)*a|b) ((a(bb)*a)* ((a(bb)*ba|b)(aa|ab(bb)*ba)*(ab(bb)*a|b))*)*
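One way to convince yourself that both the JFLAP output and the regex above describe "even number of a's, odd number of b's" is to compare them against that predicate on every short string. A minimal Python sketch, assuming that reading of the language: `+` is rewritten as `|`, the spaces in the regex above are dropped (a regex engine would treat them as literal characters), and the length bound of 8 is an arbitrary choice.

```python
import re
from itertools import product

# JFLAP's final RE with "+" written as "|" (Python's alternation operator).
JFLAP_RE = r"((ab(bb)*ba|aa)*(ab(bb)*a|b)(a(bb)*a)*(a(bb)*ba|b))*(ab(bb)*ba|aa)*(ab(bb)*a|b)(a(bb)*a)*"
# The alternative regex from this answer, with the readability spaces removed.
ANSWER_RE = r"(aa|ab(bb)*ba)*(ab(bb)*a|b)((a(bb)*a)*((a(bb)*ba|b)(aa|ab(bb)*ba)*(ab(bb)*a|b))*)*"


def in_language(w):
    """Even number of a's and odd number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 1


def agrees(pattern, max_len=8):
    """Check the regex against the predicate on every string over {a, b} up to max_len."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if bool(rx.fullmatch(w)) != in_language(w):
                return False
    return True


print(agrees(JFLAP_RE), agrees(ANSWER_RE))  # True True
```

A brute-force check like this only covers strings up to the chosen length, so it is a sanity check rather than a proof of equivalence.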
Edit:
You want your regex to capture all possible patterns that will eventually lead you to the final state, starting from state q0. Now imagine you are standing at state q0. What actions could you make? You could split up your set of actions into those that will keep you in state q0 and those that will get you in q1.
Actions that will keep you in q0:
Follow R1
Follow R2, do whatever messing around you can in q1, and then come back to q0 by following R4. Let us call this regex R2_R4, where the blank needs to be filled with all possible things we can do in q1 except coming back via R4. Well, the only thing we can do in q1 is follow R3 a bunch of times, so we replace the blank with R3*, giving R2R3*R4.
By enumerating all the ways you can stay in q0, you're essentially getting rid of the transition from q1 to q0 (R4). In other words, you're saying that after this portion of the regex, if you go to state q1, there should be no way of coming back to q0 (if there were, it would be captured by the first part of the regex). So your NFA now looks roughly like this:
So your final regex would be: follow the transitions that stay in q0, then go to q1 via R2, and stay in q1 as long as you want by following R3. So your regex could look like:
(R1 + R2R3*R4)* R1* R2 R3*
which actually is equivalent to the one you have:
(R1* R2 R3* R4)* R1* R2 R3*
because (R1 + R2R3*R4)* is equivalent to (R1* R2 R3* R4)* R1* (the identity (x + y)* = (x*y)*x*), and that extra R1* is absorbed by the R1* already in front of R2. I actually think the version with the or (+) is clearer, but it doesn't really matter as long as it works.
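A quick brute-force check of that equivalence, with R1..R4 read off the JFLAP expression in the question (R1 = ab(bb)*ba+aa, R2 = ab(bb)*a+b, R3 = a(bb)*a, R4 = a(bb)*ba+b; this mapping is an assumption, since the relabeled diagram is not reproduced here). It also shows why the trailing R1* matters: on their own, (R1+R2R3*R4)* and (R1*R2R3*R4)* are not the same language. Python sketch, with `+` written as `|`:

```python
import re
from itertools import product

# Transition labels read off the JFLAP expression ("+" written as "|").
R1 = r"(?:ab(?:bb)*ba|aa)"  # q0 -> q0
R2 = r"(?:ab(?:bb)*a|b)"    # q0 -> q1
R3 = r"(?:a(?:bb)*a)"       # q1 -> q1
R4 = r"(?:a(?:bb)*ba|b)"    # q1 -> q0

FORM_OR = rf"(?:{R1}|{R2}{R3}*{R4})*{R1}*{R2}{R3}*"
FORM_STAR = rf"(?:{R1}*{R2}{R3}*{R4})*{R1}*{R2}{R3}*"


def same_language(p, q, max_len=8):
    """Compare two regexes on every string over {a, b} up to max_len."""
    rp, rq = re.compile(p), re.compile(q)
    return all(
        bool(rp.fullmatch(w)) == bool(rq.fullmatch(w))
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )


print(same_language(FORM_OR, FORM_STAR))  # True
# Without the trailing R1*, the two starred groups differ: "aa" is R1 alone.
print(bool(re.fullmatch(rf"(?:{R1}|{R2}{R3}*{R4})*", "aa")))  # True
print(bool(re.fullmatch(rf"(?:{R1}*{R2}{R3}*{R4})*", "aa")))  # False
```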
I can't figure out the formal language and regular expression of this automaton:
DFA automaton
I know that the number of 'a's and the number of 'b's both have to be even.
At first I thought the language was:
L = {(a^i)(b^j) | i(mod2) = j(mod2) = 0, i,j>=0}
But the automaton can start from 'b', so the language is incorrect.
Also, the regular expression I found, ((aa)* + (bb)), doesn't match either: it can't produce abab, for example.
The regex I got by progressively ripping out nodes (order: 3,1,2,0) is:
(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*
As far as I can tell, that's the simplest it goes. (I'd love to know if anyone has a simpler reduction—I'm actually taking a test on this stuff this week!)
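To sanity-check that regex against the intended language (an even number of a's and an even number of b's), a brute-force comparison over all short strings is enough. A Python sketch; the length bound is arbitrary, and the last call confirms that abab, the counterexample from the question, is accepted:

```python
import re
from itertools import product

PARITY_RE = re.compile(r"(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*")


def even_even(w):
    """Even number of a's and even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def check(max_len=8):
    """Compare the regex with the predicate on every string over {a, b} up to max_len."""
    return all(
        bool(PARITY_RE.fullmatch(w)) == even_even(w)
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )


print(check(), bool(PARITY_RE.fullmatch("abab")))  # True True
```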
Step-by-step process
We start off by adding a new start and accept state. Every old accept state (in this case, there's only one) gets linked to the new accept state with an ε transition:
Next, we rip out state 3. We need to preserve all paths that run through state 3. In this case we've added a path from state 0 back to itself, paths from state 0 to state 2, and state 2 back to itself:
We do the same with state 1:
We can simplify this a bit: we'll combine the parallel looping-back transitions into one label, separated by commas. (At the end, this will turn into the union operator: | or ⋃, etc., depending on your notation.)
We'll remove state 2 next, and get everything smooshed onto one big loop:
Loops become stars; we remove the last state so we just have a transition from the start state to the end state connected with one big regular expression:
And that's our regular expression!
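The ripping-out procedure above can be sketched in code. This is a minimal Python version of state elimination, assuming a hypothetical numbering of the four DFA states (the original diagram is only an image) chosen so that the removal order 3, 1, 2, 0 mirrors the answer. `S` and `F` are the new start and accept states, `""` stands for ε, and `None` means "no edge":

```python
import re
from itertools import product


def union(x, y):
    """Alternation of two edge labels; None means 'no edge'."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts):
    """Concatenation; a missing edge (None) anywhere means there is no path."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")  # "" stands for ε


def star(x):
    """Kleene star; a missing or ε self-loop contributes nothing."""
    return f"(?:{x})*" if x else ""


def eliminate(states, edges, start, finals, order):
    """Rip states out one by one, preserving every path through them."""
    R = dict(edges)
    R[("S", start)] = ""                          # new start, ε into the old start
    for f in finals:
        R[(f, "F")] = union(R.get((f, "F")), "")  # ε from each old accept
    remaining = ["S", *states, "F"]
    for k in order:
        remaining.remove(k)
        loop = star(R.get((k, k)))
        for i in remaining:
            for j in remaining:
                path = concat(R.get((i, k)), loop, R.get((k, j)))
                if path is not None:
                    R[(i, j)] = union(R.get((i, j)), path)
    return R.get(("S", "F"))


# Hypothetical numbering: 0 = (even a, even b), 1 = (odd a, even b),
# 2 = (odd a, odd b), 3 = (even a, odd b); each symbol flips one parity.
DELTA = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 0, (1, "b"): 2,
         (2, "a"): 3, (2, "b"): 1, (3, "a"): 2, (3, "b"): 0}
edges = {}
for (p, sym), q in DELTA.items():
    edges[(p, q)] = union(edges.get((p, q)), sym)

regex = eliminate([0, 1, 2, 3], edges, start=0, finals=[0], order=[3, 1, 2, 0])


def matches_parity(max_len=8):
    """Compare the generated regex with the even/even definition on short strings."""
    return all(
        bool(re.fullmatch(regex, w)) == (w.count("a") % 2 == 0 and w.count("b") % 2 == 0)
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )


print(regex)
print(matches_parity())
```

The generated string is heavily parenthesized, but structurally it is the same loop as the hand-derived answer: aa and bb, or a mixed pair, any number of aa/bb, and another mixed pair.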
Language definition
You're pretty close with the language definition. If you can allow something a little looser, it would be this:
L = { w | w contains an even number of 'a's and an even number of 'b's }
The problem with your definition is that it forces all the a's to come before all the b's, whereas the only restriction is on the parity of the number of a's and b's.
I would very much appreciate any help getting the following done in Notepad++.
I have more than 40,000 lines like the ones below. All of them are English language tests. They look like this:
Q1 "I know that you don't like seafood but our friends make the best seafood Fettuccini Alfredo I have ever had.
Will you agree to keep an ......... mind and try it before deciding you don't like it?" Jill asked her son.
(a) open (b) airy (c) indifferent (d) ignorant
Q2 The boss was a scary guy. When he called you into his office, you could bet that you would receive the worse
insults you have ever had to endure and there was something about him that would stop anyone from talking
back to him. People immediately froze in their ......... and meekly walked into his office when he called them.
(a) paths (b) tracks (c) cars (d) shoes
Q3 In ......... to change the sink, he would have to turn off the water that runs to the facet. He failed to do so and got
a surprise when water started liberally spraying down on the kitchen floor.
(a) ability (b) possibility (c) plausibility (d) order
Q4 Since the company put a set of sexual harassment rules in ......... incidents of sexual harassment were virtually
non-existent.
(a) ordering (b) place (c) storage (d) foundation
Q5 "Your shed is in pretty poor ......... . The back of the foundation is sinking and there is water getting into it from
the roof. I can't help you with the foundation but we can look for ways to seal it," Rob said to Christian.
(a) mass (b) density (c) support (d) shape
As you can see, the questions are not on one line; they are broken by a line break, an empty line, and some leading space characters.
I would like to achieve something like this:
Q1 "I know that you don't like seafood but our friends make the best seafood Fettuccini Alfredo I have ever had. Will you agree to keep an ......... mind and try it before deciding you don't like it?" Jill asked her son.
(a) open (b) airy (c) indifferent (d) ignorant
Q2 The boss was a scary guy. When he called you into his office, you could bet that you would receive the worse insults you have ever had to endure and there was something about him that would stop anyone from talking back to him. People immediately froze in their ......... and meekly walked into his office when he called them.
(a) paths (b) tracks (c) cars (d) shoes
Q3 In ......... to change the sink, he would have to turn off the water that runs to the facet. He failed to do so and got a surprise when water started liberally spraying down on the kitchen floor.
(a) ability (b) possibility (c) plausibility (d) order
Q4 Since the company put a set of sexual harassment rules in ......... incidents of sexual harassment were virtually non-existent.
(a) ordering (b) place (c) storage (d) foundation
Q5 "Your shed is in pretty poor ......... . The back of the foundation is sinking and there is water getting into it from the roof. I can't help you with the foundation but we can look for ways to seal it," Rob said to Christian.
(a) mass (b) density (c) support (d) shape
So I need each question on one line, however long the question is. The options must stay under their questions as they are; I think I do not need to change the options (a, b, c, d), only the questions.
Manually, I would have to go line by line and delete the characters until the questions are one line each. With tens of thousands of questions, it would be a difficult thing to do. Is there a way that it could be done in Notepad ++ with regex?
If it helps, each and every question starts with Q1, Q2, Q3 and so on up until Q10. All the lines that start with (a) are question options.
Two approaches:
Based on the fact that the lines you want to attach always start indented, you can use
\R++\h++([^(])
and replace with $1.
Or, based on the fact that you don't want to merge lines starting with an opening bracket or a Q number, you can use
\R++\h*+((?!Q\d)[^(])
and again replace with $1.
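To try the second pattern outside Notepad++, here is a rough Python emulation. Python's `re` has no `\R` or `\h`, so they are spelled out; instead of the possessive quantifiers, the captured character simply excludes whitespace, which stops backtracking from sliding into the indentation. One deliberate deviation: a single space is put back before the captured character, since plain `$1` would glue the last word of one line to the first word of the next unless the lines carry trailing spaces. The sample is a shortened, hypothetical excerpt with `\n` line endings and indented continuation lines, as the question describes.

```python
import re

# \R -> explicit line breaks, \h -> space/tab; [^(\s] replaces the possessive trick.
JOIN = re.compile(r"(?:\r\n|\r|\n)+[ \t]*((?!Q\d)[^(\s])")


def join_questions(text):
    """Pull indented continuation lines up onto the question line."""
    return JOIN.sub(r" \1", text)


sample = (
    'Q1 "I know that you don\'t like seafood.\n'
    "\n"
    '    Will you agree to keep an ......... mind?" Jill asked her son.\n'
    "(a) open (b) airy (c) indifferent (d) ignorant\n"
    "Q2 In ......... to change the sink, he failed to do so and got\n"
    "\n"
    "    a surprise.\n"
    "(a) ability (b) possibility (c) plausibility (d) order\n"
)
print(join_questions(sample))
```

Lines starting with `(` and lines starting with `Q` plus a digit are left in place, so each question ends up on one line with its options below it.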
You can use the following regex:
Q(.*)(\r?\n)+\h+(\w)
and replacement:
Q\1 \3
or
Q(.+)\v+\h+(\w)
and replacement:
Q\1 \2
Click on Replace All a couple of times and it will be done.
EXPLANATIONS:
Q(.+)\v+\h+(\w) will select every line starting with Q followed by one or more characters, then one or more EOL/carriage-return characters, then one or more horizontal space characters, then a word character (so the answer lines, which start with "(", are not taken into account).
Then you replace the whole thing with Q\1 \2: the Q, the first line of the question, a space, and the start of the second line of the question (using backreferences).
You need to click Replace All several times, until no occurrence is replaced.
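The "click Replace All until nothing changes" step can be emulated with a loop: each pass joins only one continuation line per question, because the match consumes the Q line. A Python sketch, assuming `\n` line endings; `\v` and `\h` are spelled out as explicit classes (in Python, `\v` inside a pattern means only the vertical-tab character), and the sample is a shortened, hypothetical excerpt:

```python
import re

# Same idea as Q(.+)\v+\h+(\w) -> Q\1 \2, with \v and \h spelled out for Python.
PATTERN = re.compile(r"Q(.+)[\n\r]+[ \t]+(\w)")


def replace_all_until_stable(text):
    """Repeat 'Replace All' until nothing changes; return the text and the pass count."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r"Q\1 \2", text)
        if count == 0:
            return text, passes
        passes += 1


sample = (
    "Q2 The boss was a scary guy. You could bet that you would receive the worse\n"
    "\n"
    "    insults you have ever had to endure and nobody would talk\n"
    "\n"
    "    back to him.\n"
    "(a) paths (b) tracks (c) cars (d) shoes\n"
)
print(replace_all_until_stable(sample))
```

A question spread over three lines needs two productive passes, which is why the answer says to click Replace All a couple of times.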
Let me know if anything is unclear.
TESTED:
Try this with Find+Replace:
(\n\s+)(\s\w)
Replace with:
$2
Visual Studio / XPath / RegEx:
Given Expression:
(?<TheObject>(Car|Car Blue)) +(?<OldState>.+) +---> +(?<NewState>.+)
Given Searched String:
Car Blue Flying ---> Crashed
I expected:
TheObject = "Car Blue"
OldState = "Flying"
NewState = "Crashed"
What I get:
TheObject = "Car"
OldState = "Blue Flying"
NewState = "Crashed"
Given new RegEx:
(?<TheObject>(Car Blue|Car)) +(?<OldState>.+) +---> +(?<NewState>.+)
Result is (what I want):
TheObject = "Car Blue"
OldState = "Flying"
NewState = "Crashed"
I conceptually get what's happening under the hood; the RegEx is putting the first (left-to-right) match it finds in the OR'd list into the <TheObject> group and then goes on.
The OR'd list is built at run time, so I cannot guarantee the order in which "Car" or "Car Blue" is added to the OR'd list in the <TheObject> group. (This is a dramatically simplified OR'd list.)
I could brute force it, by sorting the OR'd list from longest to shortest, but, I was looking for something a little more elegant.
Is there a way to make <TheObject> group capture the largest it can find in the OR'd list instead of the first it finds? (Without me having to worry about the order)
Thank you,
I would normally automatically agree with an answer like ltux's, but not in this case.
You say the alternation group is generated dynamically. How frequently is it generated dynamically? If it's every user request, it's probably faster to do a quick sort (either by longest length first, or reverse-alphabetically) on the object the expression is built from than to write something that turns (Car|Car Red|Car Blue) into (Car( Red| Blue)?).
The regex may take a bit longer (you probably won't even notice a difference in the speed of the regex) but the assembly operation may be much faster (depending on the architecture of the source of your data for the alternation list).
In a simple test of an alternation with 702 options written three ways, results are comparable using an option set like this one, but none of these results take into account the time needed to build the string, which grows as the string becomes more complex.
The options are all the same, just in different formats
zap
zap
yes
xerox
...
apple
yes
zap
yes
xerox
...
apple
xerox
zap
yes
xerox
...
apple
...
apple
zap
yes
xerox
...
apple
Using Google Chrome and Javascript, I tried three (edit: four) different formats and saw consistent results for all between 0-2ms.
'Optimized factoring': a(?:4|3|2|1)?
Reverse alphabetical sorting: (?:a4|a3|a2|a1|a)
Factoring: a(?:4)?|a(?:3)?|a(?:2)?|a(?:1)?
All are consistently coming in at 0 to 2ms (the difference being what else my machine might be doing at the moment, I suppose).
Update: I found a way that you may be able to do this without sorting, using a lookahead like this: (?=a|a1|a2|a3|a4|a5)(.{15}|.{14}|.{13}|...|.{2}|.), where 15 is the upper bound, counting all the way down to the lower bound.
Without some restraints on this method, I feel like it can lead to a lot of problems and false positives. It would be my least preferred result. If the lookahead matches, the capture group (.{15}|...) will capture more than you'll desire on any occasion where it can. In other words, it will reach ahead past the match.
Though I made up the term Optimized Factoring in comparison to my Factoring example, I can't recommend my Factoring example syntax for any reason. Sorted would be the most logical, coupled with easier to read/maintain than exploiting a lookahead.
You haven't given much insight into your data but you may still need to sort the sub groups or factor further if the sub-options can contain spaces and may overlap, further diminishing the value of "Optimized Factoring".
Edit: To be clear, I am providing a thorough examination as to why no form of factoring is a gain here. At least not in any way that I can see. A simple Array.Sort().Reverse().Join("|") gives exactly what anyone in this situation would need.
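The sort-then-join approach can be sketched in Python. Two assumptions: Python spells named groups `(?P<name>...)` rather than .NET's `(?<name>...)`, and this sketch sorts by length (longest first) instead of reverse-alphabetically; both orderings put "Car Blue" ahead of its prefix "Car". `re.escape` guards against object names that contain regex metacharacters.

```python
import re


def build_pattern(objects):
    """Build the alternation longest-first so a longer name wins over its prefix."""
    alternation = "|".join(re.escape(o) for o in sorted(objects, key=len, reverse=True))
    return re.compile(
        rf"(?P<TheObject>{alternation}) +(?P<OldState>.+) +---> +(?P<NewState>.+)"
    )


line = "Car Blue Flying ---> Crashed"
print(build_pattern(["Car", "Car Blue"]).match(line).groupdict())
# {'TheObject': 'Car Blue', 'OldState': 'Flying', 'NewState': 'Crashed'}

# Unsorted, the first alternative that lets the rest of the pattern match wins.
naive = re.compile(r"(?P<TheObject>Car|Car Blue) +(?P<OldState>.+) +---> +(?P<NewState>.+)")
print(naive.match(line).groupdict())
# {'TheObject': 'Car', 'OldState': 'Blue Flying', 'NewState': 'Crashed'}
```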
In backtracking engines such as .NET's, the | operator tries the alternatives from left to right and keeps the first one that lets the rest of the pattern match; it does not look for the longest alternative. We can't change this behaviour of the | operator.
So the solution is to avoid using | operator. Instead of (Car Blue|Car) or (Car|Car Blue), use (Car( Blue)?).
(?<TheObject>(Car( Blue)?)) +(?<OldState>.+) +---> +(?<NewState>.+)
Then the <TheObject> group will always be Car Blue in the presence of Blue.
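A small Python check of the factored pattern (again with Python's `(?P<name>...)` syntax standing in for .NET's `(?<name>...)`). The greedy `?` tries " Blue" first and only falls back to plain "Car" when " Blue" is not there:

```python
import re

# "Car" optionally followed by " Blue": no alternation order to worry about.
PATTERN = re.compile(r"(?P<TheObject>Car(?: Blue)?) +(?P<OldState>.+) +---> +(?P<NewState>.+)")

for line in ["Car Blue Flying ---> Crashed", "Car Flying ---> Crashed"]:
    print(PATTERN.match(line).groupdict())
# {'TheObject': 'Car Blue', 'OldState': 'Flying', 'NewState': 'Crashed'}
# {'TheObject': 'Car', 'OldState': 'Flying', 'NewState': 'Crashed'}
```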
I tried to convert ((c|a)b*)* to an NFA using Thompson's construction, but I have misunderstood something, because the outcome isn't the one it is supposed to be. I would be really glad if you could point out my mistake.
Thompson's construction rules:
1) Every NFA has a start state and an accepting state.
2) No transition, except the starting one, is allowed to enter the start state.
3) No transition exits from an accepting state.
4) An ε-transition always connects 2 states that used to be start or accepting states for some REs.
5) A state can have at most 2 incoming and 2 exiting ε-transitions.
6) A state can have at most 1 incoming and 1 exiting transition for a specific character of the alphabet.
Step 1: I created NFAs for each character
Step 2: parentheses have priority, so I created c|a
Step 3: then I created b*
Step 4: then I combined c|a and b* to create (c|a)b*
Step 5: and at last I created ((c|a)b*)*
The difference from the correct solution is that in its final NFA (the example doesn't show the steps, and the states were renumbered at the end) there is no S9. So S8 has an ε-transition to S5, and S5 has an ε-transition to S10. That would make sense to me if b* didn't have the S9 state, but it needs it because of rule 2. So I guess I made a mistake when connecting the pieces. Thank you in advance.
Rule 2 says that nothing can enter S11, which isn't relevant here. When concatenating (step 4), S8 and S9 should have been combined.
From Wikipedia,
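The concatenation expression st is converted to an NFA in which the initial state of N(s) is the initial state of the whole NFA, the final state of N(s) becomes the initial state of N(t), and the final state of N(t) is the final state of the whole NFA.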
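Here is a minimal Python sketch of Thompson's construction in which concatenation merges the accepting state of the left fragment with the start state of the right one instead of adding an ε-edge. States are numbered in creation order, so the numbers will not match the figures; the class and method names are illustrative only. Because no transition ever enters a fragment's start state (rule 2), merging only has to move that state's outgoing edges.

```python
EPS = None  # label used for ε-transitions


class Thompson:
    """Builds one NFA; each method returns a fragment (start, accept)."""

    def __init__(self):
        self.count = 0
        self.edges = []  # mutable [src, label, dst] triples

    def _state(self):
        self.count += 1
        return self.count - 1

    def char(self, c):
        s, f = self._state(), self._state()
        self.edges.append([s, c, f])
        return s, f

    def union(self, x, y):
        s, f = self._state(), self._state()
        self.edges += [[s, EPS, x[0]], [s, EPS, y[0]], [x[1], EPS, f], [y[1], EPS, f]]
        return s, f

    def concat(self, x, y):
        # Merge y's start into x's accept: re-point y's start's outgoing edges.
        for e in self.edges:
            if e[0] == y[0]:
                e[0] = x[1]
        return x[0], y[1]

    def star(self, x):
        s, f = self._state(), self._state()
        self.edges += [[s, EPS, x[0]], [s, EPS, f], [x[1], EPS, x[0]], [x[1], EPS, f]]
        return s, f

    def used_states(self):
        return {e[0] for e in self.edges} | {e[2] for e in self.edges}

    def _closure(self, states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, lab, dst in self.edges:
                if src == q and lab is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, frag, word):
        current = self._closure({frag[0]})
        for ch in word:
            current = self._closure({d for s, lab, d in self.edges if s in current and lab == ch})
        return frag[1] in current


t = Thompson()
inner = t.concat(t.union(t.char("c"), t.char("a")), t.star(t.char("b")))
nfa = t.star(inner)
print(len(t.used_states()))  # 11: the start state of b* was merged away
print([w for w in ["", "c", "abb", "cbab", "b", "bc"] if t.accepts(nfa, w)])
# ['', 'c', 'abb', 'cbab']
```

Without the merge (an ε-edge between the two fragments instead) the same construction would use 12 states, one more than the merged version.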
I am given a simple problem: construct a DFA over the alphabet {0, 1} that accepts all strings that end in 101.
My question is: what are the steps to design it? Or to design an NFA, since I know the steps to convert an NFA to a DFA, so I could then convert the NFA to a DFA.
Note: it is just a minor course for me, so I have never studied regular expressions or any algorithms that might be used to construct DFAs.
If you want more of an explanation on how I derived this, I'd be happy to explain, but for now I just drew the DFA and explained each state.
Sorry about the screenshot...I didn't know how to convert it straight to an image.
On input 0 at state 0, it loops back to itself. On 1, it prepares
itself to end because it could possibly be '101'.
q1 loops to itself on input 1 because it's still preparing to end on
'101'. Input '0' on q1 means it is preparing for input '10', so it goes to q2.
Input '0' on q2 breaks the whole cycle and goes back to q0. Input '1'
results in moving to q3, the accepting state.
Any input on q3 results in going back to whatever point in the cycle
the input corresponds with.
That is, on '1' it goes back to q1, or the state where the first '1'
was encountered in '101', preparing to end.
On '0', it goes to q2 because in order to get to q3, there must have
been an input of '1' from q2, so no matter what, the last two input
symbols are '10' now.
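The DFA described above can be checked mechanically. The table below transcribes the transitions from the explanation (the drawn DFA itself is only a screenshot, so this is a reading of the text) and compares the machine with "ends in 101" on every short binary string. Python sketch:

```python
from itertools import product

# Transition table of the DFA described above; q3 is the only accepting state.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}


def accepts(word):
    state = "q0"
    for ch in word:
        state = DELTA[(state, ch)]
    return state == "q3"


def matches_definition(max_len=10):
    """Compare with 'ends in 101' on every binary string up to max_len."""
    return all(
        accepts(w) == w.endswith("101")
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )


print(matches_definition(), accepts("0101"), accepts("1011"))  # True True False
```

Each state remembers the longest suffix read so far that is also a prefix of 101 (nothing, 1, 10, 101), which is why the q3 transitions fall back to q1 or q2.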
TikZ DFA examples.
Here, the string should end with 101. So we need to draw an NFA for it and later convert it into a DFA.
Here the states are A, B, C, D.
I will upload an image here. In it I have drawn the NFA and its transition table.
Then I have drawn the transition table for converting the NFA to a DFA.
I have also drawn the DFA for your sake.
In an NFA, when a specific input is given to the current state, the machine can go to multiple states: it can have zero, one, or more than one move on a given input symbol. In a DFA, on the other hand, the machine goes to exactly one state: a DFA has exactly one move for each input symbol in each state.
THE STEPS FOR CONVERTING NFA TO DFA:
Step 1: Initially Q' = ϕ
Step 2: Add q0 of NFA to Q'. Then find the transitions from this start state.
Step 3: In Q', find the possible set of states for each input symbol. If this set of states is not in Q', then add it to Q'.
Step 4: The final states of the DFA are all the states (sets of NFA states) that contain at least one final state of the NFA (F).
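The four steps can be sketched in Python. The NFA transitions below are an assumption (the answer's NFA is only in an image): A loops on 0 and 1 and, on 1, also guesses that the final 101 starts there; B goes to C on 0; C goes to the final state D on 1. The names `subset_construction` and `run` are illustrative, not from any library.

```python
from itertools import product

# Assumed NFA for "ends in 101" (state names A..D as in the answer).
NFA = {
    ("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
    ("B", "0"): {"C"},
    ("C", "1"): {"D"},
}


def subset_construction(nfa, start, finals, alphabet):
    start_set = frozenset({start})
    q_prime = [start_set]                     # Steps 1-2: Q' starts as {{q0}}
    delta = {}
    i = 0
    while i < len(q_prime):                   # Step 3: process each new set once
        current = q_prime[i]
        i += 1
        for sym in alphabet:
            nxt = frozenset(t for q in current for t in nfa.get((q, sym), ()))
            delta[(current, sym)] = nxt
            if nxt not in q_prime:
                q_prime.append(nxt)
    accepting = {s for s in q_prime if s & finals}  # Step 4
    return start_set, delta, accepting, q_prime


def run(dfa, word):
    start, delta, accepting, _ = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting


def agrees(dfa, max_len=10):
    """Compare the DFA with 'ends in 101' on every binary string up to max_len."""
    return all(
        run(dfa, w) == w.endswith("101")
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )


dfa = subset_construction(NFA, "A", frozenset({"D"}), "01")
print(len(dfa[3]), agrees(dfa))  # 4 True
```

The four reachable subsets are {A}, {A, B}, {A, C} and {A, B, D}, which line up with the four states of the hand-drawn DFA in the previous answer.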