The following example shows a simple DFA with one accepting state q2:
Based on the R(i,j,k) algorithm shown above, I want to convert this DFA to a regular expression. Unfortunately, I can't find a good definition of k. My question is: what does k mean?
Is it the number of states (in this case 3) or something else?
Then we solve those equations to get the equation for qi in terms of αij, where qi is a final state; that expression is the required solution. It is shown below:
q1 = q1a + q3a + ε (the ε term is there because q1 is the initial state)
q2 = q1b + q2b + q3b
q3 = q2a
Now we solve these equations, as shown below:
q2 = q1b + q2b + q3b
   = q1b + q2b + (q2a)b (substituting the value of q3)
   = q1b + q2(b + ab)
   = q1b(b + ab)* (applying Arden's theorem)
q1 = q1a + q3a + ε
   = q1a + q2aa + ε (substituting the value of q3)
   = q1a + q1b(b + ab)*aa + ε (substituting the value of q2)
   = q1(a + b(b + ab)*aa) + ε
   = ε(a + b(b + ab)*aa)* (applying Arden's theorem)
   = (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
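As a sanity check, the result can be compared against the automaton by brute force. The sketch below assumes the DFA implied by the equations above (a term "qXs" in the equation for qY is read as a transition from qX to qY on symbol s), with q1 as both start and final state. The transition table and the helper names are illustrative reconstructions, not taken from the original figure.

```python
import re
from itertools import product

# Transitions read off the equations: a term "qXs" in the equation for qY
# means qX --s--> qY. (Assumed reconstruction; the original figure is not shown.)
DELTA = {
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q1", ("q3", "b"): "q2",
}
START, FINAL = "q1", {"q1"}

# (a + b(b + ab)*aa)* written in Python regex syntax.
REGEX = re.compile(r"(?:a|b(?:b|ab)*aa)*")

def dfa_accepts(word):
    """Run the DFA on the word and report whether it ends in a final state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINAL

def mismatches(max_len):
    """Words over {a, b} up to max_len on which the DFA and the regex disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if dfa_accepts(word) != bool(REGEX.fullmatch(word)):
                bad.append(word)
    return bad

print(mismatches(8))  # an empty list means both agree on every word up to length 8
```

A bounded check like this is not a proof, but it catches most slips made while substituting by hand.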
I am struggling to grasp how to write a regular expression for a language over a 3-letter alphabet where the only restriction is on all three characters appearing in one specific order.
Example 1:
Write a regular expression over Σ = {a, b, c} which ensures that w is such that there is no occurrence of the string "abc"
Valid matches include {bac, baac, cad, aaaaccb, abababac} the only word that should not be produced is "abc"
Example 2:
Write a regular expression over Σ = {x, y, z} which ensures that w is such that there is no occurrence of the string "zyx"
Valid matches include {xyz, xyyx, yzx, zzyyxc, xyxyxzyzy} the only word that should not be produced is "zyx"
We can make a DFA first and then run the algorithm to make the regular expression.
A DFA for the first is:
q s q'
----------------
q0 b,c q0
q0 a q1
q1 a q1
q1 b q2
q1 c q0
q2 a q1
q2 b q0
q2 c q3
q3 a,b,c q3
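We can write some equations (e is the empty string, added to q0 because it is the start state; the trap state q3 is left out)...
q0 = q0(b + c) + q1c + q2b + e
q1 = q0a + q1a + q2a
q2 = q1b
We can replace all instances of q2 with q1b ...
q0 = q0(b + c) + q1(c + bb) + e
q1 = q0a + q1(a + ba)
q2 = q1b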
We can remove the recursion in the equation for q1:
q0 = q0(b + c) + q1(c + bb) + e
q1 = q0a(a + ba)*
q2 = q1b
Now replace the expression for q1 ...
q0 = q0((b + c) + a(a + ba)*(c + bb)) + e
q1 = q0a(a + ba)*
q2 = q1b
Removing the recursion from the equation for q0 gives:
q0 = ((b + c) + a(a + ba)*(c + bb))*
q1 = q0a(a + ba)*
q2 = q1b
Filling in the others gives
q0 = ((b + c) + a(a + ba)*(c + bb))*
q1 = ((b + c) + a(a + ba)*(c + bb))*a(a + ba)*
q2 = ((b + c) + a(a + ba)*(c + bb))*a(a + ba)*b
Since these three states are accepting, our RE should be the union:
r = ((b + c) + a(a + ba)*(c + bb))*(e + a(a + ba)*(e + b))
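One way to gain confidence in r is to test it exhaustively on short words. The sketch below transcribes r into Python regex syntax (the transcription and the helper name `disagreements` are my own, not part of the answer) and compares it with the plain definition "does not contain abc".

```python
import re
from itertools import product

# r = ((b + c) + a(a + ba)*(c + bb))*(e + a(a + ba)*(e + b)) in Python syntax;
# "e + X" (empty string or X) becomes an optional group.
R = re.compile(r"(?:[bc]|a(?:a|ba)*(?:c|bb))*(?:a(?:a|ba)*b?)?")

def disagreements(max_len):
    """Words over {a, b, c} up to max_len where r and the definition differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            word = "".join(letters)
            if bool(R.fullmatch(word)) != ("abc" not in word):
                bad.append(word)
    return bad

print(disagreements(7))  # an empty list means r agrees with the definition up to length 7
```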
Your other example is quite similar.
I have expressions containing exponential functions that I would like to simplify. Some of the exponential functions have negative arguments.
To sum up my problem with a very trivial example:
import sympy as sp
a,b,c,d=sp.symbols("a b c d",real=True, positive=True)
myExpr=(sp.exp(c)+sp.exp(d))*sp.exp(-c-d)
myExpr.simplify() gives a simplified expression. That is perfect.
But with a denominator in the expression, the exponential functions with a negative argument are not simplified:
a,b,c,d=sp.symbols("a b c d",real=True, positive=True)
myExpr=(sp.exp(c)+sp.exp(d))*sp.exp(-c-d)/a
How can I simplify it?
As suggested in the comments, here is a more complicated example:
import sympy as sp
a, b, c, d, e, f, c1, c2, t = sp.symbols("a b c d e f c_1 c_2 t", real=True, positive = True)
myExpr=((c1*sp.exp((a+b+c)*t)+c2*sp.exp((d+e+f)*t)))/(c1+c2)*sp.exp(-(a+b+c+d+e+f)*t)
And I would like an output like this one:
output= ((c1*sp.exp(-(e+d+f)*t)+c2*sp.exp(-(a+b+c)*t)))/(c1+c2)
Note: (a+b+c) should stay in a single exp, i.e. exp(-(a+b+c)*t), and not be split into a product exp(-a*t)*exp(-b*t)*exp(-c*t); similarly for (d+e+f).
Thanks for any answers.
Okay, I see the problem. When you have an Add in both the numerator and the denominator, expand distributes a "negative" power into the denominator:
In [45]: expr = (a*x + b*x)*(1/x)*(1/(c + d))
In [46]: expr
Out[46]:
a⋅x + b⋅x
─────────
x⋅(c + d)
In [47]: expand(expr)
Out[47]:
   a⋅x         b⋅x
───────── + ─────────
c⋅x + d⋅x   c⋅x + d⋅x
It's less obvious with the exponential function because it shows in the numerator
In [48]: expr.subs(x, exp(t))
Out[48]:
⎛   t      t⎞  -t
⎝a⋅ℯ  + b⋅ℯ ⎠⋅ℯ
─────────────────
      c + d
however expand still treats it as belonging to the denominator just as a power E**(-t) would.
In [51]: expand_mul(expr.subs(x, exp(t)))
Out[51]:
       t             t
    a⋅ℯ           b⋅ℯ
─────────── + ───────────
   t      t      t      t
c⋅ℯ  + d⋅ℯ    c⋅ℯ  + d⋅ℯ
I can't think of an easier way to do it than this:
In [84]: factor_terms(sum(powsimp(factor_terms(cancel(a))) for a in Add.make_args(myExpr.expand())))
Out[84]:
    -t⋅(d + e + f)       -t⋅(a + b + c)
c₁⋅ℯ               + c₂⋅ℯ
───────────────────────────────────────
                c₁ + c₂
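For reuse, the one-liner can be wrapped in a small helper. The sketch below is only a repackaging of the same calls (the name `simplify_termwise` and the sample values are my own), followed by a numerical spot check that the rewritten expression still has the same value as the original.

```python
import sympy as sp
from sympy import Add, cancel, factor_terms, powsimp

a, b, c, d, e, f, c1, c2, t = sp.symbols("a b c d e f c_1 c_2 t", real=True, positive=True)
myExpr = (c1*sp.exp((a + b + c)*t) + c2*sp.exp((d + e + f)*t))/(c1 + c2)*sp.exp(-(a + b + c + d + e + f)*t)

def simplify_termwise(expr):
    """Expand, simplify each additive term on its own, then pull out common factors."""
    terms = Add.make_args(expr.expand())
    return factor_terms(sum(powsimp(factor_terms(cancel(term))) for term in terms))

result = simplify_termwise(myExpr)

# Numerical spot check with arbitrary positive sample values (hypothetical numbers).
sample = {a: 0.3, b: 0.5, c: 0.7, d: 1.1, e: 1.3, f: 1.7, c1: 2.0, c2: 3.0, t: 0.9}
difference = float((result - myExpr).subs(sample))

print(result)
print(abs(difference) < 1e-9)
```

The numerical check only confirms that the value is preserved; the exact printed form of `result` may vary between SymPy versions.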
I have a list of polynomial expressions, (in my case obtained as the output of a Groebner basis computation), that I would like to view. I am using Jupyter, and I have started off with
import sympy as sy
sy.init_printing()
This causes an individual expression to be given nicely typeset. For a non-Groebner example:
sy.var('x')
fs = sy.factor_list(x**99-1)
fs2 = [x[0] for x in fs[1]]
fs2
The result is a nice list of LaTeX-typeset expressions. But how do I print these expressions one at a time, or rather, one per line? I've tried:
for f in fs2:
    sy.pprint(f)
but this produces ascii pretty printing, not LaTeX. In general the expressions I have tend to be long, and I really do want to look at them individually. I can of course do
fs2[0]
fs2[1]
fs2[2]
and so on, but this is tiresome, and hardly useful for a long list. Any ideas or advice? Thanks!
Jupyter (through IPython) has a convenience function called display which works well with SymPy:
import sympy as sy
sy.init_printing()
sy.var('x')
fs = sy.factor_list(x**99-1)
fs2 = [x[0] for x in fs[1]]
for f in fs2:
    display(f)
Each factor is then rendered as typeset LaTeX on its own line.
You can also get the latex code for each of these polynomials by using the latex function:
import sympy as sy
from sympy.printing.latex import latex
sy.init_printing()
sy.var('x')
fs = sy.factor_list(x**99-1)
fs2 = [x[0] for x in fs[1]]
for f in fs2:
    print(latex(f))
Output:
x - 1
x^{2} + x + 1
x^{6} + x^{3} + 1
x^{10} + x^{9} + x^{8} + x^{7} + x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x + 1
x^{20} - x^{19} + x^{17} - x^{16} + x^{14} - x^{13} + x^{11} - x^{10} + x^{9} - x^{7} + x^{6} - x^{4} + x^{3} - x + 1
x^{60} - x^{57} + x^{51} - x^{48} + x^{42} - x^{39} + x^{33} - x^{30} + x^{27} - x^{21} + x^{18} - x^{12} + x^{9} - x^{3} + 1
I need to write a regular expression that can detect a string that contains only the characters x,y, and z, but where the characters are different from their neighbors.
Here is an example
xyzxzyz = Pass
xyxyxyx = Pass
xxyzxz = Fail (repeated x)
zzzxxzz = Fail (adjacent characters are repeated)
I thought that ((x|y|z)?)* would work, but it does not. Any suggestions?
EDIT
Please note, I am looking for an answer that does not allow for look ahead or look behind operations. The only operations allowed are alternation, concatenation, grouping, and closure
Usually for this type of question, if the regex is not simple enough to be derived directly, you can start from drawing a DFA and derive a regex from there.
You should be able to derive the following DFA. q1, q2, q3, q4 are end states, with q1 also being the start state. q5 is the failed/trap state.
There are several methods to find Regular Expression for a DFA. I am going to use Brzozowski Algebraic Method as explained in section 5 of this paper:
For each state qi, the equation Ri is a union of terms: for a transition on symbol a from qi to qj, the term is aRj. Basically, you look at all the outgoing edges of a state. If qi is a final state, λ is also one of the terms.
Let me quote the identities from the definition section of the paper, since they will come in handy later (λ is the empty string and ∅ is the empty set):
(ab)c = a(bc) = abc
λx = xλ = x
∅x = x∅ = ∅
∅ + x = x
λ + x* = x*
(λ + x)* = x*
Since q5 is a trap state, its equation only ever refers back to itself and can never produce λ, so you can drop it from the equations. If you include it anyway, it ends up as the empty set and disappears (explained in the appendix).
You will come up with:
R1 = xR2 + yR3 + zR4 + λ
R2 =       yR3 + zR4 + λ
R3 = xR2 +       zR4 + λ
R4 = xR2 + yR3       + λ
Solve the equation above with substitution and Arden's theorem, which states:
Given an equation of the form X = AX + B where λ ∉ A, the equation has the solution X = A*B.
You will get to the answer.
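Arden's theorem can be checked mechanically on small languages by truncating everything to words of bounded length. The sketch below is such a check under assumed sample values: A = {zy} and B = {z, λ} are picked only for illustration, and the helpers `concat` and `star` are hypothetical names, not from the paper.

```python
def concat(left, right, max_len):
    """Concatenation of two finite languages, keeping words up to max_len."""
    return {u + v for u in left for v in right if len(u + v) <= max_len}

def star(language, max_len):
    """Kleene star of a finite language, truncated to words up to max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = concat(frontier, language, max_len) - result
        result |= frontier
    return result

MAX_LEN = 8
A = {"zy"}       # sample coefficient; the empty word is not in A
B = {"z", ""}    # sample constant term ("" plays the role of λ)

X = concat(star(A, MAX_LEN), B, MAX_LEN)   # X = A*B
print(sorted(X, key=len))
print(X == concat(A, X, MAX_LEN) | B)      # does X satisfy X = AX + B?
```

Because λ is not in A, every word of AX is strictly longer than the word of X it came from, which is why the truncated comparison is meaningful.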
I don't have time and confidence to derive the whole thing, but I will show the first few steps of derivation.
Remove R4 by substitution, note that zλ becomes z due to the identity:
R1 = xR2 + yR3 + (zxR2 + zyR3 + z) + λ
R2 =       yR3 + (zxR2 + zyR3 + z) + λ
R3 = xR2 +       (zxR2 + zyR3 + z) + λ
Regroup them:
R1 = (x + zx)R2 + (y + zy)R3 + z + λ
R2 = zxR2 + (y + zy)R3 + z + λ
R3 = (x + zx)R2 + zyR3 + z + λ
Apply Arden's theorem to R3:
R3 = (zy)*((x + zx)R2 + z + λ)
= (zy)*(x + zx)R2 + (zy)*z + (zy)*
You can substitute R3 back into R2 and R1 to eliminate R3. I leave the rest as an exercise. Continue in the same way and you should reach the answer.
Appendix
We will explain why trap states can be discarded from the equations, since they will just disappear anyway. Let us use the state q5 in the DFA as an example here.
R5 = (x + y + z)R5
Use identity ∅ + x = x:
R5 = (x + y + z)R5 + ∅
Apply Arden's theorem to R5:
R5 = (x + y + z)*∅
Use identity ∅x = x∅ = ∅:
R5 = ∅
The identity ∅x = x∅ = ∅ will also take effect when R5 is substituted into other equations, causing the term with R5 to disappear.
This should do what you want:
^(?!.*(.)\1)[xyz]*$
(Obviously, only on engines with lookahead)
The content itself is handled by the second part: [xyz]* (any number of x, y, or z characters). The anchors ^...$ are here to say that it has to be the entirety of the string. And the special condition (no adjacent pairs) is handled by a negative lookahead (?!.*(.)\1), which says that there must not be a character followed by the same character anywhere in the string.
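A quick way to convince yourself that the lookahead version behaves as described is to compare it with the plain definition on every short word. The sketch below does that in Python; the helper names are my own.

```python
import re
from itertools import product

LOOKAHEAD = re.compile(r"^(?!.*(.)\1)[xyz]*$")

def no_adjacent_repeats(word):
    """Plain definition: every character differs from its right-hand neighbour."""
    return all(p != q for p, q in zip(word, word[1:]))

def disagreements(max_len):
    """Words over {x, y, z} up to max_len where the regex and the definition differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("xyz", repeat=n):
            word = "".join(letters)
            if bool(LOOKAHEAD.match(word)) != no_adjacent_repeats(word):
                bad.append(word)
    return bad

print(disagreements(7))  # an empty list means they agree up to length 7
```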
I had an idea while I was walking today and turned it into a regex, and I have yet to find a pattern that it doesn't match correctly. So here is the regex:
^((y|z)|((yz)*y?|(zy)*z?))?(xy|xz|(xyz(yz|yx|yxz)*y?)|(xzy(zy|zx|zxy)*z?))*x?$
Here is a fiddle to go with it!
If you find a pattern mismatch tell me I'll try to modify it! I know it's a bit late but I was really bothered by the fact that I couldn't solve it.
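Since mismatches are asked for, here is a small brute-force search for them. It compares the regex above with the plain definition on all words up to a given length; the helper names are my own. Tracing the pattern by hand, a word such as xyzyxx appears to slip through: the (yz|yx|yxz)* group can end in x, and the trailing x? (or the next xy/xz block) can then add a second x.

```python
import re
from itertools import product

CANDIDATE = re.compile(
    r"^((y|z)|((yz)*y?|(zy)*z?))?(xy|xz|(xyz(yz|yx|yxz)*y?)|(xzy(zy|zx|zxy)*z?))*x?$"
)

def is_valid(word):
    """Plain definition: no character equals its right-hand neighbour."""
    return all(p != q for p, q in zip(word, word[1:]))

def find_mismatches(max_len):
    """Words over {x, y, z} up to max_len where the regex and the definition differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("xyz", repeat=n):
            word = "".join(letters)
            if bool(CANDIDATE.match(word)) != is_valid(word):
                bad.append(word)
    return bad

found = find_mismatches(6)
print("xyzyxx" in found)
print(found[:10])
```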
I understand this is quite an old question and already has an accepted solution. Still, I am posting one more quick solution for a related case: checking whether a string contains the same character three times in a row.
Use the regular expression below:
String regex = "\\b\\w*(\\w)\\1\\1\\w*";
Some sample cases and the result the expression returns:
Case 1: abcdddd or 123444
Result: Matched
Case 2: abcd or 1234
Result: Unmatched
Case 3: &*%$$$ (Special characters)
Result: Unmatched
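The same pattern can be tried out in Python, where the doubled backslashes of the Java string literal become single ones in a raw string. This is only a sketch: it uses a search anywhere in the input, which is an assumption, since the answer does not say whether matches() or find() is meant on the Java side.

```python
import re

# Same pattern as the Java string literal, written as a Python raw string.
TRIPLE = re.compile(r"\b\w*(\w)\1\1\w*")

def has_triple(text):
    """True if some word character occurs three times in a row."""
    return TRIPLE.search(text) is not None

for sample in ["abcdddd", "123444", "abcd", "1234", "&*%$$$"]:
    print(sample, "Matched" if has_triple(sample) else "Unmatched")
```

The last case stays unmatched because \w only covers letters, digits and the underscore.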
Hope this will be helpful...
Thanks:)
Is there an algorithm or tool to convert regular grammar to regular expression?
Answer from dalibocai:
My goal is to convert a regular grammar to a DFA. Finally, I found an excellent tool: JFLAP.
A tutorial is available here: https://www2.cs.duke.edu/csed/jflap/tutorial/framebody.html
The algorithm is pretty straightforward once you can compute an automaton from your regular expression. For instance, for (aa*b|c), an automaton would be (arrows go to the right):
          a
         / \
      a  \ /  b
-> 0 ---> 1 ---> 2 ->
    \___________/
          c
Then just "enumerate" your transitions as rules. Below, consider that 0, 1, and 2 are nonterminal symbols, and of course a, b and c are the tokens.
0: a1 | c2
1: a1 | b2
2: epsilon
or, if you don't want empty right-hand sides.
0: a1 | c
1: a1 | b
And of course, the route in the other direction provides one means to convert a regular grammar into an automaton, hence a rational expression.
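The "enumerate your transitions as rules" step is mechanical enough to write down. The sketch below assumes the automaton is given as a list of (source, symbol, target) triples plus a set of final states; this representation and the function name are my own choices for illustration.

```python
def automaton_to_grammar(transitions, final_states):
    """Turn each transition (source, symbol, target) into a rule 'source: symbol target'."""
    rules = {}
    for source, symbol, target in transitions:
        rules.setdefault(source, []).append(f"{symbol}{target}")
    for state in final_states:
        # A final state may stop here, so it gets an empty right-hand side.
        rules.setdefault(state, []).append("epsilon")
    return rules

# The automaton for (aa*b|c) drawn above.
TRANSITIONS = [(0, "a", 1), (0, "c", 2), (1, "a", 1), (1, "b", 2)]
grammar = automaton_to_grammar(TRANSITIONS, final_states={2})

for state, alternatives in grammar.items():
    print(f"{state}: {' | '.join(alternatives)}")
# 0: a1 | c2
# 1: a1 | b2
# 2: epsilon
```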
From a theoretical point of view, an algorithm to solve this problem works by creating a regular expression from each rule in the grammar, and solving the resulting system of equations for the initial symbol.
For example, for regular grammar ({S,A},{a,b,c},P,S):
P:
S -> aA | cS | a | c
A -> aA | a | bS
Take each non-terminal symbol and generate a regular expression from its right-hand side:
S = aA + cS + a + c
A = aA + bS + a
Solve the equation system for the initial symbol S. An equation X = xX + y (with ε not in x) has the solution X = x*y, so first remove the recursion in A:
A = a*(bS + a)
A = a*bS + a⁺
Substitute A into S and collect the terms that contain S:
S = a(a*bS + a⁺) + cS + a + c
S = a⁺bS + aa⁺ + cS + a + c
S = (a⁺b + c)S + a⁺ + c        (using aa⁺ + a = a⁺)
Remove the recursion in S:
S = (a⁺b + c)*(a⁺ + c)
Because all modifications were equivalent, (a⁺b + c)*(a⁺ + c) is a regular expression for exactly the words which can be produced from the initial symbol S. Thus it describes the language generated by the grammar.
I purposefully picked a grammar including cycles to show how the algorithm works. The hardest part is recognizing that X = xX | y is equivalent to X = x*y (for example, S = xS | x gives S = x⁺), then just doing the substitutions.
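A brute-force comparison is a cheap way to check an expression obtained this way against the grammar P. The sketch below enumerates every word P derives up to a fixed length and compares that set with a candidate expression. The candidate (a⁺b + c)*(a⁺ + c), obtained by applying the rule X = xX + y ⇒ X = x*y directly to the rules in P, and all helper names are assumptions of this sketch.

```python
import re
from itertools import product

# Productions of P: each alternative is (terminal, next non-terminal or None).
PRODUCTIONS = {
    "S": [("a", "A"), ("c", "S"), ("a", None), ("c", None)],
    "A": [("a", "A"), ("a", None), ("b", "S")],
}

# Candidate expression (an assumption of this sketch): (a⁺b + c)*(a⁺ + c)
CANDIDATE = re.compile(r"(?:a+b|c)*(?:a+|c)")

def derive(max_len, start="S"):
    """All terminal words of length <= max_len derivable from the start symbol."""
    words, frontier = set(), {("", start)}
    while frontier:
        next_frontier = set()
        for prefix, nonterminal in frontier:
            for terminal, successor in PRODUCTIONS[nonterminal]:
                word = prefix + terminal
                if len(word) > max_len:
                    continue
                if successor is None:
                    words.add(word)
                else:
                    next_frontier.add((word, successor))
        frontier = next_frontier
    return words

def disagreements(max_len):
    """Words over {a, b, c} up to max_len where grammar and candidate differ."""
    generated = derive(max_len)
    bad = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            word = "".join(letters)
            if bool(CANDIDATE.fullmatch(word)) != (word in generated):
                bad.append(word)
    return bad

print(disagreements(7))  # an empty list means they agree up to length 7
```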
I'll leave this as an answer to this old question, in case that anybody finds it useful:
I have recently released a library for exactly that purpose:
https://github.com/rindPHI/grammar2regex
You can precisely convert regular grammars, but also compute approximate regular expressions for more general context-free grammars. The output format can be configured to be a custom ADT type or the regular expression format of the z3 SMT solver (z3.ReRef).
Internally, the tool converts grammars to finite automata. If you're interested in the automaton itself, you can call the method right_linear_grammar_to_nfa.