Math: Giving a regular expression for a language - regex

I am learning about regular expressions and languages, and was working through some questions that ask for a regular expression representing a specified language. The question I got a little stuck on is this:
Come up with a regular expression that expresses the following
language. The alphabet of the language is {a,b}.
The language of all strings with two consecutive a's, but no three
consecutive a's (i.e., "aa", "aabaa", "babaa" are in the language,
while "abab" and "aaaab" are not).
My answer for this so far is:
(b*(e+a+aa)bb*)* (aa) (bb*(e+a+aa)b*)*
where 'e' is the empty string and '+' functions essentially as an 'or'.
What I am wondering is whether my answer is correct (I believe it is), and whether it can be simplified at all.
Thanks guys.

I believe that your regular expression is correct. It ensures that an aa exists in the string, and makes sure that aaa cannot exist. As for being simplest (simplest being subjective here), I would say the following is simpler:
(b + ab + aab)* aa (b + ba + baa)*
Note that you could actually derive the above from the regular expression that you have. Taking just the part before the aa in your regular expression, we have:
(b*(e+a+aa)bb*)*
= (b*bb* + b*abb* + b*aabb*)*
= (b + ab + aab)*
That last step is a bit of a jump. It relies on noticing that all those b*'s are redundant: because of the * on the whole expression and the bare b alternative inside the brackets, any extra b's can be produced by further repetitions of b.
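For anyone who wants to sanity-check both expressions, here is a minimal brute-force sketch. It assumes the formal notation is translated to Python's `re` syntax ('+' becomes '|', and 'e' becomes an empty alternative); the helper names `in_language`, `all_strings` and `mismatches` are made up for this illustration. It only compares strings up to a bounded length, so it is evidence rather than a proof.

```python
import re
from itertools import product

# Formal notation translated to Python syntax:
# '+' becomes '|', and 'e' (the empty string) becomes an empty alternative.
ORIGINAL = re.compile(r"(b*(|a|aa)bb*)*aa(bb*(|a|aa)b*)*")
SIMPLIFIED = re.compile(r"(b|ab|aab)*aa(b|ba|baa)*")


def in_language(s):
    """Direct statement of the language: contains 'aa' but never 'aaa'."""
    return "aa" in s and "aaa" not in s


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def mismatches(pattern, max_len=8):
    """Strings on which the regex and the definition disagree."""
    return [
        s for s in all_strings(max_len)
        if bool(pattern.fullmatch(s)) != in_language(s)
    ]


if __name__ == "__main__":
    print(mismatches(ORIGINAL))
    print(mismatches(SIMPLIFIED))
```

An empty list from `mismatches` means the expression agrees with the definition on every string up to `max_len`. `fullmatch` is used so that the whole string has to be generated by the expression, as in the formal setting.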

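I think this regex matches your language as well (the same idea in programming-regex syntax; note there is no outer *, since repeating the whole group would also admit the empty string and aaaa):
^(b|ab|aab)*aa(b|ba|baa)*$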

Related

Automata - Regular Expression (Union Case)

Automata 1) Recognizes strings with at least 2 a
Regular Expression = b*ab*a(a+b)*
Automata 2) Recognizes strings with at least 2 b
Regular Expression = a*ba*b(a+b)*
Is the regular expression obtained from A3 = A1 U A2 equivalent to R3 = R1 + R2, or not?
R3 = b*ab*a(a+b)* + a*ba*b(a+b)*
There is neither "one" automaton nor "one" regular expression for any language; generally there are many reasonable ones and many more (infinitely many, in fact) unreasonable ones. In this sense, your question is not entirely well-posed: the regular expression corresponding to the union of two DFAs may or may not look like the regular expressions for the original DFAs, +'ed together.
So, if you mean, can they look the same, the answer is likely yes. If you mean, must they look the same, answer is likely no. If you instead want to fix the algorithms for constructing the union machine and getting the regular expression, maybe we could show that a fixed method of doing it always gives the same answer.
In your specific case, applying the Cartesian Product Machine construction to get a DFA for the union of the original DFAs and then applying the construction from the proof of equivalence between DFAs and REs, we can see that the structure of the RE obtained by +'ing the original REs can't be achieved starting from a DFA; you'd have needed an NFA to get a + between the LHS and RHS, but DFAs can only + among individual symbols, not subexpressions. Of course, it might be possible the RE can be algebraically manipulated to derive the target RE, but that isn't exactly the same.
All of the above holds for the question of equality of REs. However, you asked about equivalence. Almost always, we say two REs are equivalent if they generate the same language. If this is what you meant, then yes, +'ing the two REs will give an RE equivalent to the one obtained by constructing a union machine and deriving an RE from that. The REs will not look the same but will generate the same language, just as (ab + e)(abab)* and (ab)* generate the same language despite looking a bit different.
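To make "equivalent means same language" concrete, here is a small sketch that compares expressions on all strings up to a bounded length. It assumes the formal '+' is written as '|' and 'e' as an empty alternative in Python's `re` syntax; `same_language` and `union_is_correct` are hypothetical helper names, and a bounded comparison is a sanity check, not a proof.

```python
import re
from itertools import product

R1 = r"b*ab*a(a|b)*"  # at least two a's
R2 = r"a*ba*b(a|b)*"  # at least two b's
R3 = re.compile(f"(?:{R1})|(?:{R2})")  # R3 = R1 + R2 in formal notation


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def same_language(p, q, max_len=8):
    """True if p and q accept exactly the same strings up to max_len."""
    return all(
        bool(p.fullmatch(s)) == bool(q.fullmatch(s))
        for s in all_strings(max_len)
    )


def union_is_correct(max_len=8):
    """Compare R3 against the plain-English description of the union."""
    return all(
        bool(R3.fullmatch(s)) == (s.count("a") >= 2 or s.count("b") >= 2)
        for s in all_strings(max_len)
    )


if __name__ == "__main__":
    print(union_is_correct())
    # The two different-looking expressions from the answer:
    print(same_language(re.compile(r"(ab|)(abab)*"), re.compile(r"(ab)*")))
```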
Regular expressions are not like finite state parsers and it's usually a mistake to try to incorporate them into complex parsing scenarios.
But also, they are marvelous tools for specific problems. After reading your descriptive requirements, there is a simple regular expression that accomplishes it, but in a way you might not expect. Your requirements:
strings with at least 2 a
strings with at least 2 b
The union of the two, or strings with at least two a's or two b's
([ab]).*?\1
This expression opens a capture group to capture either a or b. Then it allows zero or more 'any characters' followed by whatever was captured in the capture group (\1).
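A quick sketch of how that backreference behaves in Python's `re` module. The function name `has_repeated_symbol` is made up for this example, and `search` is used because the pattern is not anchored.

```python
import re

# Capture one symbol, skip as little as possible, then require the same symbol again.
REPEATED = re.compile(r"([ab]).*?\1")


def has_repeated_symbol(s):
    """True if some symbol (a or b) occurs at least twice in s."""
    return REPEATED.search(s) is not None


if __name__ == "__main__":
    for s in ["", "a", "ab", "aa", "aba", "bab"]:
        print(repr(s), has_repeated_symbol(s))
```

Over the alphabet {a,b}, every string of length three or more contains a repeated symbol, so the only strings this rejects are the empty string, "a", "b", "ab" and "ba".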

Regular expression for a DFA that accepts the empty string as well as other strings

For the following NFA
I produced the RE
(a + b)(ab)*
However, I then realised that my RE doesn't accept the empty string, as it only accepts strings beginning with an a or a b, yet the NFA also accepts the empty string because its initial state is an accepting state.
What is a valid RE for this NFA? I would think something along the lines of
Ø + (a + b)(ab).*
but I doubt this syntax is accepted.
EDIT
I have also just realised that the example I have made is an NFA, but that is beside the point of the question.
(a + b)* would be the answer to what you mentioned in the question title. The * means zero or more repetitions, so the empty string is included. a+b means you can choose either a or b. So, overall, you can choose a or b repeatedly.
How about the following expression?
^(|[ab](ab)*)$
This accepts either the empty string, or an a or b followed by zero or more repetitions of ab.
Whether this solves your problem depends on the exact regular expression dialect, though.
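In Python's `re` dialect, for instance, the expression can be tried out as below. This is only a sketch; `accepts` is a made-up helper name, and the sample strings are my own.

```python
import re

# Empty alternative first, then: one a or b followed by zero or more "ab".
PATTERN = re.compile(r"^(|[ab](ab)*)$")


def accepts(s):
    """True if the whole string is empty or is [ab] followed by (ab)*."""
    return PATTERN.match(s) is not None


if __name__ == "__main__":
    for s in ["", "a", "b", "aab", "bab", "babab", "ab", "aa"]:
        print(repr(s), accepts(s))
```

In the formal notation of the question this corresponds to e + (a + b)(ab)*, with e standing for the empty string.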

Show that two regular expressions are equivalent in Automata Theory without using DFAs

I have been trying to prove that two regexes are equivalent. I know that two regexes are equivalent if they define the same language, but I can't find a way to prove it without using DFAs.
For example, I have to prove that the following are equivalent.
(a + b)*a(a + b)*b(a + b)* = (a + b)*ab(a + b)*
I know both of these define the language of strings having at least one 'a' and one 'b'.
The same is the case with the following.
(a + b)*ab(a +b)* + b*a* = (a + b)*
Any help will be appreciated.
Thanks
You should be able to prove them using the identities on slide 16 of this regex lecture. In particular, I'd recommend clever use of the last equality of the 9th identity there, R* = RR* + e.
By the way, the first language is not precisely "at least one 'a' and one 'b'". For example, 'ba' is not in the language, but has at least one 'a' and one 'b'.
I think that in the first expression the (a+b)* in the middle stands for an arbitrary string, so we can ignore it, and then the two expressions become equivalent.
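This is not a proof (the question asks for an algebraic one), but before attempting it you can sanity-check both identities by brute force. The sketch below assumes '+' is written as '|' in Python's `re` syntax; `agree` is a hypothetical helper that compares two expressions on all strings up to a bounded length.

```python
import re
from itertools import product


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def agree(left, right, max_len=8):
    """True if both expressions accept the same strings up to max_len."""
    p, q = re.compile(left), re.compile(right)
    return all(
        bool(p.fullmatch(s)) == bool(q.fullmatch(s))
        for s in all_strings(max_len)
    )


# First identity: an a followed (anywhere later) by a b, versus the substring ab.
FIRST = (r"(a|b)*a(a|b)*b(a|b)*", r"(a|b)*ab(a|b)*")
# Second identity: contains ab, or is of the form b*a*, versus all strings.
SECOND = (r"(a|b)*ab(a|b)*|b*a*", r"(a|b)*")

if __name__ == "__main__":
    print(agree(*FIRST))
    print(agree(*SECOND))
    # 'ba' has one a and one b but is not in the first language.
    print(re.fullmatch(FIRST[0], "ba") is None)
```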

REGULAR LANGUAGES AND REGULAR EXPRESSIONS (theory of automata)

I am going through the book "Introduction to Languages and the Theory of Computation" by John C. Martin, chapter 3, section 3.1. In the exercises that follow, question 3.7 (i), "The language of all strings containing both bb and aba as sub-strings", puzzled me.
Here is the expression I came up with. I do not know whether it is right or wrong:
"(a+b)*((bb(a+b)*aba)+(bb(a+b)*aba))(a+b)*".
I am also confused about the "+" and "|" symbols. I think they are the same. Are they?
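It depends on the notation. In textbook notation such as Martin's, + between two expressions means union (a choice), which is exactly what | means in programming regex syntax, so there (a+b) and (a|b) say the same thing: choose a or b. In programming regex syntax, however, + is a postfix operator: a+ is the same as writing a(a*), i.e. the string repeated one or more times.
Your expression has the right shape, but its two alternatives are identical (both are bb(a+b)*aba). The second one should be aba(a+b)*bb, to cover strings in which aba comes before bb. If you want to run the expression in a regex engine, convert every + to |.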
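A brute-force comparison can help here. The sketch below assumes the textbook '+' is written as '|' in Python's `re` syntax, and it assumes the second alternative of the expression was meant to be aba(a+b)*bb (my reading, not something stated in the question). `AS_WRITTEN`, `CANDIDATE` and `counterexamples` are names invented for this example, and the check only covers strings up to a bounded length.

```python
import re
from itertools import product

# The expression exactly as posted (both alternatives are bb ... aba).
AS_WRITTEN = re.compile(r"(a|b)*((bb(a|b)*aba)|(bb(a|b)*aba))(a|b)*")
# Assumed intended version: bb before aba, or aba before bb.
CANDIDATE = re.compile(r"(a|b)*((bb(a|b)*aba)|(aba(a|b)*bb))(a|b)*")


def in_language(s):
    """Exercise 3.7 (i): contains both bb and aba as substrings."""
    return "bb" in s and "aba" in s


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def counterexamples(pattern, max_len=8):
    """Strings on which the regex and the definition disagree."""
    return [
        s for s in all_strings(max_len)
        if bool(pattern.fullmatch(s)) != in_language(s)
    ]


if __name__ == "__main__":
    print(counterexamples(CANDIDATE))
    print(counterexamples(AS_WRITTEN)[:5])
```

Strings such as "ababb", where aba comes first, are the ones the posted version misses. Since bb and aba cannot overlap, handling the two possible orders is enough.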

Expressing regular expression in words

I am trying to express the following regular expression in words. Please note this is not so much a programming regex as some CS work I am doing. The regular expression is:
(ab + b)* + (ba + b)*
The spaces are meaningless and the '+' functions as an 'or'. My answer right now is:
"This regular expression represents every string that does not contain the substring 'aa', and whose last letter is 'b' if the first letter is 'a'"
Is this correct? If so, that last condition makes me a bit wary. Is there a way to perhaps simplify the description?
Thanks guys.
Hm, not sure I agree with @ChristianTernus's reduction.
Assuming these are implicitly anchored, the original, (ab|b)*|(ba|b)*, in English, is:
a string entirely composed of ab and b, or
a string entirely composed of ba and b.
So, for example, abb would match as the first kind but not the second, and bba would match the second kind but not the first.
Meanwhile, note how neither abb nor bba would match the reduction, (ab)*|(ba)*|(b)*, which actually means,
a string entirely composed of ab, or
a string entirely composed of ba, or
a string entirely composed of b.
Actually, the way you Englishified it, I think was already the best! Though, I'd style it like this:
This regular expression represents a string composed entirely of 'a's and 'b's, with no consecutive 'a's, and whose last character is 'b' if the first character is 'a'.
Nearly identical to what you already wrote.
As @ChristianTernus (and @slebetman) point out, the above fails to take into account that the original expression accepts the empty string (or even a string without 'a's, which isn't clear from my Englishification), so in fact I believe OP's Englishification was indeed the strongest.
(ab + b)* + (ba + b)*
Translated into common (PCRE) regex, that's
(ab|b)*|(ba|b)*
In other words: a string composed of either zero or more instances of either 'ab' or 'b', or zero or more instances of either 'ba' or 'b'.
@acheong87's answer is also correct. I like this wording because it matches the original structure of the regular expression more closely; it wouldn't be hard to turn it back into the regex from whence it came.
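To compare the English descriptions against the expression itself, here is a small brute-force sketch. It assumes the PCRE-style translation (ab|b)*|(ba|b)* given above, matched against the whole string; `description` encodes the OP's wording, and the helper names are made up for this example. It only checks strings up to a bounded length.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(ab|b)*|(ba|b)*")
REDUCTION = re.compile(r"(ab)*|(ba)*|(b)*")  # the reduction discussed above


def description(s):
    """OP's wording: no 'aa', and the last letter is b if the first is a."""
    return "aa" not in s and (not s.startswith("a") or s.endswith("b"))


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def description_matches(max_len=8):
    """True if the wording and the regex agree on all strings up to max_len."""
    return all(
        bool(ORIGINAL.fullmatch(s)) == description(s)
        for s in all_strings(max_len)
    )


if __name__ == "__main__":
    print(description_matches())
    # abb and bba separate the original from the reduction.
    for s in ["abb", "bba"]:
        print(s, bool(ORIGINAL.fullmatch(s)), bool(REDUCTION.fullmatch(s)))
```

The empty string passes `description` as written, since it has no first letter and contains no 'aa', which matches the fact that the expression accepts it.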