Does this regular expression generate a regular language? - regex

I was told that the language generated by the regular expression:
(a*b*)*
is regular.
However, my thinking goes against this, as follows. Can anyone please provide an explanation whether I'm thinking right or wrong?
My Thoughts
(a*b*) refers to a single sequence of any number of a's, followed by any number of b's (either can be empty). And this single sequence (which can't be changed) can be repeated 0 or more times. For example:
a* = a
b* = bbbb
-> (a*b*) = abbbb
-> (a*b*)* = abbbbabbbbabbbb, ...
On the other hand, since aba is not an exact repetition of the sequence ab, it is not included in the language.
aaabaaabaaab => is included in the language
aba => is not included in the language
Thus, the language consists of sequences that are an arbitrary number of repetitions of a subsequence made of any number of a's followed by any number of b's. Therefore, the language is not regular since it requires a stack.

It's a zero or more times, followed by b zero or more times, repeated zero or more times.
""
"a"
"b"
"ab"
"ba"
"aab"
"bbabb"
"aba"
all pass.
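A quick way to confirm the list above is to run it through a regex engine. This is a minimal sketch using Python's re module (the choice of Python and the variable names are assumptions, not part of the original answer); re.fullmatch is used so the whole string has to match, which corresponds to membership in the language.

```python
import re

# The expression from the question, compiled once.
PATTERN = re.compile(r"(a*b*)*")

# The strings listed in the answer above.
samples = ["", "a", "b", "ab", "ba", "aab", "bbabb", "aba"]

# fullmatch succeeds only if the entire string is matched.
results = {s: PATTERN.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

Every entry should come back True, including "aba" and the empty string.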

* is not +.
aba is in that language; it's just an overly-complicated way to say "the set of all strings consisting of as and bs".
EDIT: The repeating group doesn't mean that the contents of the group must be repeated exactly; that would require a backreference. ((a*b*)?\1*)
Rather, it means that the group itself should be repeated, matching any string that it can match.
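The difference between repeating the group and repeating the captured text can be sketched in Python. The backreference pattern below is a simplified variant chosen for illustration (an assumption, not the exact expression quoted above), and accepts is a hypothetical helper.

```python
import re

# The group is re-matched from scratch on every repetition.
repeat_group = re.compile(r"(a*b*)*")

# Illustrative variant: \1 must repeat the exact text captured by group 1.
exact_repeat = re.compile(r"(a*b*)\1*")

def accepts(pattern, s):
    """Return True if the whole string s matches the compiled pattern."""
    return pattern.fullmatch(s) is not None

for word in ["abab", "aaabaaabaaab", "aba"]:
    print(word, accepts(repeat_group, word), accepts(exact_repeat, word))
```

With the plain star, aba is accepted; only the backreference version rejects it, which is the behaviour the question assumed. Backreferences are an engine extension and are not part of regular expressions in the formal-language sense.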

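Technically, an unanchored /(a*b*)*/ finds a match in every input, because it can match the empty string.
All the operators are *'s, which means zero or more. Since zero is always an option, the match can be empty. Anchored to the whole string, it matches exactly the strings made up of a's and b's.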

It's wrong, you don't need a stack. Your DFA just thinks "can I add just another a (or not)?" or "can I add just another b (or not)?" in an endless loop until the word is consumed.
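To make the "no stack needed" point concrete, here is a minimal sketch of such a DFA in Python. The state name q0 and the function name accepts are made up for illustration; the machine has a single state that is both the start state and the accepting state, and it loops on a and b.

```python
def accepts(word):
    """Simulate a one-state DFA for (a*b*)* over the alphabet {a, b}."""
    # q0 is both the start state and the only accepting state.
    transitions = {("q0", "a"): "q0", ("q0", "b"): "q0"}
    state = "q0"
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:
            # Symbol outside {a, b}: no transition, so reject.
            return False
    return state == "q0"

print(accepts("aba"), accepts(""), accepts("abc"))
```

The only memory the machine uses is its current state, so a finite automaton is enough.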

It is a regular expression, yes.
The * means "repeat 0 or more times". The + is similar, differing only in that it requires at least one repetition (1 or more times).
This regular expression says something like:
Repeat the group below zero or more times:
Repeat a zero or more times;
Repeat b zero or more times.
It works fine with all of your examples.
Edit/Note: aba is accepted too.
I hope this helps :p

Basically, it'll match any string that's empty or made up of a's and b's. It reads:
(('a' zero or more times)('b' zero or more times)) zero or more times
That's why it matches aba:
(('a' one time)('b' one time)) one time, then (('a' one time)('b' zero times)) one time

You're wrong. :)
0 is also an amount, so aba is in this language. It wouldn't be if the regex was (a+b+)+, because + would mean '1 or more' where * means '0 or more'.
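The contrast between the two quantifiers can be checked directly. A small sketch in Python (the variable names are assumptions), again using fullmatch so the entire string must match:

```python
import re

star = re.compile(r"(a*b*)*")   # zero or more everywhere
plus = re.compile(r"(a+b+)+")   # one or more everywhere

for word in ["aba", "abab", "aabbab", ""]:
    print(repr(word), star.fullmatch(word) is not None, plus.fullmatch(word) is not None)
```

The star version accepts all four strings. The plus version rejects aba (the final a has no b after it) and the empty string.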

Related

Regular expressions to match word with three b, correct form?

I find this very ambiguous and vague and I would love to understand
I have these strings
abbb
bbb
aaaabaaabaaabaaabaaabaaab
babba
bbbaaaa
aaaaabbaba
They are all valid because each contains a multiple of three b's, so I use:
(a*ba*ba*ba*)* and this matches them all
(a*ba*ba*b)*a* this matches them all as well
a*(ba*ba*ba*)* same as above
Are these really all the same? Or are there edge cases that I am not seeing?
All of your regexes match the empty string, which doesn't have 3 b's.
This one,
(a*ba*ba*ba*)*
does not match aa. But the following match aa, and they are also equivalent:
(a*ba*ba*b)*a*
a*(ba*ba*ba*)*
If you want to force at least 3 b's, you have to take the b's out of the Kleene star:
(a|b)*b(a|b)*b(a|b)*b(a|b)*
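These claims are easy to check mechanically. The sketch below uses Python's re.fullmatch; the labels A, B, C and at_least_3 and the helper matches are made up for illustration. It also compares the second and third expressions on every string over {a, b} up to length 8, which is evidence of equivalence on short strings, not a proof.

```python
import re
from itertools import product

candidates = {
    "A": r"(a*ba*ba*ba*)*",
    "B": r"(a*ba*ba*b)*a*",
    "C": r"a*(ba*ba*ba*)*",
    "at_least_3": r"(a|b)*b(a|b)*b(a|b)*b(a|b)*",
}

def matches(name, s):
    """True if the whole string s matches the named candidate."""
    return re.fullmatch(candidates[name], s) is not None

# All strings over {a, b} of length 0 through 8.
words = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]
b_equals_c = all(matches("B", w) == matches("C", w) for w in words)

print(matches("A", "aa"), matches("B", "aa"), matches("C", "aa"))
print(matches("A", ""), matches("at_least_3", ""), b_equals_c)
```

On these inputs, only A rejects aa, all three original expressions accept the empty string, and the at_least_3 pattern rejects it.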
* means zero or more. So even regexes like the ones below
(d*ef*gg*hi*)*
(s*o*m*e*t*h*i*n*g*)
etc.
will find a match, because matching zero times is allowed.
(a*ba*ba*ba*)*
(Match zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's) zero or more times. The outer star says it is fine if this group is never matched at all.
Similarly for your second case:
(a*ba*ba*b)*a*
(Zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's, then a b) zero or more times, followed by zero or more a's.
So your regexes allow so many zero-occurrence cases that you cannot see a clear difference between them. Better to use + instead of *. A + quantifier makes the match succeed only if the element is present one or more times.
You can play around with the regex on this site: http://regex101.com/r/rM5zQ1
For learning the basics, regexone will be really helpful for you.
Hope that helps!
You should use + after the group instead of *, or else an empty string would be accepted:
(a*ba*ba*ba*)+
Although this would only allow multiples of 3. If you want at least 3 and any number of extras, it would be:
a*ba*ba*b(a|b)*
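The difference between the two expressions shows up on a string with four b's. A minimal sketch in Python (the variable names are assumptions):

```python
import re

# Non-empty strings whose number of b's is a positive multiple of 3.
multiples_of_three = re.compile(r"(a*ba*ba*ba*)+")

# Strings with at least three b's, followed by anything over {a, b}.
at_least_three = re.compile(r"a*ba*ba*b(a|b)*")

for word in ["", "bbb", "bbbb", "bbbbbb", "abab"]:
    print(repr(word),
          multiples_of_three.fullmatch(word) is not None,
          at_least_three.fullmatch(word) is not None)
```

bbbb is rejected by the first pattern and accepted by the second; both reject the empty string and abab.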
This works for those requirements. But it isn't a good approach. In your example you are searching for "a" and "b", which are single character patterns, and it's already an unreasonably long expression for the simple rule "has 3 b's" in my opinion. But what if the patterns were more complex? You would need to repeat them at least 3 times, making it even more unwieldy.
And what if the rules change slightly? If you wanted to match a maximum instead of a minimum number of b's, it would become even more complex / repetitive, because your only choice would be to combine the patterns for each possible number (1, 2, 3):
(a*ba*|a*ba*ba*|a*ba*ba*ba*)
Or if you decide the word must be a certain length, it actually becomes impossible, short of listing every permutation (for a 7 letter word, ba{3}bab, a{2}babab, b{3}a{4} etc.).
So, I think a better way to solve this is to match the basic generic pattern, then examine the results of the match to check the counts. For example, just match a "word":
(a|b)+
Then on the matching text, match b:
b
and test the number of matches and/or length of text as needed. Each pattern is only repeated a maximum of twice, and your code can easily be adapted to different requirements.
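The match-then-count approach might look like the following in Python. This is only a sketch: the function name has_b_count and its minimum/maximum parameters are hypothetical, chosen to show how the requirement can change without touching the patterns.

```python
import re

def has_b_count(text, minimum=0, maximum=None):
    """Check that text is a non-empty word over {a, b} whose number of b's
    is at least `minimum` and, if given, at most `maximum`."""
    # Step 1: match the generic "word" pattern.
    if re.fullmatch(r"(a|b)+", text) is None:
        return False
    # Step 2: count occurrences of the simple pattern b in the matched text.
    count = len(re.findall(r"b", text))
    if count < minimum:
        return False
    return maximum is None or count <= maximum

print(has_b_count("babba", minimum=3))   # at least three b's
print(has_b_count("bbbb", maximum=3))    # at most three b's
```

A length requirement could be added the same way, with a len(text) check in code instead of a longer expression.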

Expressing regular expression in words

I am trying to express the following regular expression in words. Please note this is not so much a programming regex, as opposed to some CS work I am doing. The regular expression is:
(ab + b)* + (ba + b)*
The spaces are meaningless and the '+' functions as an 'or'. My answer right now is:
"This regular expression represents every string that does not contain the substring 'aa', and whose last letter is 'b' if the first letter is 'a'"
Is this correct? If so, that last condition I put makes me a bit wary. Is there a way to perhaps simplify the summation?
Thanks guys.
Hm, not sure I agree with #ChristianTernus's reduction.
Assuming these are implicitly anchored, the original, (ab|b)*|(ba|b)*, in English, is:
a string entirely composed of ab and b, or
a string entirely composed of ba and b.
So, for example, abb would match as the first kind but not the second, and bba would match the second kind but not the first.
Meanwhile, note how neither abb nor bba would match the reduction, (ab)*|(ba)*|(b)*, which actually means,
a string entirely composed of ab, or
a string entirely composed of ba, or
a string entirely composed of b.
Actually, the way you Englishified it, I think was already the best! Though, I'd style it like this:
This regular expression represents a string composed entirely of 'a's and 'b's, with no consecutive 'a's, and whose last character is 'b' if the first character is 'a'.
Nearly identical to what you already wrote.
As #ChristianTernus (and #slebetman) point out, the above fails to take into account that the original expression accepts a null string (or even a string without 'a's, which isn't clear from my Englishification), so in fact I believe OP's Englishification was indeed the strongest.
(ab + b)* + (ba + b)*
Translated into common (PCRE) regex, that's
(ab|b)*|(ba|b)*
In other words: a string composed of either zero or more instances of either 'ab' or 'b', or zero or more instances of either 'ba' or 'b'.
#acheong87's answer is also correct. I like this because it matches more closely the original structure of the regular expression -- it wouldn't be hard to turn this back into the regex from whence it came.
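One way to gain confidence in an English description is to compare it against the expression on every short string. The sketch below does that in Python for the asker's wording; the function described and the length bound of 8 are assumptions made for illustration, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

original = re.compile(r"(ab|b)*|(ba|b)*")
reduction = re.compile(r"(ab)*|(ba)*|(b)*")

def described(w):
    """The asker's description: no 'aa', and if the first letter is 'a'
    then the last letter is 'b'."""
    if "aa" in w:
        return False
    return not w.startswith("a") or w.endswith("b")

# All strings over {a, b} of length 0 through 8.
words = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]
description_holds = all(
    (original.fullmatch(w) is not None) == described(w) for w in words
)

print(description_holds)
print(reduction.fullmatch("abb"), reduction.fullmatch("bba"))
```

The description agrees with the original expression on all of these strings, while the reduction rejects both abb and bba.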

regular expression to an English description

I'm really struggling with regular expressions. I have to give English descriptions of the following regular expressions can anyone please please please help me..
i. a(aa)*
ii. a(b*ab*ab*)*
iii. b(b*ab*ab*)*
Here are my attempts, but everyone else in the class seems to have shorter answers.
i. Find an "a" followed by zero or more occurrences of "aa"
ii. Find an "a" followed by zero or more occurrences of this pattern:
(zero or more "b"s followed by zero or more "ab"s followed by zero or more "ab"s)
iii. Find a "b" followed by zero or more occurrences of this pattern:
(zero or more "b"s followed by zero or more "ab"s followed by zero or more "ab"s)
If those strings are actual regexes, they (completely) match the following:
1. An odd number of as.
2. A string starting with a, followed by any combination of as and bs, with an overall odd number of as. Edge case: If the string contains any b, it needs to contain at least three as.
3. A string starting with b, followed by any combination of as and bs, with an overall even number of as. Edge case: If the string contains more than one b, it needs to contain at least two as.
"Any combination" includes zero instances of each character.
Some possible matches for 1.:
a
aaa
aaaaa
aaaaaaa etc.
Some possible matches for 2.:
a
aaa
ababa
aaab
abbbbbbbbaa
ababababababa
Some possible matches for 3.:
b
baa
baba
baaaaaba
bbbbbbbbbbaa
bababababbbbb
There's a free tool Ultrapico Express which can help. Just run a match on any of the regexes you mentioned, then it should be relatively easy to translate into regular English;
i - an odd number of a's, with at least one a.
ii - an odd number of a's, with at least one a, and 0 or more b's between each pair of a's.
Your attempted solutions seem correct, but I would expect your professor will complain that your description is rephrasing the RE and is not an English description of the result.
I'll leave iii back to you to re-word (mainly because it's more difficult than the other two and I'm lazy this morning!)
Let me hint you a bit:
How would you describe the regular expression 'a'? How about 'aa'? OK, now, how would you describe the expressions 'a*' and '(aa)*'? For the latter there is an interesting pattern. Now, try to combine them. What is a(aa)*? If you write down a couple of specimens of the regular language, there is a pattern you can spot.
Odd and even play a role here.
The trick is to cut up the regular expression and understand each part. Then write down a couple of strings which are in the language the RE decides. Then look for a pattern. My guess is that this is what your TA/Prof wants you to do in order to understand the relationship between an RE and the language it decides.
An odd number of as.
A string starting with a, followed by any combination of single as and multiple bs (zero or more), with an overall odd number of as.
A string starting with b, followed by any combination of single as and multiple bs (zero or more), with an overall even number of as.
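Descriptions like these can be checked against the three expressions by brute force. In the Python sketch below, the functions desc_i, desc_ii and desc_iii are hypothetical encodings of the English descriptions, written with the edge cases spelled out (a lone first character, or else enough a's to fill at least one repetition of the group). The comparison covers strings up to length 8 only.

```python
import re
from itertools import product

regexes = {
    "i": re.compile(r"a(aa)*"),
    "ii": re.compile(r"a(b*ab*ab*)*"),
    "iii": re.compile(r"b(b*ab*ab*)*"),
}

def desc_i(w):
    # Only a's, and an odd number of them.
    return set(w) <= {"a"} and len(w) % 2 == 1

def desc_ii(w):
    # Starts with a, odd number of a's; either just "a" or at least three a's.
    n = w.count("a")
    return w.startswith("a") and n % 2 == 1 and (w == "a" or n >= 3)

def desc_iii(w):
    # Starts with b, even number of a's; either just "b" or at least two a's.
    n = w.count("a")
    return w.startswith("b") and n % 2 == 0 and (w == "b" or n >= 2)

descriptions = {"i": desc_i, "ii": desc_ii, "iii": desc_iii}
words = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]

agreement = {
    key: all((regexes[key].fullmatch(w) is not None) == descriptions[key](w)
             for w in words)
    for key in regexes
}
print(agreement)
```

Strings such as ab and bb are rejected by ii and iii respectively, which is why the edge cases are part of the encoded descriptions.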

Grammars - RegEx

I am trying to construct a regular expression for strings in which the total number of a's is divisible by 3, no matter how they are distributed, e.g. aabaabbaba. This is what I came up with:
b*ab*ab*
Now, someone told me I could do it this way
(b*ab*ab*)*
Why would I need to enclose it, and why is the outer Kleene star needed?
Wouldn't the outer Kleene star distribute among all the a's and b's inside the parentheses? If that's the case, then what would a double Kleene star mean?
For the number of 'a's to be divisible by three, you'll need three 'a's in your expression. So the correct expression is:
(b*ab*ab*ab*)*
This expression is saying 'a' three times, with possible 'b's in the middle. The last star says repeat (the whole parenthesized expression) as necessary.
The outer * repeats the entire sequence zero or more times.
In other words, zero or more substrings that match b*ab*ab*.
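The effect of the number of a's inside the group can be checked in Python. In this sketch the third pattern, with_leading_bs, is a variant introduced here for illustration (an assumption, not taken from the answers): it moves the leading b* outside the group so that strings consisting only of b's, which contain zero a's, are accepted too.

```python
import re
from itertools import product

two_as = re.compile(r"(b*ab*ab*)*")               # two a's per repetition
three_as = re.compile(r"(b*ab*ab*ab*)*")          # three a's per repetition
with_leading_bs = re.compile(r"b*(ab*ab*ab*)*")   # illustrative variant

# All strings over {a, b} of length 0 through 8.
words = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]
variant_is_exact = all(
    (with_leading_bs.fullmatch(w) is not None) == (w.count("a") % 3 == 0)
    for w in words
)

print(three_as.fullmatch("aabaabbaba") is not None)
print(two_as.fullmatch("aab") is not None)
print(three_as.fullmatch("bb") is not None, variant_is_exact)
```

The two-a group accepts aab, which has only two a's. The three-a group accepts the example string but rejects bb, since every repetition of the group must contain three a's.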

what is regular expression not generated over {a,b}?

I am really stuck with these 2 questions for over 2 days now. I'm trying to figure out what the question means. My tutor is out of town too.
Question 1: Write a regular expression for the only strings that are not generated over {a,b} by the expression: (a+b)*a(a+b)*. Explain your reasoning.
And I tried the second question. Do you think there is any better answer than this one?
What is a regular expression for the set of strings that contain an odd number of a's or exactly two b's?
(a((a|b)(a|b))*|bb)
I know that to represent any odd length of a's, the RE is a((a|b)(a|b))*
Here's a start for the first question. First consider the strings that this regular expression generates, reading + as the "one or more" quantifier:
(a+b)*a(a+b)*
It must begin with a AND
Every b must have at least one a immediately before it AND
There must either be an aab or else the string must end in a.
The inverse of this is:
It must not begin with a OR
There is at least one b not after an a OR
The string consists only of repetitions of ab.
For the second question you should check that you have understood the question correctly. Your interpretation seems to be:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and no a's).
But another interpretation is this:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and any number of a's).
To match two a's you would use something like aa right?
Now we know that the + is a quantifier for 1 or more and the * is a quantifier for 0 or more. So if we want to repeat that entire pattern, we can put it in a group and repeat the entire pattern like so: (aa)+.
That would match:
aa
aaaa
But not:
a (because aa requires at least 2 items)
aaa (because aa will match the first two, but you'll have an extra a)
And if we want to make that odd instead of even, we can simply add one extra a outside of the group like so: a(aa)+. However, since we wanted an odd amount without a specific minimum, we shouldn't use +, since that would require at least 3 a's.
So the entire answer would be: (bb|a(aa)*)
It sounds like the first question is asking you to write a regular expression for the set of strings that do not match the provided regular expression.
For instance, suppose the question was asking for a regular expression for the set of strings not matched by aa+ over {a}. Well, here are a few strings that do match:
'aa'
'aaaa'
'aaaaa'
What are some strings that do not match? Here are the only two:
''
'a'
A regex for the latter set is a?.
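The complement claim in this small example can be verified mechanically. A minimal Python sketch (the variable names and the length bound of 9 are assumptions): a string over {a} should match exactly one of the two expressions.

```python
import re

given = re.compile(r"aa+")      # two or more a's
complement = re.compile(r"a?")  # zero or one a

# Every string over the alphabet {a} up to length 9.
words = ["a" * n for n in range(10)]

is_complement = all(
    (given.fullmatch(w) is None) == (complement.fullmatch(w) is not None)
    for w in words
)
print(is_complement)
```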
Regarding the second question, I would suggest drumming up some positive and negative test cases. Run some strings like this through your regex and see what happens:
'a' (should pass)
'aaa' (should pass)
'bb' (should pass)
'' (should fail)
'aa' (should fail)
'aba' (should fail)
Good luck!
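The test cases above can be run as a small table. In this Python sketch the helper failures is hypothetical; op_attempt is the expression from the question and odd_as_or_bb is the one suggested in another answer, both checked under the reading where a string with exactly two b's contains no a's.

```python
import re

# Test strings from the list above, with the expected outcome.
cases = {"a": True, "aaa": True, "bb": True, "": False, "aa": False, "aba": False}

def failures(pattern):
    """Return the test strings whose fullmatch result differs from the expectation."""
    return [s for s, expected in cases.items()
            if (re.fullmatch(pattern, s) is not None) != expected]

op_attempt = r"a((a|b)(a|b))*|bb"
odd_as_or_bb = r"bb|a(aa)*"

print(failures(op_attempt))
print(failures(odd_as_or_bb))
```

The asker's attempt accepts aba, because ((a|b)(a|b))* only guarantees odd length and not an odd number of a's. The second expression passes all six cases.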
The expression (a+b)*a(a+b)* just means: there has to be an a inside the string. The only strings that can't be generated by this expression are those matching b*
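Under this reading, where + means "or", the claim can be checked by brute force. In the Python sketch below the union is written as (a|b), which is how a regex engine spells it; the variable names and the length bound of 8 are assumptions.

```python
import re
from itertools import product

contains_a = re.compile(r"(a|b)*a(a|b)*")  # (a+b)*a(a+b)* with + read as union
only_bs = re.compile(r"b*")

# All strings over {a, b} of length 0 through 8.
words = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]

is_complement = all(
    (contains_a.fullmatch(w) is None) == (only_bs.fullmatch(w) is not None)
    for w in words
)
print(is_complement)
```

Every string over {a, b} up to that length matches exactly one of the two expressions, and the empty string falls on the b* side.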
This expression means that the string must contain at least one 'a'.
This expression does not accept
'b'
b* (any run of b's)
or
the empty string