how to describe this regular expression in English - regex

I am trying to describe regular expressions in English.
Say we have (b(bb)*)*
you would say: zero or more b's
or we can have (a(aa)*b(bb)*)*
you would say: an odd number of a's that end in an odd number of b's
Now my question is about ((a+b)a)*
you would say: words of even length where every even letter is an 'a'
Where did the even length come from? How did they get that every even letter is an 'a'? Is it from the zero a's, because zero is an even number?

((a+b)a)*
"you would say: words of even length where every even letter is an 'a'"
This is not a correct description. More accurate would be "zero or more repetitions of: at least one a, followed by exactly one b, followed by exactly one a".
(+ means "one or more", * means "zero or more".)
It's more about the back and forth of a's and b's: there could be a million a's between the b's, but there are never two b's next to each other.
And note that the inner parentheses are not needed. In other words, this is equivalent:
(a+ba)*
Free spaced:
(a+ //"a", one or more times
b //followed by exactly one "b"
a //followed by exactly one "a"
)* //zero or more times

The only way your description is accurate is if you interpret the expression a+b as a or b. Most regular expression tools write this using a vertical bar, as in a|b. The other commenters and answerers have interpreted the + as a postfix operator meaning "one or more".
Using that reading, the reason every string in this set must be of even length is because the repetition comes outside a string of length 2. It means "zero or more copies of either aa or ba". Clearly every word matching that description is of even length. 0 is even by definition, and every second letter has to be an a.
{ ε, aa, ba, aaaa, aaba, baaa, baba, ... }
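To see the "even length, every second letter is an a" description fall out of the expression, here is a minimal sketch that lists the short words it accepts. It assumes the textbook reading where + means "or", which Python's re module writes as |; the helper name language is made up for this illustration.

```python
import re
from itertools import product

# "+" in the textbook notation is alternation, written "|" in Python's re module.
PATTERN = re.compile(r"((a|b)a)*")

def language(max_len):
    """Return every word over {a, b} up to max_len that the pattern fully matches."""
    words = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if PATTERN.fullmatch(word):
                words.append(word)
    return words

words = language(4)
print(words)

# Each repetition contributes exactly two letters, the second of which is "a".
assert all(len(w) % 2 == 0 for w in words)
assert all(set(w[1::2]) <= {"a"} for w in words)
```

The empty word comes from taking zero repetitions, which is why length 0 shows up in the set.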

OP, your question is based on curriculum from lecture and text where + is intended as an OR operation. The text and lecture material makes very clear what the notation means...
(b(bb)*)* = zero or more groups, each an odd number of b's (one b followed by 0-n bb's). In other words, zero or more b's.
(a(aa)*b(bb)*)* follows the pattern 2m + 1 a's followed by 2n + 1 b's, repeated
zero or more times, so the empty string is included.
((a+b)a)* is more ambiguous i.e. more things could be said about it than the
answer, but the answer cannot be said to be wrong either. It is all
words of even length composed of all a's, or a's and b's. My guess
is that this answer would have gotten partial credit, and full
credit for including the part about a's being the even letter.
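The first two descriptions can be checked by brute force. The sketch below compares each expression with a plain-Python statement of its English description on every word over {a, b} up to length 8; the helper names are made up for this illustration.

```python
import re
from itertools import groupby, product

def all_words(max_len, alphabet="ab"):
    """Yield every word over the alphabet up to max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def odd_a_then_odd_b_blocks(word):
    """True if word is zero or more blocks of (odd run of a's, odd run of b's)."""
    runs = [(ch, len(list(group))) for ch, group in groupby(word)]
    if len(runs) % 2:  # runs must pair up as a-run, b-run
        return False
    return all(ch == "ab"[i % 2] and n % 2 == 1 for i, (ch, n) in enumerate(runs))

first = re.compile(r"(b(bb)*)*")
second = re.compile(r"(a(aa)*b(bb)*)*")

for w in all_words(8):
    # "zero or more b's"
    assert bool(first.fullmatch(w)) == (set(w) <= {"b"})
    # "odd a's followed by odd b's, repeated zero or more times"
    assert bool(second.fullmatch(w)) == odd_a_then_odd_b_blocks(w)

print("descriptions agree with the expressions up to length 8")
```

A check up to a fixed length is evidence, not a proof, but it catches most wrong descriptions quickly.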

Related

Regular expression for strings where one letter is odd, while only having exactly 1 other letter

I've seen a lot of posts about odd and even letters but nothing about odd or even letters while having exactly one other letter. How would I solve for this? All strings that contain an odd number of a's and exactly one b. The alphabet is {a,b}.
^((b(a(aa)*))|((a(aa)*)b)|(aa)*b(a(aa)*)|((a(aa)*)b(aa)*))$
Above is my regular expression. There might be a prettier way to do it, but this is the simplest to understand.
Break it down piece by piece and find out in which cases this expression should be true. There are 4 cases where the expression can be true.
(b(a(aa)*)) b then odd amount of a's
((a(aa)*)b) odd amount of a's then there is b
(aa)*b(a(aa)*) even amount of a's, b, odd amount of a's
((a(aa)*)b(aa)*) odd amount of a's, b, even amount of a's
It helps to realize that even + odd = another odd number.
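One way to gain confidence in the four cases is to compare the expression with the plain definition on every short word. The sketch below does that; the shorter pattern in it is not from the answer, it is a hypothetical alternative added only to show the same check applied to a second candidate.

```python
import re
from itertools import product

# The four-case expression from the answer, unchanged.
FOUR_CASES = re.compile(r"^((b(a(aa)*))|((a(aa)*)b)|(aa)*b(a(aa)*)|((a(aa)*)b(aa)*))$")
# Hypothetical shorter form (an assumption, not from the answer):
# even a's, then "ab" or "ba", then even a's.
SHORTER = re.compile(r"^(aa)*(ab|ba)(aa)*$")

def odd_as_one_b(word):
    """The target language stated directly: odd number of a's and exactly one b."""
    return word.count("a") % 2 == 1 and word.count("b") == 1

def disagreements(pattern, max_len=8):
    """Words over {a, b} up to max_len where the pattern and the definition differ."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if bool(pattern.match(word)) != odd_as_one_b(word):
                found.append(word)
    return found

print(disagreements(FOUR_CASES))  # an empty list means no disagreement was found
print(disagreements(SHORTER))
```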

How do I convert language set notation to regular expressions?

I have the following question on regular expressions and I just can't get my head around these kinds of problems.
L1 = { 0^n 1^m | n ≥ 3 ∧ m is odd }
How would I write a regular expression for this sort of problem when the alphabet is {0,1}?
What's the answer?
The regular expression for your example is:
000+1(11)* [1]
So what does this do?
The first two characters, 00, are literal zeros. This is going to be important for the next point
The second two characters, 0+, mean "at least one zero, no upper bound". These first four characters satisfy the first condition, which is that we have at least three zeros.
The next character, 1, is a literal one. Since we need to have an odd number of ones, this is the smallest number we're allowed to have
The last-but-one characters, (11), represent a logical grouping of two literal ones, and the ending * says to match this grouping zero or more times. Since we always have at least one 1, we'll always match an odd number. So we're done.
How'd I get that?
The key is knowing regular expression syntax. I happen to have quite a bit of experience in it, but this website helped me to verify.
Once you know the basic building blocks of regex, you need to break down your problem into what you can represent.
For example, regex lets us specify both a lower and an upper bound for matching (the {x,y} syntax), and {x} matches exactly x times. Most flavors also accept {x,} for a lower bound alone, so 0{3,} would work as well, but I stuck with + and *, the basic specifiers that permit an unbounded number of matches. I also knew that it didn't make sense to apply those modifiers to a group; the restriction that we must have at least 3 zeros doesn't imply that we must have a multiple of three, for example, so (000)+ was out. I had to apply the modifier to only one character, which meant I had to match a few literals first. 000 guarantees matching three 0s, and a following 0* allows any number more (giving 0000*), which does exactly what I want. I then condensed that to the equivalent 000+.
For the second condition, I had to think about what an odd number is. By definition, an odd number can be expressed by 2*k + 1, where k is an integer. So I had to match one 1 (Hence the literal 1), and some number of the substring 11. That led me to the group, and then the *. On a slightly different problem, you could write 1(11)+ to match any odd number of ones, and at least 3.
[1] A colleague of mine pointed out to me that the + operator isn't technically part of the formal definition of regular expressions. If this is an academic question rather than a programming one, you might find the 0000* version more helpful. In that case, the final string would be 0000*1(11)*
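As a sanity check, the sketch below compares 000+1(11)* and the star-only form 0000*1(11)* with the set definition on every binary word up to length 9. The function in_l1 is a made-up helper that tests membership without using a regex.

```python
import re
from itertools import product

def in_l1(word):
    """Membership in L1 = { 0^n 1^m | n >= 3 and m is odd }, checked without regex."""
    n = len(word) - len(word.lstrip("0"))  # number of leading zeros
    ones = word[n:]                        # everything after them must be 1s
    return n >= 3 and set(ones) <= {"1"} and len(ones) % 2 == 1

WITH_PLUS = re.compile(r"000+1(11)*")
STAR_ONLY = re.compile(r"0000*1(11)*")

def agrees(pattern, max_len=9):
    """True if the pattern accepts exactly the words of L1 up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if bool(pattern.fullmatch(word)) != in_l1(word):
                return False
    return True

print(agrees(WITH_PLUS), agrees(STAR_ONLY))
```

fullmatch is used so that the whole word has to fit the pattern, which is what a language membership question asks.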

regular expression to an English description

I'm really struggling with regular expressions. I have to give English descriptions of the following regular expressions. Can anyone please help me?
i. a(aa)*
ii. a(b*ab*ab*)*
iii. b(b*ab*ab*)*
Here are my attempts, but everyone else in the class seems to have shorter answers.
i. Find a "a" followed by either zero or more times "aa"s should be seen
ii. Find a "a" followed by either zero or more times of this pattern :
(zero or more times "b" followed by zero or more times "ab" followed by zero or more times "ab")
iii. Find a "b" followed by either zero or more times of this pattern :
(zero or more times "b" followed by zero or more times "ab" followed by zero or more times "ab")
If those strings are actual regexes, they (completely) match the following:
1. An odd number of as.
2. A string starting with a, followed by any combination of as and bs, with an overall odd number of as. Edge case: If the string contains any b, it needs to contain at least three as.
3. A string starting with b, followed by any combination of as and bs, with an overall even number of as. Edge case: If the string contains more than one b, it needs to contain at least two as.
"Any combination" includes zero instances of each character.
Some possible matches for 1.:
a
aaa
aaaaa
aaaaaaa etc.
Some possible matches for 2.:
a
aaa
ababa
aaab
abbbbbbbbaa
ababababababa
Some possible matches for 3.:
b
baa
baba
baaaaaba
bbbbbbbbbbaa
bababababbbbb
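The three descriptions, including the edge cases for very short words, can be written as plain predicates and compared with the expressions. This is only a sketch: the function names are made up, and the check covers words over {a, b} up to length 8.

```python
import re
from itertools import product

def words(max_len):
    """Yield every word over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def describe_i(w):
    # Only a's, and an odd number of them.
    return set(w) == {"a"} and len(w) % 2 == 1

def describe_ii(w):
    # Starts with a; either just "a", or an odd number (at least 3) of a's overall.
    n = w.count("a")
    return w.startswith("a") and (w == "a" or (n % 2 == 1 and n >= 3))

def describe_iii(w):
    # Starts with b; either just "b", or an even number (at least 2) of a's overall.
    n = w.count("a")
    return w.startswith("b") and (w == "b" or (n % 2 == 0 and n >= 2))

CASES = [
    (re.compile(r"a(aa)*"), describe_i),
    (re.compile(r"a(b*ab*ab*)*"), describe_ii),
    (re.compile(r"b(b*ab*ab*)*"), describe_iii),
]

for pattern, describe in CASES:
    for w in words(8):
        assert bool(pattern.fullmatch(w)) == describe(w), (pattern.pattern, w)

print("all three descriptions agree with their expressions up to length 8")
```

The edge cases exist because each repetition of the group b*ab*ab* must contain two a's, so a b can only appear after the first letter if two more a's come with it.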
There's a free tool, Ultrapico Expresso, which can help. Just run a match on any of the regexes you mentioned, then it should be relatively easy to translate into regular English:
i - an odd number of a's, with at least one a.
ii - an odd number of a's, with at least one a, and 0 or more b's between each pair of a's.
Your attempted solutions seem correct, but I would expect your professor will complain that your description is rephrasing the RE and is not an English description of the result.
I'll leave iii back to you to re-word (mainly because it's more difficult than the other two and I'm lazy this morning!)
Let me hint you a bit:
How would you describe the regular expression 'a'? How about 'aa'?. Ok, now, how would you describe the expression 'a*' and '(aa)*' ? For the latter there is a pattern which is interesting. Now, try to combine them. What is a(aa)* ? If you write down a couple of specimens for the regular language, there is a pattern you can spot.
Odd and even plays a role here.
The trick is to cut up the regular expression and understand each part. Then write down a couple of strings which are in the language the RE decides. Then look for a pattern. My guess is that this is what your TA/Prof wants you to do in order to understand the relationship between an RE and the language it decides.
An odd number of as.
A string starting with a, followed by any combination of single as and multiple bs (zero or more), with an overall odd number of as.
A string starting with b, followed by any combination of single as and multiple bs (zero or more), with an overall even number of as.

Grammars - RegEx

I am trying to construct a regular expression such that the total number of a's is divisible by 3, no matter how they are distributed, for example aabaabbaba. This is what I came up with:
b*ab*ab*
Now, someone told me i could do it this way
(b*ab*ab*)*
Why would I need to enclose it, and why is the outside Kleene star needed?
Wouldn't the outside Kleene star distribute among all the a's and b's inside the parentheses? If that's the case, then what would a double Kleene star mean?
For the number of 'a's to be divisible by three, you'll need three 'a's in your expression. So the correct expression is:
(b*ab*ab*ab*)*
This expression is saying 'a' three times, with possible 'b's in the middle. The last star says repeat (the whole parenthesized expression) as necessary. One gap: a nonempty string of only 'b's has zero 'a's but is not matched; b*(ab*ab*ab*)* covers that case too.
The outer * repeats the entire sequence zero or more times.
In other words, zero or more substrings that match b*ab*ab*.
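A short sketch of what the outer star adds, and of one corner case. ANSWER is the expression from the answer; PADDED is an assumption of this sketch, a variant that moves the leading b* outside the group so that words made only of b's are accepted as well.

```python
import re
from itertools import product

ANSWER = re.compile(r"(b*ab*ab*ab*)*")  # expression given in the answer
PADDED = re.compile(r"b*(ab*ab*ab*)*")  # assumption: variant that also accepts b-only words

def a_count_divisible_by_3(word):
    return word.count("a") % 3 == 0

def disagreements(pattern, max_len=8):
    """Words over {a, b} up to max_len where the pattern and the definition differ."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if bool(pattern.fullmatch(word)) != a_count_divisible_by_3(word):
                found.append(word)
    return found

sample = "aabaabbaba"  # six a's

# Without the outer star the group matches exactly three a's, so six a's fail.
print(re.fullmatch(r"b*ab*ab*ab*", sample))
# With the outer star the group may repeat: "aaba" followed by "abbaba".
print(ANSWER.fullmatch(sample))

print(disagreements(ANSWER))  # only nonempty words made of b's alone
print(disagreements(PADDED))
```

The star does not distribute over the a's and b's inside the parentheses; it repeats the whole group, and each repetition may match a different substring.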

Does this regular expression generate a regular language?

I was told that the language generated by the regular expression:
(a*b*)*
is regular.
However, my thinking goes against this, as follows. Can anyone please provide an explanation whether I'm thinking right or wrong?
My Thoughts
(a*b*) refers to a single sequence of any number of a's, followed by any number of b's (either can be empty). And this single sequence (which can't be changed) can be repeated 0 or more times. For example:
a* = a
b* = bbbb
-> (a*b*) = abbbb
-> (a*b*)* = abbbbabbbbabbbb, ...
On the other hand, since aba is not an exact repetition of the sequence ab, it is not included in the language.
aaabaaabaaab => is included in the language
aba => is not included in the language
Thus, the language consists of sequences that are an arbitrary-time repetition of a subsequence that is any amount of a followed by any amount of b. Therefore, the language is not regular since it requires a stack.
It's a, zero or more times, followed by b, zero or more times, all repeated zero or more times.
""
"a"
"b"
"ab"
"ba"
"aab"
"bbabb"
"aba"
all pass.
* is not +.
aba is in that language; it's just an overly-complicated way to say "the set of all strings consisting of as and bs".
EDIT: The repeating group doesn't mean that the contents of the group must be repeated exactly; that would require a backreference. ((a*b*)?\1*)
Rather, it means that the group itself should be repeated, matching any string that it can match.
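The difference between repeating the group and repeating what the group matched can be seen directly in Python. In this sketch, (a*b*)\1* is a simplified backreference pattern chosen for the illustration, not the exact pattern quoted above.

```python
import re
from itertools import product

STAR = re.compile(r"(a*b*)*")       # the group is re-matched on every repetition
BACKREF = re.compile(r"(a*b*)\1*")  # \1 repeats the exact text the group captured

# Each repetition of the starred group may match something different,
# so "aba" is accepted as "ab" followed by "a".
print(bool(STAR.fullmatch("aba")))
# The backreference needs the same chunk over and over, so "aba" is rejected.
print(bool(BACKREF.fullmatch("aba")))
# An exact repetition of "abbbb" is accepted by both.
print(bool(BACKREF.fullmatch("abbbbabbbbabbbb")))

# (a*b*)* accepts every word over {a, b}, which is a regular language.
for n in range(7):
    for letters in product("ab", repeat=n):
        assert STAR.fullmatch("".join(letters))
```

The exact-repetition language described in the question is the one that needs more than a finite automaton; the starred group does not.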
Technically /(a*b*)*/ will match everything and nothing.
Because all the operators are *'s it means zero or more. So since zero is an option, it will pretty much match anything.
It's wrong, you don't need a stack. Your DFA just thinks "can I add just another a (or not)?" or "can I add just another b (or not)?" in an endless loop until the word is consumed.
It is a regular expression, yes.
The * says something like "can repeat 0 or more times". The + is similar, differing only in that it needs at least one repetition (1 or more times).
This regular expression says something like:
Repeat the group below zero or more times;
Repeat a zero or more times;
Repeat b zero or more times;
It works fine with all of your examples.
Edit/Note: aba is matched too.
I hope this helps :p
Basically, it'll match any string that's empty or made up of a's and b's. It reads:
(('a' zero or more times)('b' zero or more times)) zero or more times
That's why it matches aba:
(('a' one time)('b' one time)) one time, then (('a' one time)('b' zero times)) one time
You're wrong. :)
0 is also an amount, so aba is in this language. It wouldn't be if the regex was (a+b+)+, because + would mean '1 or more' where * means '0 or more'.