Regular Expression Help - Double checking answers - regex

I'm still a noob at regular expressions. Would anyone be so kind as to double-check whether my answers are correct?
The question is: Indicate whether each of the given input strings belongs to the language defined by the regex (a | empty) b (a | b)* a (b)*
Here "empty" stands for ε, the symbol that looks like a flipped 3 (the empty string).
(a) input string: ababaa
Answer: Does not belong to the regex
because if tested, turns out to be ababab
(b) input string: aabbaa
Answer: Does not belong to the regex
Because if tested, turns out to be ab(b or a)* ab
Are these answers correct?

The second string does not belong to the language. If you look at the regex, you can see that b must either be the first character (if (a|empty) selects empty), or must be the second character (if (a|empty) selects a). Since the string starts with aa, it can't match.
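The first string does match. Just try to figure out each choice point so that you get the string provided. It might help to work from the outside in, since (a|b)* is the most flexible part of the regex, i.e. it can match whatever you want it to. For ababaa: (a|empty) takes the first a, b takes the b, (a|b)* takes aba, the required a takes the final a, and (b)* matches nothing.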

You can double check your answers by simply running them. Here's a site that does exactly that: http://regex.larsolavtorvik.com/
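For instance, here is a minimal Python sketch of that kind of check. It assumes the regex from the question can be written as a?b[ab]*ab* in Python syntax (with "empty" expressed through the optional a?), and the helper name belongs is made up for illustration. fullmatch is used because membership in the language means the whole string has to match, not just a part of it.

```python
import re

# (a | empty) b (a | b)* a (b)* rewritten in Python regex syntax (assumed translation)
PATTERN = re.compile(r"a?b[ab]*ab*")

def belongs(s: str) -> bool:
    """Return True if the whole string s is in the language of PATTERN."""
    return PATTERN.fullmatch(s) is not None

for s in ("ababaa", "aabbaa"):
    print(s, belongs(s))
```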

Related

find a regular expression where a is never immediately followed by b (Theory of formal languages)

I need to find a simplified regular expression for the language of all strings
of a's, b's, and c's where a is never immediately followed by b.
I tried something and got as far as (a+c)*c(b+c)* + (b+c)*(a+c)*
Is this fine and if so can this be simplified?
Thanks in advance.
You are looking for a negative lookbehind:
(?<!a)b
This will find you all the b instances that are not immediately following a
Or a negative lookahead:
a(?!b)
This will find you all the a instances that are not immediately followed by b
Here is a regex101 example for the lookbehind:
https://regex101.com/r/RsqXbW/1
Here is a regex101 example for the lookahead:
https://regex101.com/r/qiDIZU/1
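The two patterns can also be tried out locally. Below is a small Python sketch; the function names are made up for illustration. Keep in mind that lookarounds are an extension offered by programming-language regex engines, not part of the regular expressions used in formal language theory. The last helper assumes that repeating "an a not followed by b, or a b, or a c" is an acceptable way to test a whole string.

```python
import re

def b_not_after_a(s: str) -> list[int]:
    """Positions of every b that does not immediately follow an a."""
    return [m.start() for m in re.finditer(r"(?<!a)b", s)]

def a_not_before_b(s: str) -> list[int]:
    """Positions of every a that is not immediately followed by a b."""
    return [m.start() for m in re.finditer(r"a(?!b)", s)]

def in_language(s: str) -> bool:
    """True if s consists of a, b, c and no a is directly followed by b."""
    return re.fullmatch(r"(?:a(?!b)|[bc])*", s) is not None

print(b_not_after_a("abcbb"))
print(a_not_before_b("abaca"))
print(in_language("acbac"), in_language("cab"))
```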
Your solution contains only strings from the desired language. However, it does not contain all of them. For example, acbac is not contained. Your basic idea is fine, but you need to be able to iterate the possible factors. In:
(b+c)*(a (a)*(c(b+c)*)*)*
the first part generates all strings without a.
After the first a there comes either nothing, another a, or c. Another a leaves us with the same three options. c basically starts the game again. This is what the part after the first a formalizes. The many * are needed to possibly generate the empty string in all of the different options.
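Both claims can be sanity-checked by brute force. The sketch below is only a check up to a small length, not a proof. It assumes the two expressions translate to Python syntax as shown in the comments (+ read as "or"), and the helper names are made up for illustration.

```python
import re
from itertools import product

# (a+c)*c(b+c)* + (b+c)*(a+c)*, the attempt from the question
ATTEMPT = re.compile(r"[ac]*c[bc]*|[bc]*[ac]*")
# (b+c)*(a (a)*(c(b+c)*)*)*, the expression from the answer
ANSWER = re.compile(r"[bc]*(?:aa*(?:c[bc]*)*)*")

def words(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def disagreements(pattern, max_len=6):
    """Strings where the pattern and the rule 'no a directly before b' differ."""
    return [w for w in words("abc", max_len)
            if (pattern.fullmatch(w) is not None) != ("ab" not in w)]

print(disagreements(ANSWER))               # empty if the answer is right up to length 6
print("acbac" in disagreements(ATTEMPT))   # the counterexample mentioned above
```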

Find strings that do not begin with something

This has been covered before, but I haven't found anything that works consistently, or that helps me learn where I've gone awry.
I have file names that start with 3 or more digits and ^\d{3}(.*) works just fine.
I also have strings that start with the word 'account' and ^ACCOUNT(.*) works just fine for these.
The problem I have is all the other strings that DO NOT meet the two previous criteria. I have been using ^[^\d{3}][^ACCOUNT](.*) but occasionally it fails to catch one.
Any insights would be greatly appreciated.
^[^\d{3}][^ACCOUNT](.*)
That's definitely not what you want. Square brackets create character classes: they match one character from the list of characters in brackets. If you put a ^ then the match is inverted and it matches one character that's not listed. The meaning of ^ inside brackets is completely different from its meaning outside.
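To see what the bracket version actually does, here is a small Python sketch. The file names are invented for illustration.

```python
import re

# The pattern from the question. [^\d{3}] is ONE character that is not a digit,
# '{', '3' or '}'. [^ACCOUNT] is ONE character that is not A, C, O, U, N or T.
BROKEN = re.compile(r"^[^\d{3}][^ACCOUNT](.*)")

def broken_accepts(name: str) -> bool:
    """True if the bracket-based pattern matches name."""
    return BROKEN.match(name) is not None

# "BANANA" and "12abc" start with neither 3 digits nor ACCOUNT,
# so they should be caught, but the pattern rejects both.
for name in ("report.txt", "BANANA", "12abc"):
    print(name, broken_accepts(name))
```

"BANANA" fails because its second character is an A, and "12abc" fails because its first character is a digit, which would explain the occasional misses.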
In short, [] is not at all what you want. What you can do, if your regex implementation supports it, is use a negative lookahead assertion.
^(?!\d{3}|ACCOUNT)(.*)
This negative lookahead assertion doesn't consume any characters itself. It merely checks that the text at the current position (here, the start of the string) does not begin with \d{3} or ACCOUNT; (.*) then matches the rest.
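A short Python sketch of the lookahead version follows. Python's re module supports negative lookahead; the helper name is_other and the sample names are made up for illustration.

```python
import re

# Succeeds only if the string does NOT start with three digits or with ACCOUNT.
OTHER = re.compile(r"^(?!\d{3}|ACCOUNT)(.*)")

def is_other(name: str) -> bool:
    """True if name starts with neither three digits nor ACCOUNT."""
    return OTHER.match(name) is not None

for name in ("123_report", "ACCOUNT_7", "BANANA", "12abc"):
    print(name, is_other(name))
```

The match is case-sensitive as written, so a name starting with lowercase "account" would count as "other" unless a flag such as re.IGNORECASE is added.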
De Morgan's law says: !(A v B) = !A ^ !B.
Unfortunately, regex itself does not support negating arbitrary expressions. (You can always rewrite the expression, but sometimes that is a huge task.)
Instead, look at your programming language, where you can negate values without problems:
Let the matching function be "match", and use match("^(?:\d{3}|ACCOUNT)(.*)") to determine whether the string meets either of the two conditions. Then simply negate the boolean return value of that matching function, and you'll get every string that does NOT match.
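In Python, for example, that could look like the sketch below. The helper name others and the sample names are made up for illustration; the trailing capture group is left out because only the yes/no result is needed here.

```python
import re

# Matches the two "known" kinds of names: three leading digits, or ACCOUNT.
KNOWN = re.compile(r"^(?:\d{3}|ACCOUNT)")

def others(names):
    """Keep the names that match NEITHER condition by negating the match result."""
    return [n for n in names if not KNOWN.match(n)]

print(others(["123_report", "ACCOUNT_7", "BANANA", "12abc"]))
```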

I need help finding all strings not included in (a*b)*

I'm doing this for homework. I need to write a regular expression for a language over {a, b} that includes all strings not included in the language (a*b)*
For example, 'aaaaaaabaaaaaaaabaaaaaaabaaaabaaaaaaaaab' is in (a*b)*. So I'm looking for a regular expression for all the strings which are not included in that.
Can you help me at least get on the right track to figure it out?
I know that a*b means as many a's as we want followed by one b. Then that whole group is repeated as many times as we want.
It seems that your (a*b)* regular expression matches anything that ends in a b or is empty.
So a regular expression that matches anything that ends in a seems to be the solution:
a$ or /a$/
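That reasoning can be checked by brute force over short strings. The Python sketch below is a check up to a small length, not a proof. It writes "ends in a" as the whole-string pattern [ab]*a, and the helper names are made up for illustration.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(?:a*b)*")   # the language from the question
ENDS_IN_A = re.compile(r"[ab]*a")    # candidate for its complement over {a, b}

def words(max_len):
    """Yield every string over {a, b} with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def is_complement(max_len=8):
    """True if every short string is matched by exactly one of the two patterns."""
    return all((ORIGINAL.fullmatch(w) is None) == (ENDS_IN_A.fullmatch(w) is not None)
               for w in words(max_len))

print(is_complement())
```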
[^ab]*
This probably should do the trick.
This says the string should not contain any a or b. Inside square brackets, a leading ^ is the symbol for negation.
You could iterate over the string and check for characters besides 'a' and 'b'. It seems chains of 'b's are allowed and the empty string is part of the language, since the Kleene star allows for zero instances of the character.

Expressing regular expression in words

I am trying to express the following regular expression in words. Please note this is not so much a programming regex as some CS work I am doing. The regular expression is:
(ab + b)* + (ba + b)*
The spaces are meaningless and the '+' functions as an 'or'. My answer right now is:
"This regular expression represents every string that does not contain the substring 'aa', and whose last letter is 'b' if the first letter is 'a'"
Is this correct? If so, that last condition I put makes me a bit wary. Is there a way to perhaps simplify the summation?
Thanks guys.
Hm, not sure I agree with @ChristianTernus's reduction.
Assuming these are implicitly anchored, the original, (ab|b)*|(ba|b)*, in English, is:
a string entirely composed of ab and b, or
a string entirely composed of ba and b.
So, for example, abb would match as the first kind but not the second, and bba would match the second kind but not the first.
Meanwhile, note how neither abb nor bba would match the reduction, (ab)*|(ba)*|(b)*, which actually means,
a string entirely composed of ab, or
a string entirely composed of ba, or
a string entirely composed of b.
Actually, the way you Englishified it, I think was already the best! Though, I'd style it like this:
This regular expression represents a string composed entirely of 'a's and 'b's, with no consecutive 'a's, and whose last character is 'b' if the first character is 'a'.
Nearly identical to what you already wrote.
As @ChristianTernus (and @slebetman) point out, the above fails to take into account that the original expression accepts the empty string (or even a string without 'a's, which isn't clear from my Englishification), so in fact I believe OP's Englishification was indeed the strongest.
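One way to sanity-check an English description against the expression is brute force over short strings. The Python sketch below assumes implicit anchoring (hence fullmatch). The function described encodes OP's wording, with the empty string counted as satisfying it, and all helper names are made up for illustration.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(?:ab|b)*|(?:ba|b)*")
REDUCTION = re.compile(r"(?:ab)*|(?:ba)*|b*")

def words(max_len):
    """Yield every string over {a, b} with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def described(w: str) -> bool:
    """No 'aa', and the last letter is 'b' if the first letter is 'a'."""
    if "aa" in w:
        return False
    return w.endswith("b") if w.startswith("a") else True

def mismatches(max_len=8):
    """Strings on which the regex and the English description disagree."""
    return [w for w in words(max_len)
            if (ORIGINAL.fullmatch(w) is not None) != described(w)]

print(mismatches())
for w in ("abb", "bba"):
    print(w, bool(ORIGINAL.fullmatch(w)), bool(REDUCTION.fullmatch(w)))
```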
(ab + b)* + (ba + b)*
Translated into common (PCRE) regex, that's
(ab|b)*|(ba|b)*
In other words: a string composed of either zero or more instances of either 'ab' or 'b', or zero or more instances of either 'ba' or 'b'.
@acheong87's answer is also correct. I like this because it matches more closely the original structure of the regular expression -- it wouldn't be hard to turn this back into the regex from whence it came.

what is regular expression not generated over {a,b}?

I am really stuck with these 2 questions for over 2 days now. I'm trying to figure out what the question means. My tutor is out of town too.
Question 1: Write a regular expression for the only strings that are not generated over {a,b} by the expression: (a+b)*a(a+b)*. Explain your reasoning.
I also tried the second question. Do you think there is a better answer than this one?
Question 2: What is a regular expression for the set of strings that contain an odd number of a's or exactly two b's? My answer is (a((a|b)(a|b))*|bb). I know that to represent any odd length of a's, the RE is a((a|b)(a|b))*
Here's a start for the first question. First consider the strings that this regular expression generates (reading + as the "one or more" quantifier, so a+b means one or more a's followed by a b):
(a+b)*a(a+b)*
It must begin with a AND
Every b must have at least one a immediately before it AND
There must either be an aab or else the string must end in a.
The inverse of this is:
It must not begin with a OR
There is at least one b not after an a OR
The string consists only of repetitions of ab.
For the second question you should check that you have understood the question correctly. Your interpretation seems to be:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and no a's).
But another interpretation is this:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and any number of a's).
To match two a's you would use something like aa right?
Now we know that the + is a quantifier for 1 or more and the * is a quantifier for 0 or more. So if we want to repeat that entire pattern, we can put it in a group and repeat the entire pattern like so: (aa)+.
That would match:
aa
aaaa
But not:
a (because aa requires at least 2 items)
aaa (because aa will match the first two, but you'll have an extra a)
And if we want to make that even count odd, we can simply add one extra a outside of the group like so: a(aa)+. However, since we want an odd number without a specific minimum, we shouldn't use +, since that would require at least 3 a's.
So the entire answer would be: (bb|a(aa)*)
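The effect of the different quantifiers is easy to see by matching runs of a's of increasing length. A small Python sketch (the helper name lengths_matched is made up for illustration):

```python
import re

EVEN_TWO_PLUS = re.compile(r"(aa)+")     # 2, 4, 6, ... a's
ODD_THREE_PLUS = re.compile(r"a(aa)+")   # 3, 5, 7, ... a's
ODD = re.compile(r"a(aa)*")              # 1, 3, 5, ... a's
FULL = re.compile(r"(bb|a(aa)*)")        # the answer given above

def lengths_matched(pattern, max_len=7):
    """Lengths n in 0..max_len for which a string of n a's matches entirely."""
    return [n for n in range(max_len + 1) if pattern.fullmatch("a" * n)]

print(lengths_matched(EVEN_TWO_PLUS))
print(lengths_matched(ODD_THREE_PLUS))
print(lengths_matched(ODD))
print(bool(FULL.fullmatch("bb")))
```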
It sounds like the first question is asking you to write a regular expression for the set of strings that do not match the provided regular expression.
For instance, suppose the question was asking for a regular expression for the set of strings not matched by aa+ over {a}. Well, here are a few strings that do match:
'aa'
'aaaa'
'aaaaa'
What are some strings that do not match? Here are the only two:
''
'a'
A regex for the latter set is a?.
Regarding the second question, I would suggest drumming up some positive and negative test cases. Run some strings like this through your regex and see what happens:
'a' (should pass)
'aaa' (should pass)
'bb' (should pass)
'' (should fail)
'aa' (should fail)
'aba' (should fail)
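Here is a sketch of how those cases could be run in Python. It assumes the attempt from the question reads a((a|b)(a|b))*|bb and that the whole string must match; the helper name failures is made up for illustration.

```python
import re

# The attempt from the question (assumed reading).
ATTEMPT = re.compile(r"a((a|b)(a|b))*|bb")

# (input string, should it be accepted?)
CASES = [("a", True), ("aaa", True), ("bb", True),
         ("", False), ("aa", False), ("aba", False)]

def failures(pattern, cases):
    """Return the cases where the pattern's verdict differs from the expectation."""
    return [(s, expected) for s, expected in cases
            if (pattern.fullmatch(s) is not None) != expected]

print(failures(ATTEMPT, CASES))
```

With this reading the only failing case is 'aba': the attempt accepts any odd-length string that starts with a, regardless of how many a's it contains.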
Good luck!
The expression (a+b)*a(a+b)* just means: there has to be an a inside the string. The only strings that can't be generated by this expression are those matching b*
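Under this reading (+ meaning "or"), the claim can be checked by brute force over short strings. The Python sketch below is a check up to a small length, not a proof, and the helper names are made up for illustration.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"[ab]*a[ab]*")   # (a+b)*a(a+b)* with + read as "or"
ONLY_B = re.compile(r"b*")              # proposed complement over {a, b}

def words(max_len):
    """Yield every string over {a, b} with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def is_complement(max_len=8):
    """True if every short string is matched by exactly one of the two patterns."""
    return all((ORIGINAL.fullmatch(w) is None) == (ONLY_B.fullmatch(w) is not None)
               for w in words(max_len))

print(is_complement())
```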
This expression means that the string must contain at least one 'a'.
It does not accept 'b', any string of the form b*, or the empty string.