Minimal Length of a regular Expression - regex

Working on my homework for a class and I came to this question:
For each of the following regular expressions, give minimal length strings that are
not in the language defined by the expression.
(bb)*(aa)*b*
a*(bab)*∪b∪ab
I'm going to try to get help only on the first one and see if I can figure out the second. Here's what I know: the Kleene * indicates zero or more repetitions of an element, and the union of two sets is the set containing all elements of set A and set B without repeating an element. Working through the first problem, starting by inserting lambda, I get:
1st run: bbaab
2nd: bbbbaabaabbaabbbbaab
3rd: bbbbbbaabaabbaabbbbaabaabbbbaabaabbaabbbbaabbbbbbaabaabbaabbbbaab
If I'm doing that correctly, then strings of length 0 to 5 are not in the language. Am I doing this correctly?

The first regular expression matches any word that starts with an even number of 'b's (zero included), followed by an even number of 'a's (zero is OK), followed by any number of 'b's (again, zero included).
This means that the empty string is in the language, as well as the string "b".
However, the string "a" is not in the language.
Thus the only minimal-length string that is not in the language is "a".
The second regex matches "", "a" and "aa" (by a*(bab)*) and also "b" and "ab", so every string of length 0 or 1 is in the language.
However it doesn't match on "ba" and "bb".
Thus the minimal strings are of length 2: "bb" and "ba".
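The reasoning above can be double-checked by brute force. This is only a sketch: it assumes the two expressions translate to Python's re syntax as shown (with ∪ written as |), and the helper name shortest_non_members and the max_len cutoff are made up for illustration.

```python
import re
from itertools import product


def shortest_non_members(pattern, alphabet="ab", max_len=6):
    """Return all strings of the smallest length that the pattern does not fully match."""
    compiled = re.compile(pattern)
    for length in range(max_len + 1):
        # Enumerate every string of this length over the alphabet, in lexicographic order.
        candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
        missing = [word for word in candidates if compiled.fullmatch(word) is None]
        if missing:
            return missing
    return []  # nothing rejected up to max_len


print(shortest_non_members(r"(bb)*(aa)*b*"))   # ['a']
print(shortest_non_members(r"a*(bab)*|b|ab"))  # ['ba', 'bb']
```

fullmatch is used instead of search because membership in the language means the whole string must match, not just a substring.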

Related

Regex to match strings containing two of any character but not three

I want a Regex to match strings containing the same character twice (not necessarily consecutive) but not if that character appears three times or more.
For example, given these two inputs:
abcbde
abcbdb
The first, abcbde would match because it contains b twice. However, abcbdb contains b three times, so that would not match.
I have created this Regex, however it matches both:
(\w).*\1{1}
I've also tried to use the ? modifier, however that still matches abcbdb, which I don't want it to.
You need two checks: a first check to ensure no character exists 3 times in the input, and a second check to look for one that exists 2 times:
^(?!.*(\w).*\1.*\1).*?(\w).*\2
This is horribly inefficient compared to, say, using your programming language to construct an array of character frequencies, requiring only 1 pass through the entire input. But it works.
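As a rough sketch of the frequency-count alternative mentioned above, here is one way it might look in Python. The function name has_pair_but_no_triple is hypothetical, and it assumes the same rule as the regex: at least one word character occurs exactly twice and none occurs three or more times.

```python
import re
from collections import Counter

# The regex from the answer above, kept only for comparison.
PAIR_NO_TRIPLE = re.compile(r"^(?!.*(\w).*\1.*\1).*?(\w).*\2")


def has_pair_but_no_triple(text):
    """Single pass: count word characters, then inspect the counts."""
    counts = Counter(ch for ch in text if re.fullmatch(r"\w", ch))
    has_pair = any(n == 2 for n in counts.values())
    has_triple = any(n >= 3 for n in counts.values())
    return has_pair and not has_triple


for sample in ("abcbde", "abcbdb"):
    print(sample, has_pair_but_no_triple(sample), PAIR_NO_TRIPLE.search(sample) is not None)
```

Both checks accept abcbde and reject abcbdb; the counting version simply avoids the backtracking the lookahead needs.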

I need regex to only take numbers in string

I thought I had it with [0-9] but when I ran it that only took one number.
The string goes for example:
1 note
1,234 notes
68,000 notes
I want it to take the whole number and leave out the "notes" part, the spaces, and the comma, so I get just the full number.
The [0-9] would only take the first digit of the string, even when there wasn't a comma.
So how do I take only the number, please?
[0-9] means any one character between 0 and 9. What you are looking for is these characters repeated any number of times, but no other character should be there. The correct way to write this is [0-9]+.
M+, where M is some regex rule, is equivalent to M M*, where * means 0 or more occurrences. So M+ means at least one occurrence of whatever M matches.
EDIT: The question now also states that the entire number should be read, but the comma should be excluded from the output. AFAIK, this cannot be done with a single regex match alone, because the matched text can't differ from the text in the input. A possible solution is to add , to the list of allowed characters and then remove the commas from the result afterwards.
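A minimal sketch of that two-step approach, assuming Python (the question does not say which language is in use) and a made-up helper name extract_number:

```python
import re

# A digit followed by any mix of digits and commas, e.g. "68,000".
NUMBER_WITH_COMMAS = re.compile(r"[0-9][0-9,]*")


def extract_number(text):
    """Return the first number in text as an int, ignoring thousands separators."""
    match = NUMBER_WITH_COMMAS.search(text)
    if match is None:
        return None
    # Step two: strip the commas that the regex had to include in the match.
    return int(match.group().replace(",", ""))


for line in ("1 note", "1,234 notes", "68,000 notes"):
    print(extract_number(line))
```

This prints 1, 1234 and 68000 for the three sample lines.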

Using stack, make a program that determines if a pattern of characters is valid using the following rule: A^N B^N

In C++, the program must read the patterns to evaluate from a file called Asig5.ent. You must create a file called Asig5.sal to put the results.
I know how to work with stacks, but I don't understand the instructions at all.
I'm not asking for someone to give me a code.
I just need someone to explain to me the instructions to do it.
A^NB^N is most likely a pattern written in formal-language notation rather than an actual regular expression. Basically, it describes a string that starts with some number of As, followed immediately by exactly as many Bs.
For instance, the following strings match the pattern:
""
"AB"
"AABB"
"AAABBB"
and the following do not:
"A"
"B"
"AAB"
"cat"
"AABBC"
Exponentiation notation on strings usually means repeated concatenation, so A^2 is AA, A^3 is AAA, etc. Then the set of strings that match this pattern is {A^NB^N | N >= 0}.
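To make the rule concrete, here is a small illustrative sketch of the stack idea. It is written in Python rather than C++ and leaves out the file handling for Asig5.ent and Asig5.sal; the function name is_a_n_b_n is an assumption for illustration, not part of the assignment.

```python
def is_a_n_b_n(pattern):
    """Check whether pattern is N 'A's followed by exactly N 'B's, using a stack."""
    stack = []
    seen_b = False
    for ch in pattern:
        if ch == "A":
            if seen_b:
                return False  # an A after a B breaks the A...AB...B shape
            stack.append(ch)  # push every A
        elif ch == "B":
            seen_b = True
            if not stack:
                return False  # more Bs than As
            stack.pop()       # each B cancels one A
        else:
            return False      # any other character is invalid
    return not stack          # leftover As mean more As than Bs


for sample in ("", "AB", "AABB", "AAABBB", "A", "B", "AAB", "cat", "AABBC"):
    print(repr(sample), is_a_n_b_n(sample))
```

The first four samples come out valid and the last five invalid, matching the lists above.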

Regular Expression Challenge: For every consecutive 6 characters, there must be two 1s (alphabet {"0", "1"})

So, I want to build a regular expression that I can run against a string of 0s and 1s (e.g. "0010101000111100100011110001101100011") to make sure that every block of 6 consecutive characters contains at least two 1s.
Also, strings shorter than 6 characters should pass.
Examples of passing strings:
""
"00"
"11000011"
"01010100"
Examples of failing strings:
"110000000011"
"000001"
These examples are of very small strings, but I want to build one to take any length string.
Now, I'm looking for a nice way to express this in a regular expression, rather than having a solution with a loop and such.
Just use this regex and check that it doesn't match:
/000000|000001|000010|000100|001000|010000|100000/
Here is a regex that should do the trick (matches valid strings, including the empty string, since the outer repetition is *):
^((?!0{6}|10{5}|010{4}|001000|000100|0{4}10|0{5}1)[01])*$
Example: http://www.rubular.com/r/VelZ1Iqml6
This uses a negative lookahead inside of a repetition so that the condition is checked at every location in the string.
If you are able to just check for strings that don't match, that is more straightforward, and you can use davidrac's solution or this slightly shortened version (which I use in the lookahead of my regex):
0{6}|10{5}|010{4}|001000|000100|0{4}10|0{5}1
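For comparison, the "does not match" check can be put next to a plain sliding-window count. This is a sketch in Python (the linked example uses Ruby's regex flavor), and the function names valid_by_regex and valid_by_window are made up for illustration.

```python
import re

# Any 6-character window with fewer than two 1s: all zeros, or exactly one 1.
TOO_FEW_ONES = re.compile(r"0{6}|10{5}|010{4}|001000|000100|0{4}10|0{5}1")


def valid_by_regex(bits):
    """Valid when no bad 6-character window occurs anywhere in the string."""
    return TOO_FEW_ONES.search(bits) is None


def valid_by_window(bits):
    """Reference check: count the 1s in every window of 6 consecutive characters."""
    return all(bits[i:i + 6].count("1") >= 2 for i in range(len(bits) - 5))


for sample in ("", "00", "11000011", "01010100", "110000000011", "000001"):
    print(repr(sample), valid_by_regex(sample), valid_by_window(sample))
```

Both functions accept the four passing examples and reject the two failing ones. Strings shorter than 6 pass automatically because they contain no complete window.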

what is regular expression not generated over {a,b}?

I am really stuck with these 2 questions for over 2 days now. I'm trying to figure out what the question means. My tutor is out of town too.
Question 1: Write a regular expression for the only strings that are not generated over {a,b} by the expression: (a+b)*a(a+b)*. Explain your reasoning.
And I tried the second question. Do you think there is a better answer than this one?
What is a regular expression for the set of strings that contain an odd number of a's or exactly two b's?
My answer: (a((a|b)(a|b))*|bb)
I know that to represent any odd length of a's, the RE is a((a|b)(a|b))*
Here's a start for the first question, reading + as the "one or more" quantifier, as most regex engines do. First consider the strings that this regular expression generates:
(a+b)*a(a+b)*
It must begin with a AND
Every b must have at least one a immediately before it AND
There must either be an aab or else the string must end in a.
The inverse of this is:
It must not begin with a OR
There is at least one b not after an a OR
The string consists only of repetitions of ab.
For the second question you should check that you have understood the question correctly. Your interpretation seems to be:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and no a's).
But another interpretation is this:
What is the regular expression for the set of strings that contain either (an odd number of a's and any number of b's) or (exactly two b's and any number of a's).
To match two a's you would use something like aa, right?
Now we know that the + is a quantifier for 1 or more and the * is a quantifier for 0 or more. So if we want to repeat that entire pattern, we can put it in a group and repeat the entire pattern like so: (aa)+.
That would match:
aa
aaaa
But not:
a (because aa requires at least 2 items)
aaa (because aa will match the first two, but you'll have an extra a)
And if we want to make that odd instead of even, we can simply add one extra a outside of the group like so: a(aa)+. However, since we want an odd number without a specific minimum, we shouldn't use +, since that would require at least 3 a's.
So the entire answer would be: (bb|a(aa)*)
It sounds like the first question is asking you to write a regular expression for the set of strings that do not match the provided regular expression.
For instance, suppose the question was asking for a regular expression for the set of strings not matched by aa+ over {a}. Well, here are a few strings that do match:
'aa'
'aaaa'
'aaaaa'
What are some strings that do not match? Here are the only two:
''
'a'
A regex for the latter set is a?.
Regarding the second question, I would suggest drumming up some positive and negative test cases. Run some strings like this through your regex and see what happens:
'a' (should pass)
'aaa' (should pass)
'bb' (should pass)
'' (should fail)
'aa' (should fail)
'aba' (should fail)
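The test cases above can be run mechanically. The sketch below assumes Python's re syntax and tries both the expression from the question and the (bb|a(aa)*) answer; the helper name failing_cases is hypothetical.

```python
import re

# (input, should the whole string match?) pairs taken from the list above.
CASES = [
    ("a", True),
    ("aaa", True),
    ("bb", True),
    ("", False),
    ("aa", False),
    ("aba", False),
]


def failing_cases(pattern):
    """Return the inputs whose actual match result differs from the expected one."""
    compiled = re.compile(pattern)
    return [text for text, expected in CASES
            if (compiled.fullmatch(text) is not None) != expected]


for pattern in (r"a((a|b)(a|b))*|bb", r"bb|a(aa)*"):
    print(pattern, failing_cases(pattern))
```

The expression from the question fails only on 'aba', which it accepts even though the string contains two a's; (bb|a(aa)*) passes all six cases.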
Good luck!
The expression (a+b)*a(a+b)* just means: there has to be an a somewhere in the string. The only strings that can't be generated by this expression are those in b*.
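This claim can be checked by brute force over short strings. The sketch assumes that + in the expression denotes union, so it is written as | in Python's re syntax; the function name complement_is_b_star and the max_len cutoff are illustrative only.

```python
import re
from itertools import product

CONTAINS_A = re.compile(r"(a|b)*a(a|b)*")  # (a+b)*a(a+b)* with + read as union
ONLY_BS = re.compile(r"b*")                # the proposed complement


def complement_is_b_star(max_len=8):
    """Check that every string over {a,b} up to max_len matches exactly one of the two."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            word = "".join(chars)
            in_original = CONTAINS_A.fullmatch(word) is not None
            in_b_star = ONLY_BS.fullmatch(word) is not None
            if in_original == in_b_star:
                return False  # word is in both languages or in neither
    return True


print(complement_is_b_star())
```

A finite check like this is not a proof, but it would quickly expose a wrong candidate for the complement.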
This expression means that the string must contain at least one 'a'.
The expression does not accept:
'b'
any string in b*
or
the empty string