Regular expression: sequences - regex

Stuck with a silly problem. I have an input field where the user can enter a limited set of letters (ABCDEFG). Here is the problem: I do not want users to be able to enter 3 or more of the letters A, C, E or G in a row, that is: no AAA, CCC, EEE, GGG. The second rule is almost the same as the first one: no more than 1 B, D or F in a row, that is: no BB, DD, FF. These two rules somehow have to be combined.
So, for example, AABFGECC is valid. GEFFFAABG is invalid.
Hope, you will help me! Thank you!
P.S. If it matters, I am writing my app in Visual Basic, but I don't think that is important.

What if you made an expression which matches the cases you want to avoid and instead check that the input does not match? Like this:
([^A-G]|AAA|CCC|EEE|GGG|BB|DD|FF)
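A minimal Python sketch of this "match the bad cases, then negate" idea. It assumes the allowed alphabet is A through G as stated in the question, so the character class is written `[^A-G]`; the function name `is_valid` is just made up for illustration.

```python
import re

# Matches anything that makes the input invalid: a character outside A-G,
# or one of the forbidden runs.
INVALID = re.compile(r"[^A-G]|AAA|CCC|EEE|GGG|BB|DD|FF")

def is_valid(text: str) -> bool:
    """Return True when no forbidden character or run occurs in text."""
    return INVALID.search(text) is None

print(is_valid("AABFGECC"))   # True
print(is_valid("GEFFFAABG"))  # False
```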

While you could be clever and use back-references, a simple solution is to black-list the invalid sequences using a negative look ahead:
^(?!.*(?:AAA|CCC|EEE|GGG|BB|DD|FF))[A-G]*$
Logically, that is the same as having those 7 invalid sequences in a list and checking that the string does not contain any of them, which also gives you a nice alternative.
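To illustrate that the two approaches agree, here is a rough Python sketch that runs the lookahead regex and a plain list check side by side. The helper names are hypothetical, and the comparison only covers short strings over A-G, so treat it as a sanity check rather than a proof.

```python
import re
from itertools import product

FORBIDDEN = ("AAA", "CCC", "EEE", "GGG", "BB", "DD", "FF")
LOOKAHEAD = re.compile(r"^(?!.*(?:AAA|CCC|EEE|GGG|BB|DD|FF))[A-G]*$")

def valid_by_regex(text: str) -> bool:
    # fullmatch avoids the quirk where "$" also matches before a trailing newline.
    return LOOKAHEAD.fullmatch(text) is not None

def valid_by_list(text: str) -> bool:
    only_allowed = all(ch in "ABCDEFG" for ch in text)
    return only_allowed and not any(bad in text for bad in FORBIDDEN)

# Compare both checks on every string of length 0..4 over the alphabet.
mismatches = [
    "".join(chars)
    for n in range(5)
    for chars in product("ABCDEFG", repeat=n)
    if valid_by_regex("".join(chars)) != valid_by_list("".join(chars))
]
print(mismatches)  # []
```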

Related

Regex Pattern to Match Certain Rules

UPDATE: Turns out I am an idiot and misread the original problem. The original problem specifies that AT MOST 3 of the 4 letters must be used, not AT LEAST. This completely changes the question and eliminates my doubts about creating the NFA and DFA. Sorry everyone and thanks for the help!
For a homework problem, I have to create a regex pattern that will match these specifications...
Must be composed of only letters a, b, c, and d
Must be in reverse alphabetical order
Must use at least 3 of the 4 given letters (a,b,c,d that is, can have as many total letters as possible)
My answer that I am fairly confident about is (d+c+b+a*)|(d+b+a+)|(d+c+a+)|(c+b+a+). My first question is if this is correct. My second is if this expression can be simplified or altered at all.
The next step is to draw a graph for a non-deterministic finite automaton for the regex and I am having difficulty completing that step.
As requested, here is my attempt at an NFA (rough sketch)
Your answer looks fine to me.
There are 5 kinds of patterns to match:
d+c+b+a+
c+b+a+
d+b+a+
d+c+a+
d+c+b+
(d+c+b+a+)|(c+b+a+)|(d+b+a+)|(d+c+a+)|(d+c+b+), which can be written in the following forms by merging the first pattern into one of the other four:
(d+c+b+a*)|(d+b+a+)|(d+c+a+)|(c+b+a+) OR
(d*c+b+a+)|(d+b+a+)|(d+c+a+)|(d+c+b+) ...
A quite straightforward way, to my mind:
(d*c+b+a+)|(d+c*b+a+)|(d+c+b*a+)|(d+c+b+a*)
You can simplify (a little) by noting that only input starting with d+c+ have flexibility of 3 or 4 letters:
^(d+c+(b+a*|b*a+)|d+b+a+|c+b+a+)$
See live demo.
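If you want to convince yourself that the asker's expression, the symmetric four-way version and the compact version describe the same language, a brute-force comparison is easy to write. This is only a sketch in Python (the constant and function names are mine), and it checks strings up to a fixed length, which is evidence rather than a proof.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(d+c+b+a*)|(d+b+a+)|(d+c+a+)|(c+b+a+)")
SYMMETRIC = re.compile(r"(d*c+b+a+)|(d+c*b+a+)|(d+c+b*a+)|(d+c+b+a*)")
COMPACT = re.compile(r"^(d+c+(b+a*|b*a+)|d+b+a+|c+b+a+)$")

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def disagreements(max_len=7):
    """Strings on which the three patterns do not all give the same verdict."""
    patterns = (ORIGINAL, SYMMETRIC, COMPACT)
    return [
        s for s in all_strings("abcd", max_len)
        if len({bool(p.fullmatch(s)) for p in patterns}) != 1
    ]

print(disagreements())  # []
```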

Is the language L={words without the substring 'bb' } regular?

L = {words in which the substring 'bb' is not present}
Given that the alphabet is A = {a,b}, is this language regular? If so, is there a regular expression that represents it?
Yes, this language is regular. Since this looks like homework, here's a hint: if the string bb isn't present, then the string consists of lots of blocks of strings of the form a* or a*b. Try seeing how to assemble the solution from this starting point.
EDIT: If this isn't a homework problem, here's one possible solution:
(a*(ba+)*b?)?
The idea is to decompose the string into a lot of long sequences of as with some b's interspersed in-between them. The first block of a's is at the front. Then, we repeatedly place down a b, at least one a, and then any number of additional as. Finally, we may optionally have one b at the end. As an alternative, we could have the empty string, so the entire thing is guarded by a ?.
Hope this helps!
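For anyone who wants to check the expression above mechanically, here is a small Python sketch that compares it against the plain definition ("bb" does not occur) for every string over {a, b} up to a fixed length. The function name is made up, and a bounded check like this is a sanity test, not a proof.

```python
import re
from itertools import product

NO_BB = re.compile(r"(a*(ba+)*b?)?")

def counterexamples(max_len=10):
    """Strings where the regex and the definition of L disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            word = "".join(chars)
            in_language = "bb" not in word
            if bool(NO_BB.fullmatch(word)) != in_language:
                bad.append(word)
    return bad

print(counterexamples())  # []
```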

How do I find words with all the specified characters, with repetition?

Is there a way to find the words containing all the given characters, include the repetitive ones, with regular expression? For example, I want to find all words from list
aabc, abbc, bbbc, aaac, aaab, baac, caab, abca
that contain exactly one 'b' and two 'a's, i.e. aabc, baac, caab, and abca (but NOT aaab as it has an additional 'a'). Word length doesn't matter.
While this question
GREP How do I only retrieve words with only the specified letters?
could give me some hints, I wasn't able to extend it so that it handles repeated characters.
I am just playing with the re module from Python, but there is no restriction on language / tool for the question.
EDIT:
A better example / usecase would be: Given a list of words, show only those that contain all the letters entered by a user, e.g. I would like to find all words containing exactly one 'a', two 'd's and one 's'. Is this something regex capable of? (I already know how to do it without regex.)
To match exactly 2 a's and 1 b (in any order) in your input string use this regex:
(?=^(?:[^a]*a){2}[^a]*$)(?=^[^b]*b[^b]*$)^.+$
Here is a live demo for you.
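Since the question mentions Python's re module, here is a quick sketch applying that regex to the word list from the question, one word at a time.

```python
import re

# Two lookaheads: exactly two a's, and exactly one b, anywhere in the word.
PATTERN = re.compile(r"(?=^(?:[^a]*a){2}[^a]*$)(?=^[^b]*b[^b]*$)^.+$")

words = ["aabc", "abbc", "bbbc", "aaac", "aaab", "baac", "caab", "abca"]
matches = [w for w in words if PATTERN.match(w)]
print(matches)  # ['aabc', 'baac', 'caab', 'abca']
```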
If your regex flavor supports lookaheads, then you can use this:
\b(?=.*b)(?=([^a]*a){2}[^a]*\b)[abc]+\b
This requires at least one b and exactly 2 a's, and allows only a, b and c in the string. If you want to require exactly one b and exactly 4 characters in total, use this:
\b(?=[^b]*b[^b]*\b)(?=([^a]*a){2}[^a]*\b)[abc]{4}\b
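The same lookahead trick can be generated for the use case in the question's edit (exactly one 'a', two 'd's and one 's'). Below is a hedged Python sketch: `exact_counts_pattern` and the word list are hypothetical, and the builder assumes every key is a single ordinary letter and that each word is tested on its own.

```python
import re

def exact_counts_pattern(counts):
    """Build a regex requiring exactly counts[ch] occurrences of each ch.

    Assumes every key is a single ordinary letter (no regex metacharacters).
    """
    lookaheads = "".join(
        "(?=^(?:[^{0}]*{0}){{{1}}}[^{0}]*$)".format(ch, n)
        for ch, n in counts.items()
    )
    return re.compile(lookaheads + "^.+$")

pattern = exact_counts_pattern({"a": 1, "d": 2, "s": 1})
words = ["dads", "adds", "sad", "daddy", "saddle"]  # made-up sample list
print([w for w in words if pattern.match(w)])  # ['dads', 'adds', 'saddle']
```

Other letters are allowed in the word (as in "saddle"); only the listed letters have their counts pinned down.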

Regular Expression Help - Double checking answers

I'm still a noob at Regular Expressions, but would anyone be so kind to double check to see if my answer is correct?
Question is: Indicate whether each of the given input strings belong to the language defined by the regex (a | empty) b (a | b)* a (b)*
Empty = ε (the empty string; the symbol looks like a flipped-around 3)
(a) input string: ababaa
Answer: Does not belong to the regex
because if tested, turns out to be ababab
(b) input string: aabbaa
Answer: Does not belong to the regex
Because if tested, turns out to be ab(b or a)* ab
Are these answers correct?
The second string does not belong to the language. If you look at the regex, you can see that b must either be the first character (if (a|empty) selects empty), or must be the second character (if (a|empty) selects a). Since the string starts with aa, it can't match.
The first string does match. Just try to figure out each choice point so that you get the string provided. It might help to work from the outside in, since (a|b)* is the most flexible part of the regex - i.e. you can match whatever you want to it.
You can double check your answers by simply running them. Here's a site that does exactly that: http://regex.larsolavtorvik.com/
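If you would rather run it locally, a few lines of Python do the same job. In this sketch the "empty" alternative is written as `(a|)`, which is how I'd translate (a | empty) into Python's re syntax, and `fullmatch` is used because the question is about the whole string belonging to the language.

```python
import re

# (a | empty) b (a | b)* a (b)*
LANGUAGE = re.compile(r"(a|)b(a|b)*ab*")

for candidate in ("ababaa", "aabbaa"):
    print(candidate, bool(LANGUAGE.fullmatch(candidate)))
# ababaa True
# aabbaa False
```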

regular expression to an English description

I'm really struggling with regular expressions. I have to give English descriptions of the following regular expressions. Can anyone please, please, please help me?
i. a(aa)*
ii. a(b*ab*ab*)*
iii. b(b*ab*ab*)*
Here are my attempts, but everyone else in the class seems to have shorter answers.
i. Find a "a" followed by either zero or more times "aa"s should be seen
ii. Find a "a" followed by either zero or more times of this pattern :
(zero or more times "b" followed by zero or more times "ab" followed by zero or more times "ab")
iii. Find a "b" followed by either zero or more times of this pattern :
(zero or more times "b" followed by zero or more times "ab" followed by zero or more times "ab")
If those strings are actual regexes, they (completely) match the following:
An odd number of as.
A string starting with a, followed by any combination of as and bs, with an overall odd number of as. Edge case: If the string contains any b, it needs to contain at least three as.
A string starting with b, followed by any combination of as and bs, with an overall even number of as. Edge case: If the string contains more than one b, it needs to contain at least two as.
"Any combination" includes zero instances of each character.
Some possible matches for 1.:
a
aaa
aaaaa
aaaaaaa etc.
Some possible matches for 2.:
a
aaa
ababa
aaab
abbbbbbbbaa
ababababababa
Some possible matches for 3.:
b
baa
baba
baaaaaba
bbbbbbbbbbaa
bababababbbbb
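The descriptions above can be spot-checked with a short Python sketch. It enumerates every string over {a, b} up to a fixed length, keeps the ones each regex fully matches, and asserts the stated properties. The names are mine, and the bounded search is only a sanity check.

```python
import re
from itertools import product

PATTERNS = {
    "i": re.compile(r"a(aa)*"),
    "ii": re.compile(r"a(b*ab*ab*)*"),
    "iii": re.compile(r"b(b*ab*ab*)*"),
}

def matches(name, max_len=8):
    """All strings over {a, b} up to max_len that the named regex fully matches."""
    return [
        "".join(chars)
        for n in range(1, max_len + 1)
        for chars in product("ab", repeat=n)
        if PATTERNS[name].fullmatch("".join(chars))
    ]

# i: only a's, and an odd number of them.
assert all(set(s) == {"a"} and len(s) % 2 == 1 for s in matches("i"))
# ii: starts with a, odd number of a's overall.
assert all(s[0] == "a" and s.count("a") % 2 == 1 for s in matches("ii"))
# iii: starts with b, even number of a's overall.
assert all(s[0] == "b" and s.count("a") % 2 == 0 for s in matches("iii"))

# The edge case for iii: extra b's only come along with a pair of a's.
print(PATTERNS["iii"].fullmatch("bb"))          # None
print(bool(PATTERNS["iii"].fullmatch("bbaa")))  # True
```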
There's a free tool Ultrapico Express which can help. Just run a match on any of the regexes you mentioned, then it should be relatively easy to translate into regular English;
i - an odd number of a's, with at least one a.
ii - an odd number of a's, with at least one a, and 0 or more b's between each pair of a's.
Your attempted solutions seem correct, but I would expect your professor to complain that your description just rephrases the RE and is not an English description of the result.
I'll leave iii back to you to re-word (mainly because it's more difficult than the other two and I'm lazy this morning!)
Let me hint you a bit:
How would you describe the regular expression 'a'? How about 'aa'? OK, now, how would you describe the expressions 'a*' and '(aa)*'? For the latter there is an interesting pattern. Now, try to combine them. What is a(aa)*? If you write down a couple of specimens of the regular language, there is a pattern you can spot.
Odd and even plays a role here.
The trick is to cut up the regular expression and understand each part. Then write down a couple of strings which are in the language the RE decides. Then look for a pattern. My guess is that this is what your TA/Prof wants you to do in order to understand the relationship between an RE and the language it decides.
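Writing down specimens by hand is the point of the exercise, but if you want to check your list afterwards, a sketch like the following (Python, with a hypothetical `specimens` helper) prints the first few shortest strings a regex fully matches over the alphabet {a, b}.

```python
import re
from itertools import product

def specimens(regex, alphabet="ab", limit=5, max_len=8):
    """Return up to `limit` shortest strings over `alphabet` fully matched by regex."""
    pattern = re.compile(regex)
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if pattern.fullmatch(word):
                found.append(word)
                if len(found) == limit:
                    return found
    return found

print(specimens(r"a(aa)*", limit=4))  # ['a', 'aaa', 'aaaaa', 'aaaaaaa']
print(specimens(r"a(b*ab*ab*)*"))     # ['a', 'aaa', 'aaab', 'aaba', 'abaa']
```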
An odd number of as.
A string starting with a, followed by any combination of single as and multiple bs (zero or more), with an overall odd number of as.
A string starting with b, followed by any combination of single as and multiple bs (zero or more), with an overall even number of as.