Regex | Containing "bbb" - regex

I'm trying to create a Regex with chars 'a' and 'b'.
The only rule is that the regex must contain the word 'bbb' somewhere.
These are possible: aabbbaaaaaababa, abbba, bbb, aabbbaa, abbabbba, ...
These are not possible: abba, a, abb, bba, abbaaaabbaaaabba, ...
I have no idea how I can express that.
Any ideas? Thanks in Advance!

Based on the tag "automata", I am guessing you are after the formal regular expression for this formal language. In that case, a regular expression is (a+b)*bbb(a+b)*. The anatomy of this regular expression is the following:
(a+b) gives either "a" or "b"
(a+b)* gives any string of "a"s and "b"s, including the empty string
bbb gives the string bbb only
the whole regular expression describes any string that begins with anything, then has bbb, then ends with anything
To prove this regular expression is correct, note that:
This regular expression only generates strings that contain the substring bbb. This is due to the middle part.
This regular expression generates all strings that contain the substring bbb. Suppose there were some string containing the substring bbb that this regular expression didn't generate. The string either starts with bbb or it doesn't. If it does, then the string is generated by our regular expression by repeating the first (a+b) zero times and the second (a+b) n - 3 times, where n is the length of the string. Otherwise, if it doesn't start with bbb, consider the suffix of length n - 1 as a recursive case. Continue thusly until the subcase does begin with bbb (it eventually must). Because this suffix is describable by our regular expression, the original case must be too since we can just repeat the first (a+b) an additional number of times equal to the depth of recursion.
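As a quick sanity check of the argument above, here is a minimal Python sketch. It assumes the formal union operator + is written as | in Python's re syntax. The helper names matches_formal_regex and all_strings are made up for illustration. It compares the expression (a+b)*bbb(a+b)* against a plain substring test for every string over {a, b} up to a small length.

```python
import re
from itertools import product

# Formal (a+b)*bbb(a+b)* written in Python syntax, where | is union.
PATTERN = re.compile(r"(a|b)*bbb(a|b)*")


def matches_formal_regex(s: str) -> bool:
    """Return True if the whole string is generated by the expression."""
    return PATTERN.fullmatch(s) is not None


def all_strings(alphabet: str, max_len: int):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Strings where the regex and the substring test disagree.
mismatches = [
    s for s in all_strings("ab", 8)
    if matches_formal_regex(s) != ("bbb" in s)
]
print(mismatches)
```

An empty list means the two tests agree on every string checked. That is evidence for the proof, not a replacement for it, since only strings up to length 8 are covered.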

The pattern is quite simple:
/b{3}/g
If you need it to match a run of exactly three 'b's and no more, you can use lookarounds (assuming your regex flavor supports lookbehind):
/(?<!b)b{3}(?!b)/g

Good evening! You can use this expression:
(a+b)*(bbb)(a+b)*
The middle part (bbb) guarantees that the shortest generated string is bbb,
and the closure (a+b)* on each side generates any string of a's and b's around it, so every generated string contains bbb.

Related

Regex match even number of occurrences of a specific character

I've been trying to make a regex that satisfies these conditions:
The word consists of characters a,b
The number of b characters must be even (consecutive or not)
So for example:
abb -> accepted
abab -> accepted
aaaa -> rejected
baaab -> accepted
So far I've got this: ([a]*)(((b){2}){1,})
As you can see, I know very little about the matter. This checks for pairs, but it still accepts words with an odd number of b's.
You could use this regex to check for some number of a's with an even number of b's:
^(?:a*ba*ba*)+$
This looks for 1 or more occurrences of 2 b's, each surrounded by some number (which may be 0) of a's.
Note this will match bb (or bbbb, bbbbbb etc.). If you don't want to do that, the easiest way is to add a positive lookahead for an a:
^(?=b*a)(?:a*ba*ba*)+$
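To see how the two patterns behave on the question's examples, here is a small Python sketch. The helper name accepts and the extra test words "bb" and "ab" are illustrative additions, not part of the original answer.

```python
import re

# Even number of b's (at least two), any number of a's.
EVEN_BS = re.compile(r"^(?:a*ba*ba*)+$")
# Same, but the lookahead also demands at least one a.
EVEN_BS_WITH_A = re.compile(r"^(?=b*a)(?:a*ba*ba*)+$")


def accepts(pattern: re.Pattern, word: str) -> bool:
    """Return True if the pattern matches the word."""
    return pattern.search(word) is not None


for word in ["abb", "abab", "aaaa", "baaab", "bb", "ab"]:
    print(word, accepts(EVEN_BS, word), accepts(EVEN_BS_WITH_A, word))
```

The only word on which the two patterns differ here is "bb": the first accepts it and the second rejects it.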
Checking an Array of Characters Against Two Conditionals
While you could do this using regular expressions, it would be simpler to check your two rules directly against an Array of characters created with String#chars. For example, using Ruby 3.1.2:
# Rule 1: string contains only the letters `a` and `b`
# Rule 2: the number of `b` characters in the word is even
#
# @return [Boolean] whether the word matches *both* rules
def word_matches_rules word
  char_array = word.chars
  char_array.uniq.sort == %w[a b] and char_array.count("b").even?
end

words = %w[abb abab aaaa baaab]
words.map { |word| [word, word_matches_rules(word)] }.to_h
#=> {"abb"=>true, "abab"=>true, "aaaa"=>false, "baaab"=>true}
Regular expressions are very useful, but plain string operations are often faster and easier to reason about. This approach also allows you to add more rules or verify intermediate steps without adding a lot of complexity.
There are probably a number of ways this could be simplified further, such as using a Set or methods like Array#& or Array#-. However, my goal with this answer was to make the code (and the encoded rules you're trying to apply) easier to read, modify, and extend rather than to make the code as minimalist as possible.

ReGex - matching a logical implication

Given the alphabet {a, b, c}, how can I create a simple regular expression which matches exactly those words which meet the following criteria:
If the string "aa" occurs, then consequently "cc" must also occur (Note the logical implication).
The order of occurrence doesn't matter ("cc" as well as "aa" can occur first).
Due to the former logical implication (if-then relationship), the string "cc" can occur even without "aa", but not vice versa.
I am looking for a way to implement this by using these syntax elements (., *, +, ?, |, ) as well as brackets.
Example what should be matched:
cc
abba
bccb
bccaa
ε (epsilon - empty string)
What shouldn't be matched:
aa
aacbcb
abaaba
baaa
caac
I have tried the following: a?b?(ba)*(ccaa)*(aacc)*c*b?
Not sure if I should answer this, since it's homework, but here goes...
First off, you identify the clauses P:"string 'aa' occurs" and Q:"string 'cc' occurs". Then you transform P -> Q into !P v Q. Then you translate this into a regular expression formed of two parts:
^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$
The first part rules out any 'aa' substring, and goes like this: allow any number of b's and c's at the beginning (including none); then, if you find an a, force at least one different character after it. An a followed by non-a characters may occur any number of times. At the end we also have a? to allow the string to end in an a, and we can be sure there is no a directly before it because neither of the two preceding groups can end in a.
The second part is trivial: it accepts any string that contains 'cc'.
Test it here: http://rubular.com/r/mAPHO8bulo
See it here: http://jex.im/regulex/...
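The !P v Q reading can also be checked mechanically. The following Python sketch is an illustration only: the helper names regex_accepts, rule_accepts and words_up_to are invented here. It compares the pattern above with the plain rule "no 'aa', or some 'cc'" on every word over {a, b, c} up to length 6.

```python
import re
from itertools import product

# The pattern from the answer, unchanged.
IMPLICATION = re.compile(r"^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$")


def regex_accepts(word: str) -> bool:
    return IMPLICATION.search(word) is not None


def rule_accepts(word: str) -> bool:
    # P -> Q rewritten as (not P) or Q.
    return "aa" not in word or "cc" in word


def words_up_to(alphabet: str, max_len: int):
    """Yield every word over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Words on which the regex and the rule disagree.
disagreements = [
    w for w in words_up_to("abc", 6)
    if regex_accepts(w) != rule_accepts(w)
]
print(disagreements)
```

If the list printed is empty, the regex and the implication agree on all words of that length. Longer words are not covered by this check.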

How to create a regular expression to match non-consecutive characters?

How to create a regular expression for strings of a,b and c such that aa and bb will be rejected?
For example, abcabccababcccccab will be accepted and aaabc or aaabbcccc or abcccababaa will be rejected.
If this is not a purely academic question, you can simply search for aa and bb and negate your logic, for example:
import re

s = 'abcccabaa'
# continue if string does not match.
if re.search('(?:aa|bb)', s) is None:
    ...
or simply scan the string for the two patterns, avoiding expensive regular expressions:
if 'aa' not in s and 'bb' not in s:
    ...
For such an easy task RE is probably total overkill.
P.S.: The examples are in Python but the principle applies to other languages of course.
^(?!.*(?:aa|bb))[abc]+$
See it here on Regexr
This regex would do two things
verify that your string consists only of a, b and c
fail on aa and bb
^ matches the start of the string
(?!.*(?:aa|bb)) negative lookahead assertion, will fail if there is aa or bb in the string
[abc]+ character class, allows only a, b, c, at least once (+)
$ matches the end of the string
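A short Python sketch of the lookahead pattern applied to the question's examples. The function name has_no_doubles is made up for illustration.

```python
import re

# Only a, b, c allowed, and no "aa" or "bb" anywhere in the string.
NO_DOUBLES = re.compile(r"^(?!.*(?:aa|bb))[abc]+$")


def has_no_doubles(s: str) -> bool:
    """Return True if s is a non-empty a/b/c string without aa or bb."""
    return NO_DOUBLES.search(s) is not None


for s in ["abcabccababcccccab", "aaabc", "aaabbcccc", "abcccababaa"]:
    print(s, has_no_doubles(s))
```

Note that because of the + quantifier this pattern rejects the empty string. Change it to * if the empty string should be accepted.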
Using the & operator (intersection) and ~ (complement):
(a|b|c)*&~(.*(aa|bb).*)
Rewriting this without these operators is tricky. The usual approach is to break it into cases.
In this case it is not all that difficult.
Suppose that the letter c is taken out of the picture. The only sequences then which don't have aa and bb are:
e (empty string)
a
b
b?(ab)*a?
Next what we can do is insert some optional 'c' runs into all possible interior places:
e (empty string)
a
b
(bc*)?(ac*bc*)*a?
Next, we have to acknowledge that illegal sequences like aabb become accepted if non-optional 'c' runs are put in the middle, as in, for example, acacbcbc. We allow a final a or b. This pattern can take care of our lone a and b cases as well as matching the empty string:
(ac+|bc+)*(a|b)?
Then combine them together:
((ac+|bc+)*(a|b)?|(bc*)?(ac*bc*)*a?|(ac+|bc+)(a|b)?)
We are almost there: we also need to recognize that this pattern can occur an arbitrary number of times, as long as there are dividing 'c'-s between the occurrences, and with arbitrary leading or trailing runs of c-s around the whole thing:
c*((ac+|bc+)*(a|b)?|(bc*)?(ac*bc*)*a?|(ac+|bc+)(a|b)?)(c+((ac+|bc+)*(a|b)?|(bc*)?(ac*bc*)*a?|(ac+|bc+)(a|b)?))*c*
Mr. Regex Philbin, I'm not coming up with any cases that this doesn't handle, so I'm leaving it as my final answer.
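One way to look for cases the final expression does not handle is a brute-force comparison. The Python sketch below is illustrative only: the names UNIT, FINAL, regex_accepts and words_up_to are invented here. It rebuilds the final expression from its repeated sub-pattern and compares it with the plain rule "contains neither aa nor bb" on every word over {a, b, c} up to length 6.

```python
import re
from itertools import product

# The repeated sub-pattern from the combined expression above.
UNIT = r"((ac+|bc+)*(a|b)?|(bc*)?(ac*bc*)*a?|(ac+|bc+)(a|b)?)"
# c* UNIT (c+ UNIT)* c*, the same string as the final expression.
FINAL = re.compile("c*" + UNIT + "(c+" + UNIT + ")*c*")


def regex_accepts(word: str) -> bool:
    return FINAL.fullmatch(word) is not None


def words_up_to(alphabet: str, max_len: int):
    """Yield every word over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Words on which the regex and the plain rule disagree.
counterexamples = [
    w for w in words_up_to("abc", 6)
    if regex_accepts(w) != ("aa" not in w and "bb" not in w)
]
print(counterexamples)
```

An empty list means no counterexample exists among the words checked. It says nothing about longer words, though the structure of the expression suggests the same reasoning carries over.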

find a regular expression for strings containing the substring aba over the alphabet {a, b}? (formal language theory)

The question asks to find a regular expression for strings containing the substring aba over the alphabet {a, b}.
Does this mean anything can precede or follow aba, so that the regular expression would be:
(aUb)*(aba)*(aUb)*
or is the question simply looking for:
(aba)*
Note: U means union and * means 0 or more times.
Since * means 0 or more, ε is in the first language, while you do not want it (it doesn't contain aba). You are looking for (aUb)*aba(aUb)*.
A substring is defined as
noun
a string that is part of a longer string
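Also note that the language of the second expression is a subset of the language of the first.
The first expression actually matches every string over {a, b}, because its middle part (aba)* may match nothing, so it does not force aba to occur. The expression (aUb)*aba(aUb)* matches exactly the strings that contain aba at least once.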
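The difference between the candidates is easy to see by running them. This Python sketch assumes U is written as | in Python's re syntax. The names FIRST, SECOND, REQUIRED and words_up_to are invented for illustration.

```python
import re
from itertools import product

FIRST = re.compile(r"(a|b)*(aba)*(a|b)*")   # (aUb)*(aba)*(aUb)*
SECOND = re.compile(r"(aba)*")              # (aba)*
REQUIRED = re.compile(r"(a|b)*aba(a|b)*")   # (aUb)*aba(aUb)*


def words_up_to(alphabet: str, max_len: int):
    """Yield every word over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# FIRST accepts strings that do not contain aba at all.
print(FIRST.fullmatch("") is not None, FIRST.fullmatch("bb") is not None)
# SECOND rejects a string that does contain aba.
print(SECOND.fullmatch("babab") is not None)
# Words where REQUIRED disagrees with a plain substring test.
mismatches = [
    w for w in words_up_to("ab", 8)
    if (REQUIRED.fullmatch(w) is not None) != ("aba" in w)
]
print(mismatches)
```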

What's a prefix regular expression?

I'm reading something that mentions prefix regular expressions, and cites as an example /^joey/
What's a prefix regular expression? Does that mean it starts with a caret?
In regex, ^ at the start of a pattern means "starts with".
/^joey/
would therefore match any string that starts with "joey", such as "joeyjoey" or "joey and jane".
A prefixed regular expression (PRE) is defined recursively:
The empty set ø and the empty string "" are PREs
For each symbol a in the alphabet, "a" is a PRE
If p and q are PREs denoting the regular sets P and Q, respectively, r is a regular expression denoting the regular set R such that ε belongs to R, and x belongs to Σ (the alphabet), then the following expressions are also PREs:
p + q (union)
xp (concatenation with symbol x on the left)
pr (concatenation with an ε-regular expression on the right)
p* (star)
This definition is taken from the paper "Fast Text Searching for Regular Expressions or Automaton Searching on Tries" by Ricardo A. Baeza-Yates and Gaston H. Gonnet.
In other words, a PRE is a regular expression whose language L contains only strings with some fixed prefix.
abc.* is a PRE
(A|B)cd is not a PRE
The caret means that you match the start of a string. For example, /^joey/ will match "joey is there", since the string starts with "joey", but not "Is joey around?", since joey is in the middle of the sentence.
It's not a standard term. Whoever wrote that obviously means a regex that matches only at the beginning of the target text, as the other responders have said. The caret is usually used for that purpose, but it can also mean the beginning of a logical line, if the match is being performed in multiline mode. Many regex flavors support an additional construct that matches the very beginning of the text regardless of the matching mode, \A being its usual form.
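The behaviours described above can be tried out in Python, whose re module supports both ^ with a multiline flag and \A. This is a sketch in one particular flavor. Other flavors spell the flag differently, and the sample strings are made up.

```python
import re

# ^ anchors the match at the start of the text.
starts = re.search(r"^joey", "joey and jane") is not None
middle = re.search(r"^joey", "Is joey around?") is not None

# In multiline mode, ^ also matches at the start of each logical line.
text = "hello\njoey"
default_mode = re.search(r"^joey", text) is not None
multiline_mode = re.search(r"^joey", text, re.MULTILINE) is not None

# \A matches only at the very beginning, regardless of the mode.
absolute = re.search(r"\Ajoey", text, re.MULTILINE) is not None

print(starts, middle, default_mode, multiline_mode, absolute)
# prints: True False False True False
```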