Regular expression for formal languages - regex

I'm trying to write a regular expression for a language consisting of:
Strings which contain any number of a’s followed by a single b and
Strings which contain any number of a’s followed by a single b followed by an even number of a's.
I thought (b | ((a^+)b)^* ) U (a | ( (b^+) a)* ) but it was wrong.
Does anyone know where I went wrong?

Assumption
I'll assume it should be "strings that consist of", not "strings which contain". The difference is that bbbbbaaabaabbbb would be a valid string if it's "contain" (since it contains aaabaa).
To make it "strings that contain", the only difference would be adding .*? to the start and .* to the end (or [ab]*? and [ab]* if you want to limit it to a and b).
Problem analysis
I believe you can simplify the problem to just "strings that consist of any number of a's followed by a single b followed by an even number of a's", since 0 is an even number.
I have no idea what ^ or U is doing in your regular expression. Is this language-specific syntax (usually ^ indicates the start of the line / string)?
Solution
It should be as simple as:
a*b(aa)*
a* - any number of a's
b - a single b
(aa)* - an even number of a's
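The pattern above can be tried out quickly. This is only a minimal sketch using Python's re module (the question names no language, so Python and the helper name in_language are assumptions); fullmatch is used so that the string must "consist of" the pattern rather than merely contain it.

```python
import re

# The answer's pattern; fullmatch requires the whole string to fit it.
PATTERN = re.compile(r"a*b(aa)*")

def in_language(s: str) -> bool:
    """Return True if s is some a's, one b, then an even number of a's."""
    return PATTERN.fullmatch(s) is not None

# A few strings inside and outside the language.
for s in ["b", "aab", "aabaa", "baaaa", "aaba", "abb", ""]:
    print(repr(s), in_language(s))
```

The first four strings should be accepted; "aaba" (odd number of trailing a's), "abb" (two b's) and the empty string (no b) should be rejected.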
EDIT:
According to comments, it appears that you may want strings that consist of something like:
any number of a's
followed by any number of the following:
a single b
followed by an even number of a's (number != 0)
optionally followed by a b
The regex would be:
a*(b(aa)+)*b?
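A short sketch for the edited pattern, again assuming Python's re module and a hypothetical helper name (in_edited_language). Note that, as written, this pattern also accepts the empty string and strings of a's only, because every part of it is optional.

```python
import re

# Pattern from the EDIT: a's, then repeated "b + non-empty even run of a's",
# then an optional trailing b.
EDITED = re.compile(r"a*(b(aa)+)*b?")

def in_edited_language(s: str) -> bool:
    """Return True if the whole string fits the edited pattern."""
    return EDITED.fullmatch(s) is not None

for s in ["aabaab", "baabaaaa", "aaa", "", "bab", "bb", "baaab"]:
    print(repr(s), in_edited_language(s))
```

Here "bab", "bb" and "baaab" should be rejected: each contains a b that is followed by an odd or empty run of a's before the next b.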

Related

Regex | Containing "bbb"

I'm trying to create a Regex with chars 'a' and 'b'.
The only rule is that the regex must contain the word 'bbb' somewhere.
These are possible: aabbbaaaaaababa, abbba, bbb, aabbbaa, abbabbba, ...
These are not possible: abba, a, abb, bba, abbaaaabbaaaabba, ...
I have no idea how I can express that.
Any ideas? Thanks in Advance!
Based on the tag "automata", I am guessing you are after the formal regular expression for this formal language. In that case, a regular expression is (a+b)*bbb(a+b)*. The anatomy of this regular expression is the following:
(a+b) gives either "a" or "b"
(a+b)* gives any string of "a"s and "b"s whatever
bbb gives the string bbb only
the whole regular expression describes any string that begins with anything, then has bbb, then ends with anything
To prove this regular expression is correct, note that:
This regular expression only generates strings that contain the substring bbb. This is due to the middle part.
This regular expression generates all strings that contain the substring bbb. Suppose there were some string containing the substring bbb that this regular expression didn't generate. The string either starts with bbb or it doesn't. If it does, then the string is generated by our regular expression by repeating the first (a+b) zero times and the second (a+b) n - 3 times, where n is the length of the string. Otherwise, if it doesn't start with bbb, consider the suffix of length n - 1 as a recursive case. Continue thusly until the subcase does begin with bbb (it eventually must). Because this suffix is describable by our regular expression, the original case must be too since we can just repeat the first (a+b) an additional number of times equal to the depth of recursion.
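The two claims in the proof can also be checked by brute force on short strings. The sketch below assumes Python's re module, where the formal union (a+b) is written as the character class [ab]; the helper names all_strings and mismatches are made up for illustration. It compares the regex against a plain substring test for every string over {a, b} up to a chosen length. This is a sanity check, not a replacement for the proof.

```python
import re
from itertools import product

# Formal (a+b)*bbb(a+b)* written in Python syntax.
PATTERN = re.compile(r"[ab]*bbb[ab]*")

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def mismatches(max_len=10):
    """Strings where the regex and the 'contains bbb' test disagree."""
    return [s for s in all_strings("ab", max_len)
            if (PATTERN.fullmatch(s) is not None) != ("bbb" in s)]

print(mismatches())  # an empty list means the two tests agree
```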
The pattern is fairly simple:
/b{3}/g
if you need it to match a run of exactly three 'b's (no more), you can use
/(?<!b)b{3}(?!b)/g
Good evening! You can use this expression:
(a+b)*(bbb)(a+b)*
The (bbb) in the middle makes bbb the shortest string that can be generated,
and taking the closure of (a+b) on either side generates every string that contains three consecutive b's.

Specifying range style in which all the elements must be present

I am practising my regex here
I have the following string
nabcdf
and I would like to select all of it. so I wrote the following regex
(n[abc]) -> n followed by a , b or c
because of this only n and a are highlighted. Based on this I have two questions
1) Why aren't b and c also highlighted, since they are present as well?
2) [abc] specifies that either a or b or c is present. Is it possible to specify a range such as a->c in which all elements of the range must be present (i.e. so it ends up like abc)? I know regex has [a-c], but that means any one element between a and c must be present. What I want is that all elements in the range are present. Is there an expression for that?
n[abc]
Will capture only n and one character from the character class. To capture more, you need a quantifier like * or +.
So it will be
n[a-c]+ # will capture `n` and at least one character from the class
or
n[a-c]* # will capture `n` and `0` or more characters from the class
See demo.
or
If you want all of a, b and c to be present, you can use lookaheads.
(?=.*a)(?=.*b)(?=.*c)n[abc]+
See demo.
https://regex101.com/r/pT4tM5/13
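For comparison, here is a small sketch of the same patterns in Python's re module (Python is an assumption, since the question only uses an online tester; first_match and ALL_THREE are illustrative names). It prints what each pattern actually captures.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Lookahead version: a, b and c must each occur somewhere after the match start.
ALL_THREE = r"(?=.*a)(?=.*b)(?=.*c)n[abc]+"

print(first_match(r"n[abc]", "nabcdf"))   # one class character only
print(first_match(r"n[a-c]+", "nabcdf"))  # one or more class characters
print(first_match(r"n[a-c]*", "nxyz"))    # zero class characters is allowed
print(first_match(ALL_THREE, "nabcdf"))   # a, b and c all occur
print(first_match(ALL_THREE, "nabdf"))    # no c anywhere, so no match
```

One caveat: the lookaheads only check that a, b and c occur somewhere in the rest of the string, not inside the matched run. For "nabxc" the lookahead pattern still matches, capturing "nab".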

ReGex - matching a logical implication

Given the alphabet {a, b, c}, how can I create a simple regular expression which matches exactly those words which meet the following criteria:
If the string "aa" occurs, then consequently "cc" must also occur (Note the logical implication).
The order of occurrence doesn't matter ("cc" as well as "aa" can occur first).
Due to the former logical implication (if-then relationship), the string "cc" can occur even without "aa", but not vice versa.
I am looking for a way to implement this by using these syntax elements (., *, +, ?, |, ) as well as brackets.
Example what should be matched:
cc
abba
bccb
bccaa
ε (epsilon - empty string)
What shouldn't be matched:
aa
aacbcb
abaaba
baaa
caac
I have tried the following: a?b?(ba)*(ccaa)*(aacc)*c*b?
Not sure if I should answer this, since it's homework, but here goes...
First off, you identify the clauses P:"string 'aa' occurs" and Q:"string 'cc' occurs". Then you transform P -> Q into !P v Q. Then you translate this into the regular expression formed of the two parts:
^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$
The first part rejects any 'aa' pair, and goes like this: allow any number of b's and c's at the beginning (including none); then, if you find an a, force at least one different character after it. a's followed by non-a's may occur any number of times. At the end we also have a? to allow the string to end in an a, and we can be sure there is no a directly before it because neither of the two preceding groups ends in a.
The second part is trivial.
Test it here: http://rubular.com/r/mAPHO8bulo
See it here: http://jex.im/regulex/...
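The !P v Q reading can be cross-checked by brute force. The sketch below assumes Python's re module and hypothetical helper names (satisfies_rule, mismatches); it compares the regex with a direct substring test of "no 'aa', or some 'cc'" for every string over {a, b, c} up to a small length, and also runs the examples from the question.

```python
import re
from itertools import product

# The answer's pattern: (no "aa" anywhere) OR (contains "cc").
PATTERN = re.compile(r"^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$")

def satisfies_rule(s: str) -> bool:
    """Direct statement of the implication: 'aa' occurs -> 'cc' occurs."""
    return ("aa" not in s) or ("cc" in s)

def mismatches(max_len=7):
    """Strings over {a, b, c} where the regex and the rule disagree."""
    found = []
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            s = "".join(chars)
            if (PATTERN.match(s) is not None) != satisfies_rule(s):
                found.append(s)
    return found

print(mismatches())  # an empty list means the two agree on all short strings
for s in ["cc", "abba", "bccb", "bccaa", "", "aa", "aacbcb", "abaaba", "baaa", "caac"]:
    print(repr(s), PATTERN.match(s) is not None)
```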

Complicated Regular Expression

Hello I am trying to figure out a regular expression for the following where the alphabet consists of 'a','b','c','d':
Any combination of 'a','b','c','d' is acceptable as long as 'd' is never followed by ('d'|'a') and 'b' is never followed by ('b'|'c'). Any help would be great! Thanks.
EDIT: The one that got me closest is (a|b?|c|d?)* but this does not account for the fact that a 'd' can not be followed by an 'a' and a 'b' can not be followed by a 'c'.
Break the problem down into its component parts:
Any combination of 'a','b','c','d' is acceptable
This would be the simple expression:
[abcd]
However, given the extra restrictions on the characters d and b this becomes:
[ac]
d not followed by d or a
This can be achieved with a simple negative lookahead:
d(?![da])
b not followed by b or c
This is only slightly different than the previous character match:
b(?![bc])
Adding it all together
The complete one-character regex therefore becomes:
[ac]|d(?![da])|b(?![bc])
Or as a full expression:
/^([ac]|d(?![da])|b(?![bc]))+$/

Grammar - RegEx - containing five vowels (aeiou)

I am trying to learn regular expression. I have
L = {a, b, x, y, z, i, o, u, e, c}
I want to construct a regular expression that describes strings that contain the five vowels in alphabetical order (aeiou). All strings will have at least one of all five vowels.
Do I have to lay them out in order as they are in the set? like
a(b*x*y*z*i*o*u*ec*)iou
or can I mix them up like:
aeiou(b*x*y*z*c*)
Since, they are not in order in the set, does that mean the first solution is what I am looking for?
In most regex languages, you'll need something like:
[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*
That much is essentially uniform. You then have to deal with 'start of word' and 'end of word' issues, which depend on the context and the regex language. With one word per line, you can simply use '^' to start and '$' to end.
Using your preferred notation and knowing that the complete alphabet used consists of the 10 letters, and assuming you can do grouping, then you can write:
(b*c*x*y*z*)*a(b*c*x*y*z*)*e(b*c*x*y*z*)*i(b*c*x*y*z*)*o(b*c*x*y*z*)*u(b*c*x*y*z*)*
The (b*c*x*y*z*)* part says zero or more repeats of "zero or more b's followed by zero or more c's, ..., followed by zero or more z's". This does what you require; but it also demonstrates why character class notation is such a good idea.
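To illustrate the character-class form on the 10-letter alphabet, here is a minimal sketch assuming Python's re module. The class [bcxyz] is an assumed shorthand for the five non-vowel letters of L, and the names NON_VOWELS and vowels_in_order are hypothetical. Like the expressions above, it allows each vowel exactly once.

```python
import re

# Any run of the non-vowel letters of the alphabet (assumed shorthand).
NON_VOWELS = "[bcxyz]*"

# Builds [bcxyz]*a[bcxyz]*e[bcxyz]*i[bcxyz]*o[bcxyz]*u[bcxyz]*
PATTERN = re.compile(NON_VOWELS + NON_VOWELS.join("aeiou") + NON_VOWELS)

def vowels_in_order(s: str) -> bool:
    """True if s is a, e, i, o, u in that order with only non-vowels between."""
    return PATTERN.fullmatch(s) is not None

for s in ["aeiou", "xaybeczixoyuz", "aeiuo", "aaeiou", "bcx"]:
    print(repr(s), vowels_in_order(s))
```

The first two strings should be accepted; "aeiuo" has the vowels out of order, "aaeiou" repeats a vowel, and "bcx" has no vowels at all.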