Regular expression for strings containing a's in multiples of 3 - regex

Can someone please tell me how to generate a regular expression for strings in which the number of a's is a multiple of 3? The alphabet is {a,b}.
I tried to construct a DFA for it first and then derive a RE from that. What I got was ((ba*)(ba*)(ba*))*.

I would write the pattern as:
^b*(?:(?:ab*){3})*$
This matches:
^ from the start of the string
b* zero or more leading b's
(?: start of a non-capturing group
(?:ab*){3} an a followed by zero or more b's, three times
)* end of the group, repeated zero or more times
$ end of the string
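The pattern is easy to check mechanically. The Python sketch below is an illustration, not part of the original answer: the hypothetical helper check_all compares the regex against a direct count of a's for every string over {a,b} up to length 10.

```python
import itertools
import re

# The answer's pattern: leading b's, then zero or more blocks of three
# "a followed by any number of b's".
MULT3_A = re.compile(r"^b*(?:(?:ab*){3})*$")


def check_all(max_len: int) -> bool:
    """Hypothetical helper: compare the regex with a direct count of a's
    for every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            s = "".join(chars)
            expected = s.count("a") % 3 == 0
            if bool(MULT3_A.match(s)) != expected:
                return False
    return True


print(check_all(10))  # True
```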

Related

Regex Match a non-empty sequence of characters, where there is even amount of a given character (including 0) and any amount of the other characters

I am trying to write a regular expression that matches a non-empty sequence of A's and B's, where the number of A's is even (including 0).
For example:
AABBABA -> AABBABA
BBBB -> BBBB
A -> nothing
Here is what I could come up with so far:
(AA+B*|B*AB*A|B*)+
But currently it only matches what's in the parentheses, not any arbitrary sequence of A's and B's. I am having trouble generalizing it to an even number of A's.
If you have to use regex, you may use something like this:
^(?:B*(?:AB*A)*B*)*$
I'm sure this isn't the most efficient way, but it seems to do the job.
This basically matches two A characters with zero or more B characters in between, and the whole thing is repeated zero or more times. This guarantees that the A count will be even. Then we have zero or more B characters at the beginning and end in case the string starts or ends with B. And then the whole thing is repeated zero or more times again.
If you want to reject empty strings (and assuming your regex flavor supports lookaheads), you can add a simple lookahead at the beginning of the pattern that requires at least one character:
^(?=.)(?:B*(?:AB*A)*B*)*$
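A small Python check of both patterns against the examples from the question, plus the empty string (the constant names are illustrative):

```python
import re

# The answer's pattern (accepts the empty string) and the variant with a
# leading lookahead that requires at least one character.
EVEN_A = re.compile(r"^(?:B*(?:AB*A)*B*)*$")
EVEN_A_NONEMPTY = re.compile(r"^(?=.)(?:B*(?:AB*A)*B*)*$")

for s in ["AABBABA", "BBBB", "A", ""]:
    print(repr(s), bool(EVEN_A.match(s)), bool(EVEN_A_NONEMPTY.match(s)))
# 'AABBABA' True True
# 'BBBB' True True
# 'A' False False
# '' True False
```

Nested quantifiers such as the B* inside (...)* can backtrack heavily on long strings that do not match, so the lookaround-free pattern in the next answer may be preferable for large inputs.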
If you want to reject the empty string without using lookarounds, you could also use:
^(?:(?:B*AB*A)+B*|B+)$
Explanation
^ Start of string
(?: Non-capturing group
(?:B*AB*A)+B* Match one or more pairs of A's, with optional B's before and between them, then optional trailing B's
| Or
B+ Match one or more B's
) Close group
$ End of string
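A brute-force check in Python that the lookaround-free pattern accepts exactly the non-empty strings over {A,B} with an even number of A's, up to length 10; agrees_with_count is a hypothetical helper written for this illustration:

```python
import itertools
import re

NO_LOOKAROUND = re.compile(r"^(?:(?:B*AB*A)+B*|B+)$")


def agrees_with_count(max_len: int) -> bool:
    """Hypothetical helper: True if the regex accepts exactly the non-empty
    strings over {A, B} (up to max_len) with an even number of A's."""
    for n in range(max_len + 1):
        for chars in itertools.product("AB", repeat=n):
            s = "".join(chars)
            expected = len(s) > 0 and s.count("A") % 2 == 0
            if bool(NO_LOOKAROUND.match(s)) != expected:
                return False
    return True


print(agrees_with_count(10))  # True
```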

How to use regular expression to use as few groups as possible to match as long string as possible

For example, this is the regular expression
([a]{2,3})
These are the strings:
aaaa // 1 match "(aaa)a" but I want "(aa)(aa)"
aaaaa // 2 match "(aaa)(aa)"
aaaaaa // 2 match "(aaa)(aaa)"
However, if I change the regular expression
([a]{2,3}?)
Then the results are
aaaa // 2 match "(aa)(aa)"
aaaaa // 2 match "(aa)(aa)a" but I want "(aaa)(aa)"
aaaaaa // 3 match "(aa)(aa)(aa)" but I want "(aaa)(aaa)"
My question is: is it possible to use as few groups as possible while matching as much of the string as possible?
How about something like this:
(a{3}(?!a(?:[^a]|$))|a{2})
This looks for either the character a three times (not followed by a single a and then a different character or the end of the string) or the character a two times.
Breakdown:
( # Start of the capturing group.
a{3} # Matches the character 'a' exactly three times.
(?! # Start of a negative Lookahead.
a # Matches the character 'a' literally.
(?: # Start of the non-capturing group.
[^a] # Matches any character except for 'a'.
| # Alternation (OR).
$ # Asserts position at the end of the line/string.
) # End of the non-capturing group.
) # End of the negative Lookahead.
| # Alternation (OR).
a{2} # Matches the character 'a' exactly two times.
) # End of the capturing group.
Note that if you don't need the capturing group, you can actually use the whole match instead by converting the capturing group into a non-capturing one:
(?:a{3}(?!a(?:[^a]|$))|a{2})
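A minimal Python sketch applying the non-capturing pattern to the strings from the question; re.findall returns the whole matches here because the pattern has no capturing groups. CHUNK is just an illustrative name.

```python
import re

# Three a's unless that would leave a single a before a non-a character or
# the end of the string; otherwise two a's.
CHUNK = re.compile(r"(?:a{3}(?!a(?:[^a]|$))|a{2})")

for s in ["aaaa", "aaaaa", "aaaaaa"]:
    print(s, CHUNK.findall(s))
# aaaa ['aa', 'aa']
# aaaaa ['aaa', 'aa']
# aaaaaa ['aaa', 'aaa']
```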
Try this Regex:
^(?:(a{3})*|(a{2,3})*)$
Explanation:
^ - asserts the start of the line
(?:(a{3})*|(a{2,3})*) - a non-capturing group containing 2 sub-sequences separated by OR operator
(a{3})* - The first subsequence tries to match 3 occurrences of a. The * at the end allows it to match 0, 3, 6, 9, ... occurrences of a before the end of the line
| - OR
(a{2,3})* - matches 2 to 3 occurrences of a, as many as possible. The * at the end would repeat it 0+ times before the end of the line
$ - asserts the end of the line
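A small Python sketch of what this anchored pattern accepts. Because it is anchored with ^ and $, it answers whether the whole run of a's can be split into chunks of 2 or 3, rather than returning the chunks; a repeated capturing group such as (a{2,3})* keeps only its last repetition, so the groups do not list every chunk. The range of lengths tested is arbitrary.

```python
import re

ANCHORED = re.compile(r"^(?:(a{3})*|(a{2,3})*)$")

# Prints True for every length except 1: the empty string is allowed by the
# outer *, and any run of 2 or more a's can be split into 2s and 3s.
for n in range(0, 8):
    print(n, bool(ANCHORED.match("a" * n)))
```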
Try this short regex:
a{2,3}(?!a([^a]|$))
How it's made:
I started with this simple regex: a{2}a?. It looks for 2 consecutive a's that may be followed by another a. If the 2 a's are followed by another a, it matches all three a's.
This worked for most cases.
However, it failed in cases like aaaa, where it matches aaa and leaves a single a over, instead of matching aa twice.
So I knew I had to modify my regex so that it matches the third a only if that a is not followed by a([^a]|$), that is, by a single a and then a non-a character or the end of the string. My regex then became a{2}a?(?!a([^a]|$)), and it worked for all cases. Then I simplified it to a{2,3}(?!a([^a]|$)).
That's it.
EDIT
If you want the capturing behavior, add parentheses around the regex, like:
(a{2,3}(?!a([^a]|$)))
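The reasoning above can be checked in Python. This sketch assumes the inner ([^a]|$) is written as the non-capturing (?:[^a]|$), because re.findall returns the contents of capturing groups instead of the whole match when the pattern has any. The helper min_chunks is hypothetical: since each chunk is 2 or 3 a's, a run of n >= 2 a's needs at least ceil(n/3) chunks, and the loop asserts that the regex reaches that bound while covering the whole run.

```python
import re

NAIVE = re.compile(r"a{2}a?")                 # first attempt from the answer
SHORT = re.compile(r"a{2,3}(?!a(?:[^a]|$))")  # final version, non-capturing inner group

print(NAIVE.findall("aaaa"))  # ['aaa'] (the last a is left over)
print(SHORT.findall("aaaa"))  # ['aa', 'aa']


def min_chunks(n: int) -> int:
    """Hypothetical helper: fewest chunks of 2 or 3 summing to n (n >= 2)."""
    return -(-n // 3)  # ceil(n / 3)


for n in range(2, 30):
    chunks = SHORT.findall("a" * n)
    # The chunks must cover the whole run and use as few groups as possible.
    assert "".join(chunks) == "a" * n and len(chunks) == min_chunks(n)
```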

How can I write a regular expression to match strings starting with letters and ending with digits?

I want to match only the strings listed below; any other string should not match:
rahul2803
albert1212
ra456
r1
Only the above-mentioned strings should match in the following data:
rahul
2546rahul
456
rahul2803
albert1212
ra456
r1
rahulrenjan
r4ghyk
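I tried ([a-z]*[0-9]), but it's not working.
In regular expressions, * means zero or more, so your regex also matches zero letters. If you want one or more, use + (\d means a digit). You also need the ^ and $ anchors so that the whole string has to match, not just part of it.
^[a-zA-Z]+\d+$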
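A quick Python check of this pattern against the data from the question (the constant name is illustrative):

```python
import re

LETTERS_THEN_DIGITS = re.compile(r"^[a-zA-Z]+\d+$")

data = ["rahul", "2546rahul", "456", "rahul2803", "albert1212",
        "ra456", "r1", "rahulrenjan", "r4ghyk"]
print([s for s in data if LETTERS_THEN_DIGITS.match(s)])
# ['rahul2803', 'albert1212', 'ra456', 'r1']
```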
Regular expressions are fun to solve once you get the hang of the syntax.
This one should be pretty straightforward:
Start with a letter: ^[a-z] (I am not handling capital letters here; if you need them, use ^[a-zA-Z])
Allow any characters in between: .*
End the string with a digit: [0-9]$
Combine all 3 and you get:
^[a-z].*[0-9]$
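To illustrate how this answer differs from the stricter ^[a-zA-Z]+\d+$, here is a small Python comparison; a1b2 and ab-12 are made-up inputs, not from the question. Because .* accepts any characters, the looser pattern also matches strings with mixed letters, digits, or other characters in the middle, although both agree on the question's data.

```python
import re

STRICT = re.compile(r"^[a-zA-Z]+\d+$")  # letters, then only digits
LOOSE = re.compile(r"^[a-z].*[0-9]$")   # letter first, digit last, anything between

for s in ["rahul2803", "r1", "r4ghyk", "a1b2", "ab-12"]:
    print(s, bool(STRICT.match(s)), bool(LOOSE.match(s)))
# rahul2803 True True
# r1 True True
# r4ghyk False False
# a1b2 False True
# ab-12 False True
```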

How can I create a regex that matches a 6[A-Z] character string, with 3 or more non-consecutive or consecutive Cs?

(?=[A-Z]{6})(?=([C]){3,6}) is what I have tried so far.
I would like it to work like this:
ABYCCC Match
CBTCAC Match
CCTYEC Match
AFEQCB Don't match
CCEEEE Don't match
EEEEEE Don't match
This, however, only matches strings with consecutive Cs.
I am very new, so any help is appreciated. I'm just using the search in Notepad++.
^(?=(?:.*C){3}).*$
Use this regex. See the demo:
https://regex101.com/r/rP5pV8/1
So here we go
\b(?=(?:[ABD-Z]*C){3})[A-Z]{6}\b
This will match any string that consists of 6 uppercase letters, of which 3 (or more) are Cs.
It doesn't match:
strings shorter than 6 uppercase letters
strings longer than 6 uppercase letters
strings with fewer than 3 Cs, even if further Cs follow outside the string
https://regex101.com/r/vV3yS4/2
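A short Python sketch of how this pattern behaves when searching a line of text, roughly what a find-all in an editor does (the sample text is assembled from the question's examples):

```python
import re

SIX_WITH_3C = re.compile(r"\b(?=(?:[ABD-Z]*C){3})[A-Z]{6}\b")

text = "ABYCCC CBTCAC CCTYEC AFEQCB CCEEEE EEEEEE"
print(SIX_WITH_3C.findall(text))
# ['ABYCCC', 'CBTCAC', 'CCTYEC']

# Two Cs plus Cs in the next word do not count, because the lookahead's
# character class excludes the space:
print(SIX_WITH_3C.findall("ABCDEF CCXXXX"))  # []
```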
You can check for the occurrence of at least 3 Cs by using a lookahead.
^(?=(?:[^C]*C){3})[A-Z]{6}$
[^C]*C matches any number of characters that are not C, followed by a C
(?:...){3} the non-capturing group is repeated 3 times
[A-Z]{6} requires exactly 6 uppercase letters
See the demo at regex101.
(Note that for the demo I added an additional \n to the negated class, i.e. [^C\n], so that the lookahead does not run across newlines.)
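The \n detail matters when several strings sit on separate lines and ^/$ match per line (multiline mode, as in a typical editor search). A Python sketch using the question's examples; the two-line sample string is made up from them:

```python
import re

multi = "AFEQCB\nCCEEEE"  # neither line has three Cs

# [^C]* may cross the newline, so the lookahead finds the Cs on line 2
loose = re.compile(r"^(?=(?:[^C]*C){3})[A-Z]{6}$", re.MULTILINE)
# [^C\n]* keeps the lookahead on the current line
strict = re.compile(r"^(?=(?:[^C\n]*C){3})[A-Z]{6}$", re.MULTILINE)

print(loose.findall(multi))   # ['AFEQCB'] (false positive)
print(strict.findall(multi))  # []

examples = ["ABYCCC", "CBTCAC", "CCTYEC", "AFEQCB", "CCEEEE", "EEEEEE"]
for s in examples:
    print(s, bool(strict.search(s)))  # True for the first three only
```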
Here's how I would do it in Python:
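import re
# Exactly six uppercase letters; fullmatch anchors both ends, whereas
# match() would also accept longer strings such as "CCCASDX".
pattern = re.compile("[A-Z]{6}")
strings = ["AABSDC", "CCCASD", "CAVACC"]
def checkC(letters):
    # True only for six uppercase letters containing at least three Cs
    return bool(pattern.fullmatch(letters)) and letters.count('C') >= 3
for string in strings:
    print(checkC(string))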
Output:
False
True
True
(?=(.*C.*){3})[A-Z]{6}
I swapped the two parts and removed the "lookahead" from the [A-Z]{6} so that the expression matches something positive.
Then, to the left and right of "C" I added greedy dots (.*) that match anything zero or more times. So you still match three or more Cs and allow anything between them.
After that, I removed the ,6 because "anything" can be some more Cs.
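A Python sketch testing this pattern with re.search, which, like an editor's find, looks for a match anywhere in the string. The last line uses a made-up input to show a caveat: without anchors, the lookahead can count Cs that lie outside the six letters actually matched.

```python
import re

UNANCHORED = re.compile(r"(?=(.*C.*){3})[A-Z]{6}")

examples = ["ABYCCC", "CBTCAC", "CCTYEC", "AFEQCB", "CCEEEE", "EEEEEE"]
for s in examples:
    print(s, bool(UNANCHORED.search(s)))  # True for the first three only

# The three Cs are after the matched letters, yet the match succeeds:
print(UNANCHORED.search("AAAAAAA CCC").group())  # AAAAAA
```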

What does the RegEx (a+b)^n(c+d)^m match?

I'm unsure of what this RegEx matches:
(a+b)^n(c+d)^m
I know that the + metacharacter means "one or more times the preceding pattern". So a+ would match one or more a's, whereas a* also matches the empty string.
But I think that in this case the regex means "a or b, n times" concatenated with "c or d, m times", so it would match strings like these:
aaaacc (n=4, m=2)
bbbbbdddd (n=5, m=4)
aaaddddd (n=3, m=5)
bc (n=1, m=1)
aaaaaaaaaaaaccccc (n=12, m=5)
...
Is this correct? If it's not, can anyone provide examples of what this RegEx does match?
It doesn't look like a valid regular expression given the incorrect use of ^
^ should either be inside []'s like this [^a], or at the very start of the regular expression.
+ just means one or more occurrences of the preceding character.
If ^n means "repeated n times", then these would be matches:
aaaaaabccccccccd,
aaaaaabaaaaaabaaaaaabccccccccdccccccccd
In formal-language notation, + denotes union (alternation, written | in most regex engines), so (a+b)^n(c+d)^m means "n slots for unordered a's and b's followed by m slots for unordered c's and d's"
e.g. an example of (a+b)^10(c+d)^5 would be: aaaababbbadcccd
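Under that reading, the expression corresponds to [ab]{n}[cd]{m} in everyday regex syntax. A minimal Python sketch; formal_to_regex is a hypothetical helper, not a library function:

```python
import re


def formal_to_regex(n: int, m: int) -> re.Pattern:
    """Hypothetical helper: translate the formal expression (a+b)^n(c+d)^m,
    where + is union and ^k is k-fold concatenation, into Python syntax."""
    return re.compile(rf"[ab]{{{n}}}[cd]{{{m}}}")  # e.g. [ab]{10}[cd]{5}


print(bool(formal_to_regex(10, 5).fullmatch("aaaababbbadcccd")))  # True
print(bool(formal_to_regex(4, 2).fullmatch("aaaacc")))            # True
print(bool(formal_to_regex(3, 2).fullmatch("aaaacc")))            # False
```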
If you're using Perl regular expressions with the 'm' option, e.g. /(a+b)^n(c+d)^m/m, the '^' will match at the beginning of any internal line. So:
/
(a+b) # Match one or more as followed by b
^n # Match the beginning of a line followed by a literal n.
(c+d) # Match one or more cs followed by d
^m # Match the beginning of a line followed by a literal m.
/mx
(a+b) and (c+d) would be available in $1 and $2.
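One consequence worth checking: (a+b) always ends with b, so the position right after it is never the start of a line, and the ^n part can never match; the pattern as written therefore matches no string at all. A quick Python check (Python's re.MULTILINE treats ^ the same way as Perl's /m for this case); the sample strings are made up:

```python
import re

# Literal reading with the multiline flag: ^ can only match at the start
# of the string or right after "\n", never right after the b of (a+b).
LITERAL = re.compile(r"(a+b)^n(c+d)^m", re.MULTILINE)

samples = ["ab\nncd\nm", "aab\nncccd\nm", "ab^ncd^m"]
print([LITERAL.search(s) for s in samples])  # [None, None, None]
```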