Using regex to find arbitrary length consecutive blocks


I have a string containing ones and zeroes. I want to determine if there are substrings of 1 or more characters that are repeated at least 3 consecutive times. For example, the string '000' has a length 1 substring consisting of a single zero character that is repeated 3 times. The string '010010010011' actually has 3 such substrings that each are repeated 3 times ('010', '001', and '100').
Is there a regular expression that can find these repeating patterns without knowing either the specific pattern or the pattern's length? I don't care what the pattern is or how long it is, only that the string contains a 3-peat pattern.

Here's something that might work. However, it will only tell you whether there is a pattern repeated three times, and I don't think it can be extended to tell you if there are others:
/(.+).*?\1.*?\1/
Breaking that out:
(.+) matches any 1 or more characters, starting anywhere in the string
.*? allows any length of interposing other characters (0 or more)
\1 matches whatever was captured by the (.+) parentheses
.*? 0 or more of anything
\1 the original pattern, again
If you want the repetitions to occur immediately adjacent, then instead use
/(.+)\1\1/
as suggested by @Buh Buh. The \1 vs. $1 notation may vary, depending on your regexp system.
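As a rough illustration in Python (whose re module uses the \1 notation), the adjacent form can also be wrapped in a lookahead so that every starting position is tried. The helper names has_three_peat and three_peats are made up for this sketch, and three_peats reports only the longest qualifying block at each position, not necessarily every possible one.

```python
import re

# Adjacent form from the answer: some block, immediately repeated twice more.
ADJACENT = re.compile(r"(.+)\1\1")

# Same idea inside a lookahead, so overlapping hits are all visited.
OVERLAPPING = re.compile(r"(?=(.+)\1\1)")


def has_three_peat(s):
    """True if any block of 1+ characters occurs 3 times in a row."""
    return ADJACENT.search(s) is not None


def three_peats(s):
    """Blocks found by trying every start position (hypothetical helper)."""
    return sorted({m.group(1) for m in OVERLAPPING.finditer(s)})


print(has_three_peat("000"))         # True
print(has_three_peat("0101"))        # False
print(three_peats("010010010011"))   # ['001', '010', '100']
```

If only a yes/no answer is needed, has_three_peat alone is enough; the lookahead variant is only there to show that more than one block can be reported.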

(.+)\1\1
The \ might be a different character depending on your language choice. This means: match any string, then try to match it again twice more.
The \1 means repeat the 1st match.

It looks weird, but this could be the solution:
/000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111/
This lists every possible 3-character block repeated three times, so the regular expression will match strings such as:
10010010011
00010010011
10110110110
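But not these, because it only covers blocks of length 3 (even though, for example, '10' is repeated three times in 101010101010 and '1' in 001110111110):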
101010101010
001110111110
111000111000
And it doesn't matter where the sequence appears in the whole string.
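A quick check in Python suggests why the enumeration is limited: it can be generated for any one block length, but it misses repeats of other lengths that the backreference pattern (.+)\1\1 catches. The helper enumerated_pattern is a name invented for this sketch, not something from the answer.

```python
import re
from itertools import product


def enumerated_pattern(block_len, alphabet="01", times=3):
    """Alternation of every block of block_len symbols repeated `times` times."""
    blocks = ("".join(p) for p in product(alphabet, repeat=block_len))
    return "|".join(block * times for block in blocks)


fixed = re.compile(enumerated_pattern(3))   # 8 alternatives, 9 characters each
general = re.compile(r"(.+)\1\1")           # any block length

# Columns: input, enumerated alternation matches, backreference pattern matches
for s in ["10010010011", "101010101010", "001110111110", "111000111000"]:
    print(s, bool(fixed.search(s)), bool(general.search(s)))
```

Only the first string is matched by both patterns; the other three are matched by the backreference pattern alone.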

Related

Positive and Negative Lookahead on matchings strings with two or more same consecutive characters [duplicate]

I can very easily write a regular expression to match a string that contains 2 consecutive repeated characters:
/(\w)\1/
How do I do the complement of that? I want to match strings that don't have 2 consecutive repeated characters. I've tried variations of the following without success:
/(\w)[^\1]/ ;doesn't work as hoped
/(?!(\w)\1)/ ;looks ahead, but some portion of the string will match
/(\w)(?!\1)/ ;again, some portion of the string will match
I don't want any language/platform specific way to take the negation of a regular expression. I want the straightforward way to do this.

The regex below matches strings that don't have any consecutive repeated characters.
^(?!.*(\w)\1).*
(?!.*(\w)\1) is a negative lookahead which asserts that the string about to be matched does not contain any consecutive repeated characters. .*(\w)\1 matches a string that has repeated characters at the start, in the middle, or at the end. ^(?!.*(\w)\1) therefore matches the start of every line except those that contain repeated characters, and the following .* then matches all the characters on that line. Note that this also matches empty strings. If you don't want to match empty lines, change the final .* to .+
Note that ^(?!(\w)\1) checks for the repeated characters only at the start of a string or line.
Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line. They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.
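For a concrete feel, here is a small Python sketch of the two patterns discussed above. The constant names are my own, and the sample strings are just illustrations.

```python
import re

# Negative lookahead anchored at the start: fail if any word character
# is immediately followed by itself anywhere later in the string.
NO_DOUBLES = re.compile(r"^(?!.*(\w)\1).*")

# Without the leading .* the lookahead only inspects the first two characters.
START_ONLY = re.compile(r"^(?!(\w)\1)")

for s in ["abcabc", "abbc", "aab", "cbaa", ""]:
    print(repr(s), bool(NO_DOUBLES.match(s)), bool(START_ONLY.match(s)))
```

"abbc" is the interesting case: NO_DOUBLES rejects it, while START_ONLY accepts it because the doubled character is not at the start.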

Check odd number of a certain character

For Uni, I need to write a method with a string as parameter which checks if the string has an even number of a's in it. Normally I had sequences like this:
baaaaaad which would then be easy to figure out with the regex (.*)(aa)*(.*)
But now they look like this:
baadaafaag
And I have no clue how to do this since there are other characters separating them.

Try this one for a simpler solution
^([^a]*(a{2})*[^a]*)*$
It checks for groups of 2 "a"s delimited by non-"a"s
bad no match
baad match
baaad no match
baaaad match
baaaaad no match
baaaaaad match
baadaafaag match
baadaaaaag no match
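The table above can be reproduced in Python. The second pattern, ANY_EVEN, is not from the answer: it is an assumed variant that pairs up a's even when other characters sit between them, which is what "an even number of a's" seems to call for in general.

```python
import re

# Pattern from the answer: a's must come in adjacent pairs.
PAIRED = re.compile(r"^([^a]*(a{2})*[^a]*)*$")

# Assumed variant (not from the answer): consume a's two at a time,
# allowing any non-'a' characters between and around them.
ANY_EVEN = re.compile(r"^[^a]*(?:a[^a]*a[^a]*)*$")

# Columns: input, PAIRED matches, ANY_EVEN matches, plain count check
for s in ["bad", "baad", "baaad", "baadaafaag", "baadaaaaag", "abab"]:
    print(s, bool(PAIRED.match(s)), bool(ANY_EVEN.match(s)), s.count("a") % 2 == 0)
```

The two patterns differ on "abab", which has two a's that are not adjacent. If a regex is not required, s.count("a") % 2 == 0 gives the same answer as ANY_EVEN.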

Just use this: [a-z]*aa+[a-z]*aa+[a-z]*
Here the leading [a-z]* matches zero or more characters, and aa+ matches one a followed by at least one more a, that is, aa or longer.
The inner [a-z]* allows any number of characters between each pair of aa.
The outer [a-z]* allows any number of characters after the last aa.

Regular expression (regex): each character can appear at most as many as given

So far I have this regex ^(?!.*?(a|c|e|g|i).*?\1)[acegi]+$ which matches any word made up of the characters "acegi", where each character can occur only once.
Now I'm trying to match any word that consists of the given characters, where each character can appear at most as many times as it appears in the given set.
Example for set of given characters "acegii"
Valid matches: "acegii" "ace" "a" "i" "ai" "gii" "ici" "iic" "aicige" etc.
Invalid matches: "acegiii" "iacegii" "iii" "aa" "cc" etc.
Thanks for any help!
Note: the characters set in the regex should be easily replaceable if possible.
Preferred regex flavors: POSIX, Ruby

You can use something similar to what you have, but with a second negative lookahead for the i:
^(?!.*?([aceg]).*?\1)(?!.*?i.*?i.*?i)[acegi]+$
Basically, use one negative lookahead for each distinct maximum number of appearances.
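Since the question asks for the character set to be easy to replace, the same idea can be generated from the set itself. This is only a sketch in Python: build_pattern is a hypothetical helper, and it emits one lookahead per character rather than grouping characters that share a limit as the answer does.

```python
import re
from collections import Counter


def build_pattern(allowed):
    """Regex for words using each character at most as often as in `allowed`."""
    counts = Counter(allowed)
    # One negative lookahead per character: forbid (limit + 1) occurrences.
    lookaheads = "".join(
        f"(?!(?:.*?{re.escape(ch)}){{{n + 1}}})" for ch, n in counts.items()
    )
    char_class = "[" + "".join(re.escape(ch) for ch in counts) + "]"
    return re.compile("^" + lookaheads + char_class + "+$")


pattern = build_pattern("acegii")
for word in ["acegii", "ici", "aicige", "acegiii", "iii", "aa"]:
    print(word, bool(pattern.match(word)))
```

The first three words are accepted and the last three rejected, in line with the valid and invalid examples in the question.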

Quantify your lookahead:
/^(?!.*?([acegi])(?:.*?\1){N})[acegi]+$/
Replace that N with the number of appearances that are allowed - for instance, {1} will allow a single one of each character. {2} will allow one or two occurrences. {3} allows up to three, and so on.
Keep in mind, though, that you are dangerously close to the path of catastrophic backtracking, which could well crash your script.
You may want to use string operations instead. In summary:
Match string against /^[acegi]+$/
Count number of occurrences of each character (ie. iterate through the string)
Get the maximum number of occurrences (could be a simple max() call if done right)
If that max is higher than your allowed limit, trigger failure.
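The four steps above might look like this in Python. The function name within_limit and its parameters are assumptions made for this sketch.

```python
import re
from collections import Counter


def within_limit(word, limit, alphabet="acegi"):
    """True if word uses only `alphabet` and no character more than `limit` times."""
    # Step 1: the word must consist solely of the allowed characters.
    if not re.fullmatch("[" + re.escape(alphabet) + "]+", word):
        return False
    # Steps 2-4: count each character, take the maximum, compare to the limit.
    return max(Counter(word).values()) <= limit


print(within_limit("acegii", 2))   # True
print(within_limit("iii", 2))      # False
print(within_limit("abc", 2))      # False ('b' is not allowed)
```

This version applies one limit to every character; per-character limits would need a comparison against a second Counter instead of a single max().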

Regexp: How to match a string that doesn't have any character repeated 3 times?

I'm trying to make a single pattern that will validate an input string. The validation rule does not allow any character to be repeated 3 or more times in a row.
For example:
Aabcddee - is valid.
Aabcddde - is not valid, because of 3 d characters.
The goal is to provide a RegExp pattern that could match one of above examples, but not both. I know I could use back-references such as ([a-z])\1{1,2} but this matches only sequential characters. My problem is that I cannot figure out how to make a single pattern for that. I tried this, but I don't quite get why it isn't working:
^(([a-z])\1{1,2})+$
Here I try to match any character that is repeated 1 or 2 times in the internal group, then I match that internal group if it's repeated multiple times. But it's not working that way.
Thanks.

To check that the string does not have a character (of any kind, even new line) repeated 3 times or more in a row:
/^(?!.*(.)\1{2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated 3 times or more in a row. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)\1{2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)\1{2})[a-z]+$/i
As you can see, the i flag (case-insensitive) also affects how the captured text is compared against the current input.
Change + to * if you want to allow empty string to pass.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)\1{2})[A-Za-z]+$/
At first glance, it might seem to be the same as the previous one, but since there is no i flag, the captured text is not subject to case-insensitive matching.
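Here is how the three variants behave in Python, as a sketch. The /s and /i flags are written as re.DOTALL and re.IGNORECASE, and, as far as I know, Python's re module also compares backreferences case-insensitively under re.IGNORECASE. The constant names are mine.

```python
import re

# No character (newline included, hence DOTALL) three or more times in a row.
NO_TRIPLE = re.compile(r"^(?!.*(.)\1{2})", re.DOTALL)

# Letters only; with IGNORECASE the backreference is also compared
# case-insensitively, so "aaA" counts as a triple.
LETTERS_CI = re.compile(r"^(?!.*(.)\1{2})[a-z]+$", re.IGNORECASE)

# Letters only, case-sensitive comparison: "aaA" is not a triple.
LETTERS_CS = re.compile(r"^(?!.*(.)\1{2})[A-Za-z]+$")

for s in ["Aabcddee", "Aabcddde", "aaA"]:
    print(s, bool(NO_TRIPLE.match(s)), bool(LETTERS_CI.match(s)), bool(LETTERS_CS.match(s)))
```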
Below is a failed answer; you can ignore it, but you can read it for fun.
You can use this regex to check that the string does not contain the same character 3 times (of any kind, even new line).
/^(?!.*(.)(?:.*\1){2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated more than or equal to 3 times. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)(?:.*\1){2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)(?:.*\1){2})[a-z]+$/i
As you can see, the i flag (case-insensitive) also affects how the captured text is compared against the current input.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)(?:.*\1){2})[A-Za-z]+$/
At first glance, it might seem to be the same as the previous one, but since there is no i flag, the captured text is not subject to case-insensitive matching.

From your question I get that you want to match
only strings consisting of chars from [A-Za-z] AND
only strings which have no sequence of the same character with a length of 3 or more
Then this regexp should work:
^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$
(Example in perl)
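The original example was in Perl; the pattern appears to carry over unchanged to Python's re module. A minimal sketch, with a constant name of my own choosing:

```python
import re

# Each step consumes one letter, then requires either "next char differs"
# or "exactly one repeat, and the char after that differs".
AT_MOST_TWO = re.compile(r"^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$")

for s in ["Aabcddee", "Aabcddde", "abc1"]:
    print(s, bool(AT_MOST_TWO.match(s)))
```

"Aabcddee" is accepted, "Aabcddde" is rejected because of the three d's, and "abc1" is rejected because of the digit.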

Match Regular Expression if string contains exactly N occurrences of a character

I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.
For example:
I want to match all strings that contain the character "_" 3 times;
So
"a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail
Does someone know a simple regular expression that would satisfy this?
Thank you

For your example, you could do:
\b[a-z]*(_[a-z]*){3}[a-z]*\b
(with an ignore case flag).
It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", ie "match a whole word".
Since I've used '*' this will match if there are exactly three "_" in the word regardless of whether it appears at the start or end of the word - you can modify it otherwise.
Also, I've assumed you want to match all words in a string with exactly three "_" in it.
That means the string "a_b a_b_c_d" would say that "a_b_c_d" passed (but "a_b" fails).
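The word-by-word behaviour can be tried out in Python. In this sketch the group is made non-capturing so that re.findall returns whole words, and the function name is an invention for illustration.

```python
import re

# Word-level pattern from the answer, with the ignore-case flag.
WORD_WITH_3 = re.compile(r"\b[a-z]*(?:_[a-z]*){3}[a-z]*\b", re.IGNORECASE)


def words_with_three_underscores(text):
    """Whole words in text that contain exactly three underscores."""
    return WORD_WITH_3.findall(text)


print(words_with_three_underscores("a_b a_b_c_d"))        # ['a_b_c_d']
print(words_with_three_underscores("a_b_c_d_e x_y_z_w"))  # ['x_y_z_w']
```

Because the underscore counts as a word character, \b can only match at the outer edges of each word, which is what keeps a five-part word like a_b_c_d_e from matching partially.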
If you mean that globally across the entire string you only want three "_" to appear, then use:
^[^_]*(_[^_]*){3}[^_]*$
This anchors the regex at the start of the string and goes to the end, making sure there are exactly three occurrences of "_" in it.

Elaborating on Rado's answer, which is so far the most versatile but could be a pain to write if there are more occurrences to match:
^([^_]*_){3}[^_]*$
It will match entire strings (from the beginning ^ to the end $) that contain exactly 3 ({3}) repetitions of the pattern consisting of 0 or more (*) characters other than underscore ([^_]) followed by one underscore (_), the whole being followed by 0 or more characters other than underscore ([^_]*, again).
Of course one could alternatively group the other way round, as in our case the pattern is symmetric:
^[^_]*(_[^_]*){3}$
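To make the count and the character easy to change, the quantified form can be built from parameters. This is a sketch in Python; exactly_n and its arguments are names I made up, and re.fullmatch stands in for the ^ and $ anchors.

```python
import re


def exactly_n(text, char="_", n=3):
    """True if text contains `char` exactly n times (regex version)."""
    c = re.escape(char)
    # n times "non-char run followed by char", then a final non-char run.
    pattern = rf"(?:[^{c}]*{c}){{{n}}}[^{c}]*"
    return re.fullmatch(pattern, text) is not None


for s in ["a_b_c_d", "a_b", "a_b_c_d_e", "___", "a__b_adf"]:
    print(s, exactly_n(s), s.count("_") == 3)
```

The second column should always agree with the third; where a regex is not strictly required, text.count(char) == n is the simpler test.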

This should do it:
^[^_]*_[^_]*_[^_]*_[^_]*$

If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities, such as:
a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b
Etc.
Here's my regex.
\b(_[^_]*|[^_]*_|_){3}\b
