Regex to match two different groups with the same length

I would like to construct a regex that matches two groups, where the second group consists of a single character repeated as many times as there are characters in the first group. Something like ^(\w+) (x{length of \1}), so that, for example, hello xxxxx and foo xxx would match, but hello xxxxy and foo xxy would not. Is this possible?
The goal here is to match indentation in reStructuredText-style lists, where the second line in a list item should be indented to match the start of the text in the first line, excluding the variable-length numerical list marker. For example,
1. If the number is small,
   subsequent lines are typically
   indented three spaces.
2.  Although if the first line has
    multiple leading spaces then
    subsequent lines should reflect that.
11. And when the number starts to get
    bigger the indent will necessarily
    be bigger too.

You can do it if your regex engine supports conditional patterns and you're willing to accept a fixed upper bound on the number of repetitions. In that case you can do something like this:
^(\w)?(\w)?(\w)?(\w)?(\w)? (?(1)x)(?(2)x)(?(3)x)(?(4)x)(?(5)x)
This example will match a first word of up to 5 characters; for longer words you would add more (\w)? groups and matching (?(n)x) conditionals.
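A quick way to try this is Python's re module, which supports (?(n)...) conditionals. This is a sketch, not part of the original answer: the trailing $ is an added assumption so that extra characters after the x-run cause a failure, and the helper name same_length is hypothetical.

```python
import re

# Conditional pattern from the answer: every optional (\w)? that took part in
# the match demands one "x" after the space, so word length == number of x's.
# Works for words of 1 to 5 characters. The trailing $ is an addition
# (assumption) so that trailing characters after the x-run make it fail.
SAME_LENGTH = re.compile(
    r"^(\w)?(\w)?(\w)?(\w)?(\w)? (?(1)x)(?(2)x)(?(3)x)(?(4)x)(?(5)x)$"
)


def same_length(text: str) -> bool:
    """Hypothetical helper: True if the x-run is as long as the first word."""
    return SAME_LENGTH.match(text) is not None


for sample in ["hello xxxxx", "foo xxx", "hello xxxxy", "foo xxy"]:
    print(sample, same_length(sample))
# hello xxxxx True
# foo xxx True
# hello xxxxy False
# foo xxy False
```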


Regex needed to match individual values from comma separated list

I need a regex to match and extract all the values from a comma separated list.
The maximum size of the list is always the same.
For example, if the maximum size is 3, the following lists can exist:
VALUE1,VALUE2,VALUE3
VALUE1,VALUE2
VALUE1
I need, if possible, a single regex that extracts the elements above into capturing groups, no matter which list is given as input.
I have tried with something simple like:
(.*)(,?)(.*)(,?)(.*)
But it matches the whole thing and no values are extracted. I don't understand why the ? doesn't work correctly in this case.
What I need: to apply the same regex for all the lists and extract the values.
Given the regex is used (.*)(,?)(.*)(,?)(.*)
Given the input list is VALUE1,VALUE2,VALUE3
Then I expect that group1=VALUE1, group3=VALUE2, group5=VALUE3
Given the regex is used (.*)(,?)(.*)(,?)(.*)
Given the input list is VALUE1,VALUE2
Then I expect that group1=VALUE1, group3=VALUE2
Given the regex is used (.*)(,?)(.*)(,?)(.*)
Given the input list is VALUE1
Then I expect that group1=VALUE1
You can make some of the groups optional, and simplify slightly by avoiding parentheses where they are unnecessary. You should also make your regex unambiguous: .* can match a comma, and the regex engine will let it do so if that is what it takes to find a match. You will also want to add anchors to the expression to avoid matching a substring of a longer line.
^([^,]*)(,([^,]*)(,([^,]*))?)?$
Demo: https://regex101.com/r/swUn3B/2
(where I had to add \n to the character class [^,\n] to avoid straddling newlines in the test data).
The fundamental problem with your attempt is that ,? is allowed to match nothing, and so the regex engine will do that if it's needed to achieve a match. The trick in this solution is to only make the entire group optional: if there is no comma, that's fine; but if there is a comma, it needs to be followed by another group of non-comma characters. We repeat this as many times as necessary to capture the specified maximum number of non-comma groups.
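Here is a small Python sketch of the same pattern (the helper name extract is hypothetical). Python's re reports optional groups that did not take part in the match as None, so the sketch filters those out; the values sit in groups 1, 3 and 5, matching the numbering in the question.

```python
import re

# The answer's pattern: a value, then optionally ",value", then optionally ",value".
# [^,]* keeps each value from swallowing a comma.
LIST_OF_UP_TO_3 = re.compile(r"^([^,]*)(,([^,]*)(,([^,]*))?)?$")


def extract(line: str):
    """Hypothetical helper: return the values, or None if the line has too many."""
    m = LIST_OF_UP_TO_3.match(line)
    if m is None:
        return None
    # Groups 1, 3 and 5 hold the values; groups that did not take part are None.
    return [v for v in m.group(1, 3, 5) if v is not None]


print(extract("VALUE1,VALUE2,VALUE3"))  # ['VALUE1', 'VALUE2', 'VALUE3']
print(extract("VALUE1,VALUE2"))         # ['VALUE1', 'VALUE2']
print(extract("VALUE1"))                # ['VALUE1']
print(extract("A,B,C,D"))               # None: more than three values
```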

Regex to match text and a specific number

I am looking for a regex which can match the following conditions.
It always starts with "someId":[ and ends with ].
It must contain the number 25 within the square brackets.
There may be numbers before and after the number 25.
The numbers are separated by commas, with no comma after the last number.
For example:
"someId":[25]
"someId":[25,27]
"someId":[1,4,25]
"someId":[1,4,25,27,30]
I have the following regex, which works, but I was wondering if there's a better way to do it that isn't as greedy:
"someId":\[(\d{1,2},)*?25,?(\d{1,2},)*?(\d{1,2})?]
A bit simplified:
"someId":\[(\d+,)*25(,\d+)*\]

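To sanity-check the simplified pattern, here is a short Python run over the question's examples plus three negative cases that are my own, not from the question. The pattern is not anchored, so it also finds the text inside a longer line; wrap it in ^...$ if the whole line must match.

```python
import re

# The simplified pattern: any "digits," items before 25, any ",digits" items after it.
HAS_25 = re.compile(r'"someId":\[(\d+,)*25(,\d+)*\]')

samples = [
    '"someId":[25]',
    '"someId":[25,27]',
    '"someId":[1,4,25]',
    '"someId":[1,4,25,27,30]',
    '"someId":[125]',    # 25 only appears inside a larger number
    '"someId":[1,250]',  # same here
    '"someId":[24,26]',  # no 25 at all
]
for s in samples:
    print(s, bool(HAS_25.search(s)))
# The first four print True, the last three print False.
```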
Joining 2 regex rules into one

Is there a special character to join two groups of rules in regex?
I need to match the first 2 chars and the last 2 numbers in every row.
This matches the first 2 chars:
(^..)
This matches the last 2 numbers:
([0-9][0-9]$)
How do I join those 2 rules?
I tried this without success:
(^..)([0-9][0-9]$)
Well you need to match the parts in between as well. Just allow for arbitrarily many arbitrary characters:
(^..).*([0-9][0-9]$)
Note that in most flavors . does not match line breaks. If your input may contain line breaks, use the s ("single line" or sometimes "dotall") modifier to change the meaning of . so that it matches them too. If your flavor lacks that modifier (as JavaScript did before ES2018), use [\s\S]* instead.
Also note that it might be easier, more readable and more efficient to just use two regexes consecutively:
^..
[0-9][0-9]$
No need for grouping/capturing and repetition.
EDIT:
Note that these two aren't completely equivalent. The first one requires at least four characters (because the two characters matched by .. cannot be matched again by [0-9][0-9]) while the second one could just contain two digits (in which case the .. would match those same digits). It depends on which of these semantics you are looking for. A third solution that uses only one regex but is equivalent to the two-regex solution would use lookaheads:
^(?=(..))(?=.*([0-9][0-9])$)
This would allow you to match x12, the first capture being x1 and the second being 12.
Thanks to Alan Moore for pointing this out.
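The difference described in the edit is easy to see in Python. The row values AB1234 and x12 are made-up inputs, and the helper names are hypothetical: the joined pattern needs at least four characters, while the two-regex version and the lookahead version both accept x12.

```python
import re

JOINED = re.compile(r"(^..).*([0-9][0-9]$)")            # one regex, .* in between
FIRST_TWO = re.compile(r"^..")                           # two separate regexes...
LAST_TWO = re.compile(r"[0-9][0-9]$")
LOOKAHEAD = re.compile(r"^(?=(..))(?=.*([0-9][0-9])$)")  # one regex, two lookaheads


def joined(row):
    m = JOINED.search(row)
    return m.groups() if m else None


def two_regexes(row):
    first, last = FIRST_TWO.search(row), LAST_TWO.search(row)
    return (first.group(), last.group()) if first and last else None


def lookahead(row):
    m = LOOKAHEAD.search(row)
    return m.groups() if m else None


for row in ["AB1234", "x12"]:
    print(row, joined(row), two_regexes(row), lookahead(row))
# AB1234 ('AB', '34') ('AB', '34') ('AB', '34')
# x12 None ('x1', '12') ('x1', '12')
```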
You need to add an "anything goes here" part, also known as .*:
(^..).*([0-9][0-9]$)
(^..).*([0-9][0-9]$)
You can use .* to match 'everything in between'.
If the row contains additional characters between the "first two" and the "last two", then you'll need something in the regex to match the intervening characters; something like:
(^..).*([0-9][0-9]$)

Regexp: How to match a string that doesn't have any character repeated 3 times?

I'm trying to make a single pattern that will validate an input string. The validation rule does not allow any character to be repeated 3 or more times in a row.
For example:
Aabcddee - is valid.
Aabcddde - is not valid, because of 3 d characters.
The goal is to provide a RegExp pattern that matches one of the examples above but not the other. I know I could use back-references such as ([a-z])\1{1,2}, but that only matches runs of the same character. My problem is that I cannot figure out how to make a single pattern for the whole string. I tried this, but I don't quite get why it isn't working:
^(([a-z])\1{1,2})+$
Here I try to match any character that is repeated 1 or 2 times in the inner group, and then match that inner group repeated multiple times. But it doesn't work that way.
Thanks.
To check that the string does not have a character (of any kind, even new line) repeated 3 times or more in a row:
/^(?!.*(.)\1{2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you also learn which character is repeated 3 times or more in a row. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)\1{2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)\1{2})[a-z]+$/i
As you can see, the i (case-insensitive) flag affects how the captured text is compared against the rest of the input.
Change + to * if you want to allow the empty string to pass.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)\1{2})[A-Za-z]+$/
At first glance it may look the same as the previous one, but since there is no i flag, the captured text is not compared case-insensitively.
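For Python users, the same patterns translate directly: the /s flag becomes re.DOTALL and /i becomes re.IGNORECASE. This is a sketch; the helper ok is a hypothetical name.

```python
import re

# The answer's patterns, with Python flags instead of /.../s and /.../i.
NO_TRIPLE_ANY = re.compile(r"^(?!.*(.)\1{2})", re.DOTALL)
LETTERS_CASE_INSENSITIVE = re.compile(r"^(?!.*(.)\1{2})[a-z]+$", re.IGNORECASE)
LETTERS_CASE_SENSITIVE = re.compile(r"^(?!.*(.)\1{2})[A-Za-z]+$")


def ok(pattern, text):
    return pattern.match(text) is not None


print(ok(NO_TRIPLE_ANY, "Aabcddee"))            # True
print(ok(NO_TRIPLE_ANY, "Aabcddde"))            # False
print(ok(LETTERS_CASE_INSENSITIVE, "aaA"))      # False: a, a, A count as a run of 3
print(ok(LETTERS_CASE_SENSITIVE, "aaA"))        # True: A differs from a
```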
Below is my failed answer; you can ignore it, but you may enjoy reading it.
You can use this regex to check that the string does not contain the same character 3 times (of any kind, even new line).
/^(?!.*(.)(?:.*\1){2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you also learn which character occurs 3 or more times. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)(?:.*\1){2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)(?:.*\1){2})[a-z]+$/i
As you can see, the i (case-insensitive) flag affects how the captured text is compared against the rest of the input.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)(?:.*\1){2})[A-Za-z]+$/
At first glance it may look the same as the previous one, but since there is no i flag, the captured text is not compared case-insensitively.
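Why the failed version is wrong shows up with a word like banana (my own example, not from the question): its three a's are not adjacent, so the consecutive-run check accepts it, but the (?:.*\1){2} version counts any three occurrences anywhere and rejects it. A Python sketch:

```python
import re

CONSECUTIVE = re.compile(r"^(?!.*(.)\1{2})", re.DOTALL)     # working answer
ANYWHERE = re.compile(r"^(?!.*(.)(?:.*\1){2})", re.DOTALL)  # failed answer

for text in ["banana", "Aabcddee", "Aabcddde"]:
    print(text, CONSECUTIVE.match(text) is not None, ANYWHERE.match(text) is not None)
# banana True False
# Aabcddee True True
# Aabcddde False False
```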
From your question I gather that you want to match only strings that consist of characters from [A-Za-z] AND that have no sequence of the same character with a length of 3 or more.
Then this regexp should work:
^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$
(Example in perl)
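As a stand-in for the Perl example (this is not the original Perl code), here is a Python sketch of the same regex run against the question's two strings and aaA. Because there is no case-insensitive flag, aaA is accepted.

```python
import re

# Each iteration consumes one letter, or two equal letters, and then insists
# that the following character is not that same letter.
NO_RUN_OF_3 = re.compile(r"^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$")

for text in ["Aabcddee", "Aabcddde", "aaA"]:
    print(text, NO_RUN_OF_3.match(text) is not None)
# Aabcddee True
# Aabcddde False
# aaA True
```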

Using regex to find arbitrary length consecutive blocks

I have a string containing ones and zeroes. I want to determine if there are substrings of 1 or more characters that are repeated at least 3 consecutive times. For example, the string '000' has a length 1 substring consisting of a single zero character that is repeated 3 times. The string '010010010011' actually has 3 such substrings that each are repeated 3 times ('010', '001', and '100').
Is there a regex expression that can find these repeating patterns without knowing either the specific pattern or the pattern's length? I don't care what the pattern is nor what its length is, only that the string contains a 3-peat pattern.
Here's something that might work. However, it will only tell you whether there is a pattern repeated three times, and I don't think it can be extended to tell you whether there are others:
/(.+).*?\1.*?\1/
Breaking that out:
(.+) matches any 1 or more characters, starting anywhere in the string
.*? allows any length of interposing other characters (0 or more)
\1 matches whatever was captured by the (.+) parentheses
.*? 0 or more of anything
\1 the original pattern, again
If you want the repetitions to occur immediately adjacent, then instead use
/(.+)\1\1/
… as suggested by Buh Buh. The \1 vs. $1 notation may vary, depending on your regexp system.
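In Python, both patterns can be tried with re.search, which, unlike re.match, looks anywhere in the string. The sample strings 0110 and 0a0b0 are my own, chosen to show the difference: only the gap-tolerant version accepts three copies that are not adjacent.

```python
import re

ADJACENT = re.compile(r"(.+)\1\1")         # unit repeated three times back to back
WITH_GAPS = re.compile(r"(.+).*?\1.*?\1")  # three copies, anything in between

for s in ["000", "010010010011", "0110", "0a0b0"]:
    a, g = ADJACENT.search(s), WITH_GAPS.search(s)
    print(s, a.group(1) if a else None, g.group(1) if g else None)
# 000 0 0
# 010010010011 010 010
# 0110 None None
# 0a0b0 None 0
```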
(.+)\1\1
The \ might be a different character depending on your language choice. This means: match any string, then try to match that same string twice more.
The \1 refers back to whatever the first group matched.
It looks weird, but this could be the solution:
/000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111/
This lists every 3-character pattern repeated three times, so the regular expression will match strings such as these:
10010010011
00010010011
10110110110
But not these:
101010101010
001110111110
111000111000
And it doesn't matter where the sequence appears in the whole string.
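A quick comparison in Python (010101 is my own test string): the alternation only lists 3-character units, so it does not fire on 101010101010 or 010101, where the repeating unit is 10 or 01, whereas the backreference pattern (.+)\1\1 from the other answers does.

```python
import re

ALTERNATION = re.compile(
    r"000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111"
)
BACKREF = re.compile(r"(.+)\1\1")  # any unit repeated three times in a row

for s in ["10010010011", "101010101010", "010101"]:
    print(s, bool(ALTERNATION.search(s)), bool(BACKREF.search(s)))
# 10010010011 True True
# 101010101010 False True
# 010101 False True
```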