This is a two-part question, but I feel the answers will be related.
I have this regex pattern:
(\d+)(aa|bb) which I use to capture this string: 1bb2aa3aa4bb5bb6aa7bb8cc9cc
See demo: example 1
The way it captures the random series of aa and bb (both preceded by a digit) is exactly what I want, and is good as far as it goes.
So we get this match on regex101:
Match 1
Full match 0-3 `1bb`
Group 1. 0-1 `1`
Group 2. 1-3 `bb`
Match 2
Full match 3-6 `2aa`
Group 1. 3-4 `2`
Group 2. 4-6 `aa`
Match 3
Full match 6-9 `3aa`
Group 1. 6-7 `3`
Group 2. 7-9 `aa`
Match 4
Full match 9-12 `4bb`
Group 1. 9-10 `4`
Group 2. 10-12 `bb`
Match 5
Full match 12-15 `5bb`
Group 1. 12-13 `5`
Group 2. 13-15 `bb`
Match 6
Full match 15-18 `6aa`
Group 1. 15-16 `6`
Group 2. 16-18 `aa`
Match 7
Full match 18-21 `7bb`
Group 1. 18-19 `7`
Group 2. 19-21 `bb`
As expected, the 8cc9cc bit at the end is ignored. I would like to capture this as well, in the same expression and in the same way I have captured the first repeating groups. So in the final output, I'd get something like this added to the end. This should work for any number of matches on either side; this text is just one example.
Match 8
Full match 21-24 `8cc`
Group 1. 21-22 `8`
Group 2. 22-24 `cc`
Match 9
Full match 24-27 `9cc`
Group 1. 24-25 `9`
Group 2. 25-27 `cc`
Also, I'd like to do something similar with the 'or' group moved to the end, i.e. for this string:
1cc2cc3cc4cc5cc6cc7ccb8aa9bb
My current regex pattern (\d+)(cc) only matches the repeating 'cc' groups.
See demo: example 2
I would like a similar full capture, with any amount of permissible entries of each group.
Any thoughts?
You may use
(?:\G(?!^)(?(?=\d+(?:aa|bb))(?<!\dcc))|(?=(?:\d+(?:aa|bb))+(?:\d+cc)+))(\d+)(aa|bb|cc)
See the regex demo
The regex only starts matching where the text meets the pattern in the (?=(?:\d+(?:aa|bb))+(?:\d+cc)+) lookahead, and then consecutively matches and captures digits followed by aa, bb or cc. Digits + aa or bb are matched only if they are not immediately preceded by digits + cc.
Details
(?:\G(?!^)(?(?=\d+(?:aa|bb))(?<!\dcc))|(?=(?:\d+(?:aa|bb))+(?:\d+cc)+)) - either of the two alternatives:
\G(?!^) - end of the previous successful match
(?(?=\d+(?:aa|bb))(?<!\dcc)) - conditional construct (with no else branch): if there are 1+ digits and aa or bb immediately to the right of the current location ((?=\d+(?:aa|bb))), then only continue matching if there is no digit followed with cc immediately to the left of the current location ((?<!\dcc))
| - or
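^ - start of string (this anchor is not present in the pattern as written above; add it before the lookahead if the check must begin at the start of the string)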
(?=(?:\d+(?:aa|bb))+(?:\d+cc)+) - a positive lookahead that, immediately to the right of the current location, searches for the following (and returns true if it finds the patterns, or false if it does not):
(?:\d+(?:aa|bb))+ - one or more occurrences of 1+ digits followed with aa or bb
(?:\d+cc)+ - one or more occurrences of 1+ digits followed with cc
(\d+) - Group 1: one or more digits
(aa|bb|cc) - Group 2: aa, bb or cc.
For the second pattern, swap cc and (?:aa|bb):
(?:\G(?!^)(?(?=\d+cc)(?<!\d(?:aa|bb)))|(?=(?:\d+cc)+(?:\d+(?:aa|bb))+))(\d+)(aa|bb|cc)
I'm no expert with Perl, so I'll give a bit of pseudo code here. Feel free to suggest an edit.
You can start by checking that the string consists of one or more xaa or xbb combos, followed by one or more xcc combos, using this pattern: ^(?:\d+(?:aa|bb))+(?:\d+cc)+$
Once you have that you can use this pattern to capture the appropriate groups: (\d+)(aa|bb|cc)
Demo 1
Demo 2
Something like:
if(ismatch("^(?:\d+(?:aa|bb))+(?:\d+cc)+$", inputString))
{
    match = match("(\d+)(aa|bb|cc)", inputString);
}
From here you can extract the information using the groups.
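The validate-then-capture idea above can be sketched in Python, whose built-in re module has neither \G nor lookaround conditionals, so the two-step approach is the natural fit there. This is only a sketch: the names VALID_AB_THEN_CC, VALID_CC_THEN_AB, PAIR and extract_pairs are made up for illustration, and the second sample string is a hypothetical input, not the one from the question.

```python
import re

# Step 1: the whole string must be digit+aa/bb pairs followed by digit+cc pairs.
VALID_AB_THEN_CC = re.compile(r"(?:\d+(?:aa|bb))+(?:\d+cc)+")
# Mirror image for the second case: cc pairs first, then aa/bb pairs.
VALID_CC_THEN_AB = re.compile(r"(?:\d+cc)+(?:\d+(?:aa|bb))+")
# Step 2: capture each number together with its two-letter suffix.
PAIR = re.compile(r"(\d+)(aa|bb|cc)")


def extract_pairs(text, layout=VALID_AB_THEN_CC):
    """Return (digits, suffix) tuples, or [] if the string does not fit the layout."""
    if layout.fullmatch(text) is None:
        return []
    return PAIR.findall(text)


print(extract_pairs("1bb2aa3aa4bb5bb6aa7bb8cc9cc"))
# Hypothetical input for the flipped layout.
print(extract_pairs("1cc2cc3cc8aa9bb", VALID_CC_THEN_AB))
```

Using fullmatch for the first step means no ^ and $ are needed in the validation patterns.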
Related
I am a newbie to regex. I have an example string: account-device-v2-2-3-63-21900
and using this regular expression [1-9]-[0-9]-[0-9]*
I am getting output as 1-2-3
but my intention is to match/extract the pattern 2-3-63
That is, I want the hyphen-separated digits after v2 (or v1 etc.); I don't need the last group of digits (21900 or any other number).
Any suggestions, please?
Do you want to get one or more non-zero digits, a dash, one or more digits, a dash, and one or more digits from account-device-v2-2-3-63-21900 or account-device-v1-2-3-63-21900?
Use v[12]-([1-9]+?-[0-9]+?-[0-9]+?)- and take the first group.
Demo: https://regex101.com/r/hMLGsK/1
The pattern [1-9]-[0-9]-[0-9]* matches 2-2-3 because it does not account for the v and the digit that follows it, so 2-2-3 is the first place where it can match.
Note that [0-9]* matches optional digits, so 2-2- could also be a match.
Using a capture group to get the value:
\bv[1-9][0-9]*-([1-9][0-9]*-[0-9]+-[0-9]+)
\bv[1-9][0-9]*- Match v1, or possibly v20 etc., followed by -
( Capture group 1
[1-9][0-9]* Match a number whose first digit is 1-9
-[0-9]+-[0-9]+ 2 parts, each matching - and 1 or more digits 0-9
) Close group 1
Regex demo
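As a quick check of the explanation above, here is a small Python sketch (the variable names are illustrative only) that runs the original pattern and the capture-group pattern against the sample string.

```python
import re

text = "account-device-v2-2-3-63-21900"

# The original pattern is unanchored, so the first place it fits starts
# at the digit of "v2".
first_hit = re.search(r"[1-9]-[0-9]-[0-9]*", text).group(0)

# Anchor on the version prefix and capture the three parts that follow it.
VERSIONED = re.compile(r"\bv[1-9][0-9]*-([1-9][0-9]*-[0-9]+-[0-9]+)")
wanted = VERSIONED.search(text).group(1)

print(first_hit)  # 2-2-3
print(wanted)     # 2-3-63
```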
I would like to improve a VBA regex for a print range.
Currently I have this:
(\d+(-\d+)*)+(,\d+(-\d+)*)*
But, for an entry 12-25,45,50-53 this is returning the , and - like this:
Match 1: -25
Match 2: ,50-53
Match 3: -53
and is not returning the 45
Ideally I'd like a group returned for each comma delimited entry without any , or - like this:
Match 1: (12-25)
Match 2: (45)
Match 3: (50-53)
The reason 45 is not in a group is that you are repeating the second capturing group. When you are repeating a capturing group, the group contains the value of the last iteration.
So (,\d+(-\d+)*) will capture ,45. Now the whole group is repeated due to the outer * and within that last iteration ,50 is captured by ,\d+ and -53 is captured by -\d+
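The "last iteration wins" behaviour can be reproduced outside VBA. The sketch below uses Python's re module as a stand-in, on the assumption that it treats repeated capturing groups the same way for this input; it prints what each of the four groups holds after the match.

```python
import re

# Same pattern and input as in the question.
m = re.fullmatch(r"(\d+(-\d+)*)+(,\d+(-\d+)*)*", "12-25,45,50-53")
groups = m.groups()

# Group 3 matched ",45" on its first iteration, but only the value from
# its final iteration (",50-53") is kept.
print(groups)  # ('12-25', '-25', ',50-53', '-53')
```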
What you might do is match 1+ digits and use a single optional group for the hyphen and 1+ digits part to get 3 matches.
Use a positive lookahead (?=,|$) to assert what is directly on the right is a comma or the end of the string.
\d+(?:-\d+)?(?=,|$)
Regex demo
If you want 3 groups, you could use:
(\d+(?:-\d+)?),(\d+(?:-\d+)?),(\d+(?:-\d+)?)
Regex demo
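For illustration, the lookahead-based pattern can be tried in Python (used here instead of VBA; the helper name split_print_range is hypothetical). Because the pattern has no capturing groups, each match is one comma-delimited entry.

```python
import re

# Digits, optionally "-digits", and then a comma or the end of the string.
RANGE_ENTRY = re.compile(r"\d+(?:-\d+)?(?=,|$)")


def split_print_range(spec):
    """Return each comma-delimited entry of a print range as a string."""
    return RANGE_ENTRY.findall(spec)


print(split_print_range("12-25,45,50-53"))  # ['12-25', '45', '50-53']
```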
I'm trying to match an exact pattern to do some data cleanup for ISSNs using the code below:
select case when REGEXP_REPLACE('1234-5678 ÿþT(zlsd?k+j''fh{l}x[a]j).,~!##$%^&*()_+{}|:<>?`"\;''/-', '([0-9]{4}[\-]?[Xx0-9]{4})(.*)', '$1') not similar to '[0-9]{4}[\-]?[Xx0-9]{4}' then 'NOT' else 'YES' end
The pattern I want should match any 8-digit group with an optional dash in the middle and a possible X at the end.
The code above works for most cases, but if capture group 1 is the following example: 123456789 then it also returns positive because it matches the first 8 digits, and I don't want it to.
I tried surrounding capture group 1 with ^...$ but that doesn't work either.
So I would like to match exactly these examples and similar ones:
1234-5678
1234-567X
12345678
1234567X
BUT NOT THESE (and similar):
1234567899
1234567899x
What am I missing?
You may use
^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$
See the regex demo
Details
^ - start of string
([0-9]{4}-?[Xx0-9]{4}) - Capturing group 1 ($1): four digits, an optional -, and then four characters that are each x, X or a digit
([^0-9].*)? - an optional Capturing group 2: any char other than a digit and then any 0+ chars as many as possible
$ - end of string.
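The pattern can be checked against the examples from the question with a short Python sketch. Python's re is used here rather than the SQL engine from the question, and clean_issn is a hypothetical helper name; it returns group 1 when the string matches and None otherwise.

```python
import re

ISSN_PREFIX = re.compile(r"^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$")


def clean_issn(raw):
    """Return the leading ISSN-like value, or None if a digit follows it."""
    m = ISSN_PREFIX.match(raw)
    return m.group(1) if m else None


for sample in ["1234-5678", "1234567X", "1234-5678 junk).,~!", "1234567899", "1234567899x"]:
    print(sample, "->", clean_issn(sample))
```

The ninth digit in 1234567899 cannot start group 2, since group 2 must begin with a non-digit, so the $ anchor fails and the string is rejected.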
For example, this is the regular expression
([a]{2,3})
This is the string
aaaa // 1 match "(aaa)a" but I want "(aa)(aa)"
aaaaa // 2 match "(aaa)(aa)"
aaaaaa // 2 match "(aaa)(aaa)"
However, if I change the regular expression
([a]{2,3}?)
Then the results are
aaaa // 2 match "(aa)(aa)"
aaaaa // 2 match "(aa)(aa)a" but I want "(aaa)(aa)"
aaaaaa // 3 match "(aa)(aa)(aa)" but I want "(aaa)(aaa)"
My question is: is it possible to use as few groups as possible to match as long a string as possible?
How about something like this:
(a{3}(?!a(?:[^a]|$))|a{2})
This looks for either the character a three times (as long as it is not followed by a single a that is itself followed by a different character or the end of the string) or the character a two times.
Breakdown:
( # Start of the capturing group.
a{3} # Matches the character 'a' exactly three times.
(?! # Start of a negative Lookahead.
a # Matches the character 'a' literally.
(?: # Start of the non-capturing group.
[^a] # Matches any character except for 'a'.
| # Alternation (OR).
$ # Asserts position at the end of the line/string.
) # End of the non-capturing group.
) # End of the negative Lookahead.
| # Alternation (OR).
a{2} # Matches the character 'a' exactly two times.
) # End of the capturing group.
Here's a demo.
Note that if you don't need the capturing group, you can actually use the whole match instead by converting the capturing group into a non-capturing one:
(?:a{3}(?!a(?:[^a]|$))|a{2})
Which would look like this.
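A minimal Python sketch of the non-capturing variant, run against the three strings from the question (the names CHUNK and chunks are illustrative):

```python
import re

# Take three a's unless that would strand a single trailing 'a';
# otherwise take two.
CHUNK = re.compile(r"a{3}(?!a(?:[^a]|$))|a{2}")


def chunks(text):
    """Return the successive matches as a list of strings."""
    return CHUNK.findall(text)


print(chunks("aaaa"))    # ['aa', 'aa']
print(chunks("aaaaa"))   # ['aaa', 'aa']
print(chunks("aaaaaa"))  # ['aaa', 'aaa']
```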
Try this Regex:
^(?:(a{3})*|(a{2,3})*)$
Click for Demo
Explanation:
^ - asserts the start of the line
(?:(a{3})*|(a{2,3})*) - a non-capturing group containing 2 sub-sequences separated by OR operator
(a{3})* - The first subsequence tries to match 3 occurrences of a. The * at the end allows this subsequence to match 0 or 3 or 6 or 9.... occurrences of a before the end of the line
| - OR
(a{2,3})* - matches 2 to 3 occurrences of a, as many as possible. The * at the end would repeat it 0+ times before the end of the line
$ - asserts the end of the line
Try this short regex:
a{2,3}(?!a([^a]|$))
Demo
How it's made:
I started with this simple regex: a{2}a?. It looks for 2 consecutive a's that may be followed by another a. If the 2 a's are followed by another a, it matches all three a's.
This worked for most cases, such as aaaaa and aaaaaa.
However, it failed in cases like aaaa, where it matches aaa and leaves the last a unmatched.
So now, I knew I had to modify my regex in such a way that it would match the third a only if the third a is not followed by a([^a]|$). So now, my regex looked like a{2}a?(?!a([^a]|$)), and it worked for all cases. Then I just simplified it to a{2,3}(?!a([^a]|$)).
That's it.
EDIT
If you want the capturing behavior, then add parentheses around the regex, like:
(a{2,3}(?!a([^a]|$)))
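The step from a{2}a? to the guarded version can be compared side by side in Python. This is a sketch with illustrative names; it collects full matches through finditer because the lookahead contains a capturing group, which would otherwise change what findall returns.

```python
import re

GREEDY = re.compile(r"a{2}a?")                # the starting point
GUARDED = re.compile(r"a{2,3}(?!a([^a]|$))")  # the final regex


def full_matches(pattern, text):
    """Return the text of every match, ignoring inner groups."""
    return [m.group(0) for m in pattern.finditer(text)]


print(full_matches(GREEDY, "aaaa"))     # ['aaa']
print(full_matches(GUARDED, "aaaa"))    # ['aa', 'aa']
print(full_matches(GUARDED, "aaaaa"))   # ['aaa', 'aa']
print(full_matches(GUARDED, "aaaaaa"))  # ['aaa', 'aaa']
```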
I'm looking for a regex to match the following 'good' values.
100.100.100.10
100.100.100.1
100.100.100
100.100.10
100.100.1
100.100
The following conditions need to hold:
Minimum of 7 characters (xxx.xxx)
Maximum of 14 characters (xxx.xxx.xxx.xx)
Groups can contain max 3 characters (xxx)
Groups need to be separated by a '.'
In case of 3 groups, the third group can contain 1 to 3 characters (x or xx or xxx)
In case of 4 groups, the fourth group can contain 1 to 2 characters (x or xx)
All previous groups need to contain 3 characters (xxx)
To test the validity of a string value, the following regex has been created.
([0-9]{3}(\.[0-9]{3}){2}(\.[0-9]{1,2}))|
([0-9]{3}(\.[0-9]{3})(\.[0-9]{1,3})?)
I had to use the OR operator, but can't find how I can exclude values containing more than 14 characters. I've tested the below 'bad' examples (via http://regexr.com/) and do get a match on PART of the string. However, my rule would have to be NOT to accept these strings, since there's 'noise' at the end of each 'word' (whereas the word is considered a string without spaces)
100.100.100.100.100
The last .100 needs to make the full string invalid, no partial match is accepted. Adding \b or ^$ in combination with the OR does not provide me the required result.
100.100.100.100100
100.100.100100100
100.100.100.100
The above need to be invalid as well.
You may use optional groups:
^[0-9]{3}\.[0-9]{3}(?:\.[0-9]{3}\.[0-9]{1,2}|\.[0-9]{1,3})?$
See the regex demo
Details:
^ - start of string
[0-9]{3} - 3 digits (your group 1)
\.[0-9]{3} - a dot and 3 digits (group 2)
(?:\.[0-9]{3}\.[0-9]{1,2}|\.[0-9]{1,3})? - an optional group matching either of:
\.[0-9]{3}\.[0-9]{1,2} - a dot and 3 digits (group 3), then a dot and 1 to 2 digits (group 4), so that a fourth group is only allowed after a complete third group
| - or
\.[0-9]{1,3} - a dot and 1 to 3 digits (group 3 as the last group)
$ - end of string
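Logically, there are 1-3 complete groups, followed by a partial group:
^(?=.{7,14}$)(\d{3}\.){1,3}\d{1,3}$
The lookahead enforces the length; the $ inside it is needed so that the 14-character maximum applies to the whole string, which is what rejects 100.100.100.100.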
See live demo.
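To check the length-limited pattern against the good and bad values from the question, here is a small Python sketch. It assumes the lookahead is written with a trailing $, i.e. (?=.{7,14}$), so that the maximum length is enforced; DOTTED and is_valid are illustrative names.

```python
import re

# 1-3 complete groups of three digits, a final group of 1-3 digits,
# and a total length of 7 to 14 characters.
DOTTED = re.compile(r"^(?=.{7,14}$)(\d{3}\.){1,3}\d{1,3}$")


def is_valid(value):
    """True if the whole value follows the dotted-group rules."""
    return DOTTED.fullmatch(value) is not None


good = ["100.100.100.10", "100.100.100.1", "100.100.100",
        "100.100.10", "100.100.1", "100.100"]
bad = ["100.100.100.100.100", "100.100.100.100100",
       "100.100.100100100", "100.100.100.100"]

print(all(is_valid(v) for v in good))  # True
print(any(is_valid(v) for v in bad))   # False
```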