Regex with Named Capture Group - regex

I am trying to update a regex pattern to include a Named Capture Group. Currently, this regex pattern:
\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b
correctly returns 4 matches from this sample text:
AAA
43 42 040 012 036 00
43 42 090 037 124 00
53 07 010 005 124 00
06-14 301-830-081-49
BBB
When I revised the pattern to add a Named Capture Group it only returns 3 matches and misses the last one.
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
How can I keep the Named Capture Group but still return 4 matches?
See example here.
Thanks.

Named capturing groups are still assigned numeric IDs. That is, (?<myPattern>[a-z]+)(\d+) contains two groups, the first one - with ID 1 - is "myPattern" group matching one or more lowercase letters, and the second group is \d+, with ID 2.
In your case, the problem arises from the \1 backreference later in the pattern. It now refers to the value of the "myPattern" group, so the matching is incorrect.
To fix the issue, you need to replace \1 with a reference to the corresponding group, \2:
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\2\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
See the regex demo.
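A minimal Python sketch of the numbering rule, assuming Python's re module, which spells named groups (?P<name>...) rather than (?<name>...). The names PATTERN and SAMPLE are illustrative only. It checks that the named group takes number 1, so the separator group becomes number 2 and \2 is the backreference to use:

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling for the named group.
# Group 1 is "myPattern"; the separator group ([-\/\\.]) is therefore group 2.
PATTERN = re.compile(
    r"(?P<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\2\d{1,4})\b(?!\S))"
    r"(?:[^\n\d\$\.\%]*\d){14}\b)"
)

SAMPLE = """AAA
43 42 040 012 036 00
43 42 090 037 124 00
53 07 010 005 124 00
06-14 301-830-081-49
BBB"""

# Named groups still count toward the numeric IDs.
print(PATTERN.groups)            # number of capturing groups in the pattern
print(dict(PATTERN.groupindex))  # mapping of group name to group number

matches = [m.group("myPattern") for m in PATTERN.finditer(SAMPLE)]
for text in matches:
    print(text)
```

Python refuses to compile the \1 variant at all, because \1 would then refer to a group that is still open, so only the corrected pattern is shown here.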

Related

German Phone Number Regex

I have a regex
\(?\+\(?49?\)?[ ()]?([- ()]?\d[- ()]?){11}
This correctly matches German phone code like
+491739341284
+49 1739341284
(+49) 1739341284
+49 17 39 34 12 84
+49 (1739) 34 12 84
+(49) (1739) 34 12 84
+49 (1739) 34-12-84
but fails to match 0049 (1739) 34-12-84.
I need to adjust the regular expression so that it also matches numbers starting with 0049. Can anyone help me with the regex?
Try this one:
\(?(?:\+|0{0,2})\(?49\)?[ ()]*[ \d]+[ ()]*[ -]*\d{2}[ -]*\d{2}[ -]*\d{2}
https://regex101.com/r/CHjNBV/1
However, it's better to accept only +49 or 0049 and show an error message if the number fails validation. If you ever need to extend the format, the regex will have to become much more complicated.
If you want to match the variations in the question, you might use a pattern like:
^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?: *\(\d{4}\)|(?: ?\d){4})? *\d\d(?:[ -]?\d\d){2}$
Explanation
^ Start of string
(?: Non capture group
\+? Match an optional +
(?:00)? Optionally match 2 zeroes
(?:49|\(49\)) Match 49 or (49)
| Or
\(\+49\) Match (+49)
) Close non capture group
(?: Non capture group
* Match optional spaces
\(\d{4}\) Match ( 4 digits and )
| Or
(?: ?\d){4} Repeat 4 times matching an optional space and a digit
)? Close non capture group and make it optional
* Match optional spaces
\d\d Match 2 digits
(?:[ -]?\d\d){2} Repeat 2 times matching an optional space or - followed by 2 digits
$ End of string
Or a bit broader variation matching the 49 prefix variants, followed by matching 10 digits allowing optional repetitions of what is in the character class [ ()-]* in between the digits.
^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?:[ ()-]*\d){10}$
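As a quick check, here is a small Python sketch that runs both patterns from this answer against the numbers listed in the question. The names STRICT, BROAD and NUMBERS are illustrative, and the rejected number is an extra made-up example with a different country code:

```python
import re

# Stricter variant: fixed layout for the digits after the prefix.
STRICT = re.compile(
    r"^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))"
    r"(?: *\(\d{4}\)|(?: ?\d){4})? *\d\d(?:[ -]?\d\d){2}$"
)

# Broader variant: 10 digits with optional spaces, parentheses or hyphens between them.
BROAD = re.compile(r"^(?:\+?(?:00)?(?:49|\(49\))|\(\+49\))(?:[ ()-]*\d){10}$")

NUMBERS = [
    "+491739341284",
    "+49 1739341284",
    "(+49) 1739341284",
    "+49 17 39 34 12 84",
    "+49 (1739) 34 12 84",
    "+(49) (1739) 34 12 84",
    "+49 (1739) 34-12-84",
    "0049 (1739) 34-12-84",
]

for number in NUMBERS:
    print(number, bool(STRICT.match(number)), bool(BROAD.match(number)))

# A number with another country code should be rejected by both patterns.
print(bool(STRICT.match("+48 1739341284")), bool(BROAD.match("+48 1739341284")))
```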

hours:minutes:seconds Timestamp regex

I'm trying to validate a user-entered video timestamp in the format hours:minutes:seconds using a regex. I'm assuming the hours component can be arbitrarily long, so the final format is basically hhhh:mm:ss where there can be any number of h. This is what I have so far:
(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])
where
(
([0-9]+:)? # hh: optionally with an arbitrary number of h
([0-5][0-9]:) # mm: , with mm from 00 to 59
)? # hh:mm optionally
([0-5][0-9]) # ss , with ss from 00 to 59
which I believe is almost there, but it doesn't handle cases like 1:31 or just 1. To account for this, I tried making the first digit inside the mm and ss blocks optional:
(([0-9]+:)?([0-5]?[0-9]:))?([0-5]?[0-9])
Firstly, the last seconds block then starts matching values like 111. Also, values like 1:1:12 are matched, which I don't want (it should be 1:01:12). How can I modify this so that m:ss and s are valid, whereas h:m:ss, m:s and sss are not?
I am new to regular expressions, so apologies in advance if I'm doing something stupid. Any help is appreciated. Thanks.
You can match either 1 or more digits followed by an optional :mm:ss part, or match mm:ss.
To also match 6:12 and not 1:1:12 make only the first digit optional in the second part of the pattern.
^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$
^ Start of string
(?: Non capture group
\d+ Match 1+ digits
(?::[0-5][0-9]:[0-5][0-9])? Match an optional :mm:ss part, both in range 00 - 59
| Or
[0-5]?[0-9]:[0-5][0-9] Match m:ss or mm:ss both in range 00-59 where the first m is optional
) Close non capture group
$ End of string
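A short Python sketch of how this pattern behaves on a few inputs. The names TIMESTAMP, accepted and rejected are illustrative, and the sample values are chosen here for demonstration rather than taken from a real data set:

```python
import re

TIMESTAMP = re.compile(r"^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$")

# Values the pattern is expected to accept and reject.
accepted = ["1", "1:31", "12:34", "1:01:12", "100:00:00"]
rejected = ["1:1:12", "1:1", "61:00", "1:60:00"]

for value in accepted + rejected:
    print(value, bool(TIMESTAMP.match(value)))
```

Note that the first alternative starts with \d+, so a bare run of digits of any length, such as 111, is still accepted. If that is not wanted, the first alternative would need to be tightened.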
Doesn't adding the positional anchors (^ and $) solve your problem?
^(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])$
Check here: https://regex101.com/r/fRZf2R/1

Regex Match Hexadecimal in Groups of 2-8

I am working on a regular expression to match against a hexadecimal string and am having some trouble near the end. I am specifically looking for groups of 2 bytes that do not contain 00 that are between 2 and 8 bytes long. I have it all working, except that when there are fewer than 8 bytes, it sometimes allows extra 00 in the match.
https://regex101.com/r/jq3QpP/1/
(?!(00)+)([0-9a-fA-F]{2,8})?(?!(00)+) // This on the below text gives the following matches
C86B0200554E0200C86B02000000000000000000270000008109000000000000EC6A050079750
18881000000410000280100000000000000000001000002010400000000000000000000000000
0000000000000000000000F65FA45900000000FF0000002F0000000000000049000000403C9F5
A000000000000000000000000FFFF330000000000000F06EAE8333536
Match 1
Full match 0-8 `C86B0200`
Group 2. 0-8 `C86B0200`
Match 2
Full match 8-16 `554E0200`
Group 2. 8-16 `554E0200`
Match 3
Full match 16-21 `C86B0`
Group 2. 16-21 `C86B0`
Match 4
Full match 21-21 ``
Match 5
Full match 39-47 `02700000`
Group 2. 39-47 `02700000`
In matches 1, 2 and 5 there are extra 00s, and in match 3 it missed the 20 for some reason. If you have an idea what I missed, please let me know.
You can avoid matching 00 by allowing at most one 0 in each pair of digits instead:
(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)
Demo: https://regex101.com/r/2hebvf/2
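To illustrate, here is a small Python sketch applying this pattern to the first 16 characters of the sample data. The names HEX_GROUPS and DATA are illustrative. The lookahead (?=(?:..)*$) keeps matches aligned on byte boundaries by requiring an even number of characters after each match:

```python
import re

# Each pair allows at most one 0, so the pair 00 can never be part of a match.
HEX_GROUPS = re.compile(r"(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)")

# First 16 hex digits of the sample text from the question.
DATA = "C86B0200554E0200"

groups = HEX_GROUPS.findall(DATA)
print(groups)  # ['C86B02', '554E02']
```

The pattern as written only covers uppercase hex digits; adding a-f to the character classes, or compiling with re.IGNORECASE, would be needed for lowercase input.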

Extracting group only if previous group was matched

I have two examples
abc 34 def12 ghi
abc 34 33 ghi
and a regexp
^.*?([0-9]{2}) ?([a-z]{2,3})? ?([0-9]{2}).*$
(see https://regex101.com/r/U2JNaS/1)
I need to modify it in such a way that it extracts $1, $2, $3, but only if $2 was present, i.e. I need it to return
34 def12
<WRONG>
How to achieve that?
Note that you put a ? after the second capturing group (([a-z]{2,3})).
It causes that the whole regex will match even if the particular row does not contain the "letter" part.
Just remove this ?, so that in this case the whole regex will not match.
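A minimal Python sketch of that change, using the two example rows from the question. The names REQUIRED and rows are illustrative. With the ? removed from the second group, the first row yields all three groups and the second row does not match at all:

```python
import re

# Same regex as in the question, but the letter group is no longer optional.
REQUIRED = re.compile(r"^.*?([0-9]{2}) ?([a-z]{2,3}) ?([0-9]{2}).*$")

rows = ["abc 34 def12 ghi", "abc 34 33 ghi"]

for row in rows:
    m = REQUIRED.match(row)
    # groups() is only available when the whole regex matched.
    print(row, "->", m.groups() if m else None)
```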

Regex to match the value optionally enclosed by double quotes

I have 3 columns delimited by white spaces but the second field is optionally enclosed by double quotes.
I want to extract the 1st field, the 2nd field (the value within the double quotes) and the 3rd field. Sometimes the 2nd field value is not enclosed in double quotes; in that case, just return the existing value.
Sample Input
1a "2a 2.1a 2.2a" 3a
4b "5.5b 5.6b 5.7b" 6b
7c 8c 9c
Final output
Matching Information are
1st row match
\1 1a
\2 2a 2.1a 2.2a
\3 3a
2nd row match
\1 4b
\2 5.5b 5.6b 5.7b
\3 6b
3rd row match
\1 7c
\2 8c
\3 9c
I tried the regex below and it works fine for the first two inputs, but the third line is not matched. Can someone help me solve this issue?
Regex I tried:
([a-z0-9]+)\s+"([a-z0-9\s.]+)"\s+([a-z0-9]+)
Link:
https://regex101.com/r/rN4uB4/1
You could simply make the quotation marks optional in your pattern. Following a token with ? tells the regular expression engine to match that token zero or one time.
([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)
If your language supports it, you could use the branch reset feature. By using this feature, both capturing groups in the alternatives are considered as one capturing group.
([a-z0-9]+)\s+(?|"([^"]+)"|([a-z0-9]+))\s+([a-z0-9]+)
The problem with your regex is that it requires the quotes, whereas in your data they are optional.
You can parse this using:
([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)
The ? means the group (or character " in this case) is optional.
However, it makes me wonder what you want to do. This looks a lot like bash argument parsing. Sometimes you can take advantage of libraries for this...
EDIT
@PetSerAl brings up a valid point: the two quotes " are matched independently of each other, so:
4b "5.5b 5.6b 5.7b 6b
4b 5.5b 5.6b 5.7b" 6b
will match as well. You can solve this by introducing additional capture groups:
([a-z0-9]+)\s+("([a-z0-9\s.]+)"|([a-z0-9\s.]+))\s+([a-z0-9]+)
In that case the old capture groups map onto the new ones as follows:
\1 -> \1
\2 -> \3 (with quotes) or \4 (without quotes)
\3 -> \5
One can also use \2 for the old \2, but the new \2 will include the quotes " as well if they are part of the string.
It will thus cost more postprocessing to handle them correctly.
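The group mapping above can be sketched in Python as follows. The helper parse_row is a hypothetical name, and fullmatch is used on the assumption that each row is matched as a whole line. Python's built-in re module has no branch reset syntax, so the sketch picks group 3 or group 4, whichever took part in the match:

```python
import re

# Quoted and unquoted variants of the second field are separate alternatives,
# so an opening quote without a closing quote (or the reverse) cannot match.
ROW = re.compile(r'([a-z0-9]+)\s+("([a-z0-9\s.]+)"|([a-z0-9\s.]+))\s+([a-z0-9]+)')

def parse_row(line):
    """Return (field1, field2, field3) without quotes, or None if the row is malformed."""
    m = ROW.fullmatch(line)
    if m is None:
        return None
    # Group 3 is set for the quoted form, group 4 for the unquoted form.
    second = m.group(3) if m.group(3) is not None else m.group(4)
    return (m.group(1), second, m.group(5))

for line in ['1a "2a 2.1a 2.2a" 3a', '4b "5.5b 5.6b 5.7b" 6b', "7c 8c 9c",
             '4b "5.5b 5.6b 5.7b 6b']:
    print(parse_row(line))
```

Choosing between groups 3 and 4 in code is the postprocessing step mentioned above; it avoids having to strip the quotes from group 2 by hand.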