Capture a substring of a matched group


Scenario
I have to grab a substring from a composed string.
Match conditions:
the string starts with 'section1:'
the captured string may be a space-separated or dash-separated list of alphanumeric values
if the captured string ends with a specific suffix ('-xx'), exclude the suffix from the captured string.
Examples
section1:ypsilon : section 1 matches, grab 'ypsilon'
section1:ypsilon zeta : section 1 matches, grab 'ypsilon zeta'
section1:ypsilon-zeta : section 1 matches, grab 'ypsilon-zeta'
section1:ypsilon-xx : section 1 matches, grab 'ypsilon', exclude '-xx'
section1:ypsilon zeta-xx : section 1 matches, grab 'ypsilon zeta', exclude '-xx'
section1:ypsilon-zeta-xx : section 1 matches, grab 'ypsilon-zeta', exclude '-xx'
section2:ypsilon : section 2 does not match
Solution so far
^section1:([a-zA-Z0-9\- ]+)(\-xx)?$
The idea is to get group 1, while group 2 is optional.
Demo.
Question
Unfortunately, the suffix itself matches the group 1 definition, since it is an alphabetic string with a dash. So the resulting captured string does not exclude the suffix.
Any clue?

You were close; the main problem you're facing is the greediness of the quantifier.
n+ matches as many n as possible. To make it match as few as possible, suffix it with ?.
I ended up with this regex (Demo here):
^section1:([a-zA-Z0-9\- ]+?)(|-xx)$
The main difference is the ? after the +, which makes it non-greedy (or reluctant). I also prefer an alternation between empty and the desired suffix, (|-xx), instead of an optional group: it matches either nothing or -xx before the end of the line.
I have no argument for one over the other; it's a matter of taste, I think.
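A minimal sketch of the lazy-quantifier pattern above, run against some of the examples from the question. JavaScript is used only for illustration, and the helper name captureSection is made up here; it is not part of the original answer.

```javascript
// Lazy group 1, then either nothing or the "-xx" suffix right before the end.
const sectionPattern = /^section1:([a-zA-Z0-9\- ]+?)(|-xx)$/;

// Hypothetical helper: returns group 1, or null when the input does not match.
function captureSection(input) {
  const match = sectionPattern.exec(input);
  return match ? match[1] : null;
}

const samples = [
  "section1:ypsilon",
  "section1:ypsilon zeta-xx",
  "section1:ypsilon-zeta-xx",
  "section2:ypsilon",
];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "=>", JSON.stringify(captureSection(sample)));
}
```

Because the pattern is anchored with $, the lazy group is forced to keep growing until the only thing left is either nothing or -xx.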

Use an alternation on -xx inside a non-capturing group, and add ? to make + lazy, so that -xx is not sucked up into the capture:
(?<=^section1):([a-zA-Z0-9\- ]+?)(?:-xx|:)
Demo
If you don't have the second : to use as a delimiter, use $ instead:
(?<=^section1):([a-zA-Z0-9\- ]+?)(?:-xx|\s*$)
Demo 2

Related

How to conditionally expect particular characters if a prior regex matched?

I want to expect some characters only if a prior regex matched. If not, no characters (an empty string) are expected.
For instance, if one of the strings from the group (A10, B32, C56, D65) (a kind of enumeration) appears after the first four characters, then a "_" followed by a 3-digit number like 123 is expected. If no element of that group appears, no further string is expected.
My first attempt was this, but the ELSE branch does not work:
^XXX_(?<DT>A12|B43|D14)(?(DT)(_\d{1,3})|)\.ZZZ$
XXX_A12_123.ZZZ --> match
XXX_A11.ZZZ --> match
XXX_A12_abc.ZZZ --> no match
XXX_A23_123.ZZZ --> no match
These are examples of filenames. If the filename contains a string from the mentioned group, like A12 or C56, then I expect this element to be followed by an underscore and 1 to 3 digits. If the filename does not contain a string from that group (no characters, or a character sequence different from the strings in the group), then I don't want to see the underscore followed by 1 to 3 digits.
For instance, I could extend the regex to
^XXX_(?<DT>A12|B43|D14)_\d{5}(?(DT)(_\d{1,3})|)_someMoreChars\.ZZZ$
...and then I want these filenames to be valid:
XXX_A12_12345_123_wellDone.ZZZ
XXX_Q21_00000_wellDone.ZZZ
XXX_Q21_00000_456_wellDone.ZZZ
...but this is invalid:
XXX_A12_12345_wellDone.ZZZ
How can I make the ELSE branch of the conditional statement work?
In the end I intend to have two groups like
Group A: (A11, B32, D76, R33)
Group B: (A23, C56, H78, T99)
If an element of group A occurs in the filename then I expect to find _\d{1,3} in the filename.
If an element of group B occurs in the filename then the _\d{1,3} shall be optional (it may or may not occur in the filename).
I ended up with these regexes:
^XXX_(?:(?<DT>A12|B43|D14))?(?(DT)(_\d{5}_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).*\.ZZZ$
^XXX_(?:(?<DT>A12|B43|D14))?_\d{5}(?(DT)(_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).+\.ZZZ$
Since I have to use this regex in the OpenApi @Pattern annotation, I run into this error:
Conditionals are not supported in this regex dialect.
As @The fourth bird suggested, alternation seems to do the trick:
XXX_((((A12|B43|D14)_\d{5}_\d{1,3}))|((?:(A10|B10|C20)((?:_\d{5}_\d{3})|(?:_\d{3}))))).*\.ZZZ$

The else branch is the part after the |. However, if you also want to match the 2nd example, the if clause will not work, because by that point you have already matched one of A12|B43|D14.
The named capture group is not optional, so the if clause will always be true.
What you can do instead is use an alternation to match either the enumeration part followed by an underscore and 1 to 3 digits, or an uppercase char and 2 digits.
^XXX_(?:(?<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
Regex demo
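As a rough check of the alternation pattern above in an engine without conditionals, here is a small JavaScript sketch using the four filenames from the question. It assumes an engine with named-group support; the variable names are made up for this example.

```javascript
// Either a listed code followed by "_" and 1 to 3 digits,
// or any uppercase letter plus two digits with nothing after it.
const filenamePattern = /^XXX_(?:(?<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$/;

const filenames = [
  "XXX_A12_123.ZZZ",
  "XXX_A11.ZZZ",
  "XXX_A12_abc.ZZZ",
  "XXX_A23_123.ZZZ",
];
for (const name of filenames) {
  console.log(name, "-->", filenamePattern.test(name) ? "match" : "no match");
}
```

One thing to be aware of: XXX_A12.ZZZ also passes, through the second branch, because A12 itself fits [A-Z]\d{2}. If listed codes must always carry the numeric suffix, the second branch would need to exclude them.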
If you want to make use of the if/else clause, you can make the named capture group optional, and then check if group 1 exists.
^XXX_(?<DT>A12|B43|D14)?(?(DT)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
Regex demo
For the updated question:
^XXX_(?<DT>A12|B43|D14)?(?(DT)(?:_\d{5})?_\d{3}(?!\d)|(?!A12|B43|D14|[A-Z]\d{2}_\d{3}(?!\d))).*\.ZZZ$
The pattern matches:
^ Start of string
XXX_ Match literally
(?<DT>A12|B43|D14)?
(?(DT) If we have group DT
(?:_\d{5})? Optionally match _ and 5 digits
_\d{3}(?!\d) Match _ and 3 digits
| Or
(?! Negative lookahead, assert not to the right
A12|B43|D14| Match one of the alternatives, or
[A-Z]\d{2}_\d{3}(?!\d) Match 1 char A-Z, 2 digits _ 3 digits not followed by a digit
) Close lookahead
) Close if clause
.* Match the rest of the line
\.ZZZ Match . and ZZZ
$ End of string
Regex demo

Regex to extract static text and number using only regular expression

I am completely new to regular expressions.
I tried to write a regular expression to get some static text and the phone number from the text below:
"password":"password123:cityaddress:mailaddress:9233321110:gender:45"
I wrote the regex below to extract this: "password":9233321110
(([\"]password[\"][\s]*:{1}[\s]*))(\d{10})?
regex link for demo:
https://regex101.com/r/2vNpMU/2
The correct regexp should give the full match "password":9233321110 in the regex tool.
I am not using any programming language here; this is for network packet capture at the F5 level.
Please help me with the regexp.

I would use /^([^:]+)(?::[^:]+){3}:([^:]+)/ for this.
Explained (more detailed explanation at regex101):
^ matches from the start of the string
(…) is the first capture group. This will collect that initial "password"
[^:]+ matches one or more non-colon characters
(?:…) is a non-capturing group (it collects nothing for later)
:[^:]+ matches a colon and then 1+ non-colons
{3} instructs us to match the previous item (the non-capturing group) 3 times
: matches a literal colon
([^:]+) captures a match of 1+ non-colons, which will get us 9233321110 in this example
The first capture group is typically stored as $1 or as the first item of the returned array. (In JavaScript, the zeroth item is the full match and the item at index 1 is the first capture group.) The second capture group is $2, and so on.
To always match the "password" key, hard-code it: /^("password")(?::[^:]+){3}:([^:]+)/
Here's a live snippet demonstrating it:
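// Sample input from the question
const x = `"password":"password123:cityaddress:mailaddress:9233321110:gender:45"`;
// Group 1 is the key; group 2 is the fifth colon-separated field
const match = x.match(/^([^:]+)(?::[^:]+){3}:([^:]+)/);
if (match) console.log(match[1] + ":" + match[2]);
else console.log("no match");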

Regex- to extract a string before and after string

I want to extract the string before and after a word. The content is below.
Content:
1. http://www.example.com/myplan/mp/public/pl_be?Id=543543&timestamp=06280435435
2. http://www.example.com/course/df/public/pl_de?Id=454354&timestamp=0628031746
3. http://www.example.com/book/rg/public/pl_fo?Id=4445577&timestamp=0628031734
4. http://www.example.com/trip/tr/public/pl_ds?Id=454354&timestamp=06280314546
5. http://www.example.com/trip/tr/public/pl_ds
I want to capture the data from the strings above as follows:
1. http://www.example.com/myplan/mp/public/?Id=543543
2. http://www.example.com/course/df/public/?Id=454354
3. http://www.example.com/book/rg/public/?Id=4445577
4. http://www.example.com/trip/tr/public/?Id=454354
5. http://www.example.com/trip/tr/public/
I have tried with (./(?![A-Za-z]{2}_[A-Za-z]{2}).(?=&)). But it won't help.
I hope somebody can help me with this.

This pattern will catch what you want in two groups. It's safer than the other examples suggested so far because it allows for some variance in the URL.
(.*)\w\w_\w\w.*?(?:(?:[&?]\w+=\d+|%\w*)*?(\?Id=\d+)(?:.*))?
(.*) captures everything up until your xx_xx part (capture group 1)
\w\w_\w\w.*? matches xx_xx and everything up until the next capture section
(?:[&?]\w+=\d+|%\w*)*? allows for there to be other & % or ? properties in your URL before your ?Id= property
(\?Id=\d+) captures your Id property (capture group 2)
(?:.*) is unnecessary but it bugs me when not all of the text is highlighted on regex101 ¯\_(ツ)_/¯
The optional non-capturing group here, (?:(?:[&?]\w+=\d+|%\w*)*?(\?Id=\d+)(?:.*))?, allows it to match URLs that don't have Id properties.
Here's an example of how it works

Response updated:
This pattern will do the work for you:
(.*\/)[^?]*(?:(\?[^&]*).*)?
Explanation:
(.*\/) -> Will match and capture every character up to and including the last / character (.* is greedy, so it runs to the end and backtracks to the final /).
[^?]* -> Will match everything that's not a ? character.
(?:(\?[^&]*).*)? -> First of all, (?: ... ) is a non-capturing group, the ? at the end of this makes this group optional, (\?[^&]*) will match and capture the ? character and every non & character next to it, the last .* will match everything after the first param in the URL.
Then you can replace the string using only the first and second capture groups.
Here is a working example in regex101
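The replacement step can be sketched like this in JavaScript, assuming the pattern from this answer and two of the URLs from the question. The helper name trimUrl is invented for the example.

```javascript
const urlPattern = /(.*\/)[^?]*(?:(\?[^&]*).*)?/;

// Hypothetical helper: keeps the path up to the last "/" plus the first query parameter.
function trimUrl(url) {
  // In JavaScript, a group that did not take part in the match is substituted as "".
  return url.replace(urlPattern, "$1$2");
}

console.log(trimUrl("http://www.example.com/myplan/mp/public/pl_be?Id=543543&timestamp=06280435435"));
console.log(trimUrl("http://www.example.com/trip/tr/public/pl_ds"));
```

The second call covers the URL without a query string, where the optional group is skipped and only the first capture group contributes to the result.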
Edit 2:
As emsimpson92 mentioned in the comments, the Id might not always be the first param, so you can use this pattern to match the Id param:
(.*\/)[^?]*(?:(\?).*?(Id=[^&]*).*)?
The important part here is that .*?(Id=[^&]*).* matches the Id param no matter its position.
.*? -> It matches as few characters as possible until Id= is found. The trick here is that .* is a greedy quantifier, but when it is followed by ? it becomes a lazy one.
Here is an Example of this scenario in regex101
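A small sketch of the position-independent variant, again in JavaScript. The first URL is made up for illustration (the question has no sample where Id comes second), and keepIdParam is a hypothetical helper name.

```javascript
const idPattern = /(.*\/)[^?]*(?:(\?).*?(Id=[^&]*).*)?/;

// Hypothetical helper: keeps the base path, the "?", and the Id parameter only.
function keepIdParam(url) {
  return url.replace(idPattern, "$1$2$3");
}

// Made-up URL with Id as the second parameter.
console.log(keepIdParam("http://www.example.com/trip/tr/public/pl_ds?timestamp=0628031746&Id=454354"));
// URL from the question with Id as the first parameter.
console.log(keepIdParam("http://www.example.com/course/df/public/pl_de?Id=454354&timestamp=0628031746"));
```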

Regex to check only if the group is present

I have a String which may have values like the ones below.
854METHYLDOPA
041ALDOMET /00000101/
133IODETO DE SODIO [I 131]
From this I need to get the text starting from index 4 until we find either of these patterns: /00000101/ or [I 131]
Expected Output:
METHYLDOPA
ALDOMET
IODETO DE SODIO
I have tried the below RegEx for the same
(?:^.{3})(.*)(?:[[/][A-Z0-9\s]+[]/\s+])
This RegEx works if the string contains [ or /, but it doesn't work for case 1, where these patterns don't exist.
I have tried adding ? at the end; then it works for case 1 but not for cases 2 and 3.
Could anyone please help me get the regex to work?

Your logic is difficult to phrase. My interpretation is that you always want to capture from the 4th character onwards. What else gets captured depends on the remainder of the input. Should either /00000101/ or [I 131] occur, then you want to capture up until that point. Otherwise, you want to capture the entire string. Putting this all together yields this regex:
^.{3}(?:(.*)(?=/00000101/|\[I 131\])|(.*))
Demo

You may try this:
^.{3}(.*?)($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])).*$
and replace the match with this to get the exact output you want:
\1
Regex Demo
Explanation:
^ --> start of the string
.{3} --> followed by 3 characters
(.*?) --> followed by anything; the ? makes it lazy, so it consumes only up to the point where the next group can match and won't go beyond that. It is captured as group 1 --> \1
($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])) --> one of three alternatives:
$ --> the end of the string, which means that neither of the patterns you mentioned is present
| or
(?:\s*\/00000101\/) --> one of your patterns, extended with \s* to cover zero or more whitespace characters before it
| or
(?:\s*\[I\s+131\]) --> your other pattern, with \s+ meaning one or more whitespace characters. ?: indicates that the group is not captured.
.*$ --> .* matches anything that follows, and $ marks the end of the string.
So the whole string is matched, but only group 1 is used in the replacement; replacing everything with group 1 leaves your target output.

You could get the values you are looking for in group 1:
^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)
Explanation
From the beginning of the string ^
Match the first 3 characters .{3}
Match in a capturing group (where your values will be) any character one or more times non greedy (.+?)
A positive lookahead (?=
To assert that what follows is either the end of the string $
or |
an optional space ? followed by [I 131] \[I 131\]
or |
an optional space ? followed by /00000101/ \/00000101\/
If your regex engine supports \K, you could try it like this and the values you are looking for are not in a group but the full match:
^.{3}\K.+?(?=$| ?\[I 131\]| ?\/00000101\/)
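For a quick check of the group 1 variant against the three sample values, here is a minimal JavaScript sketch. JavaScript's regex engine has no \K, so only the lookahead pattern with a capturing group is shown; extractName is a made-up helper name.

```javascript
// Skip 3 characters, then lazily capture up to the end of the string
// or up to an optionally space-prefixed "[I 131]" or "/00000101/".
const namePattern = /^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)/;

// Hypothetical helper: returns group 1, or null when nothing matches.
function extractName(line) {
  const match = namePattern.exec(line);
  return match ? match[1] : null;
}

const lines = ["854METHYLDOPA", "041ALDOMET /00000101/", "133IODETO DE SODIO [I 131]"];
for (const line of lines) {
  console.log(extractName(line));
}
```

The optional space inside the lookahead is what keeps the trailing blank out of the captured value.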

How to optionally match a group?

I have two possible patterns:
1.2 hello
1.2.3 hello
I would like to match 1, 2 and 3 if the latter exists.
Optional items seem to be the way to go, but my pattern (\d)\.(\d)?(\.(\d)).hello matches only 1.2.3 hello (almost perfectly: I get four groups, but the first, second and fourth contain what I want). The first test string is not matched at all.
What would be the right match pattern?

Your pattern contains a (\d)\.(\d)?(\.(\d)) part that matches a digit, then a ., then an optional digit (one or zero occurrences), and then a . followed by a digit. Thus, it can match 1..2 hello, but not 1.2 hello.
You may make the third group non-capturing and make it optional:
(\d)\.(\d)(?:\.(\d))?\s*hello
          ^^^      ^^
See the regex demo
If your regex engine does not allow non-capturing groups, use a capturing one; you will just have to grab the value from Group 4:
(\d)\.(\d)(\.(\d))?\s*hello
See this regex.
Note that I replaced . before hello with \s* to match zero or more whitespaces.
Note also that if you need to match these numbers at the start of a line, you might consider prepending ^ to the pattern (and, depending on your regex engine/tool, adding the m modifier).
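To see how the optional non-capturing group behaves on both inputs, here is a short JavaScript sketch. The helper name extractDigits is an assumption made for this example.

```javascript
const partsPattern = /(\d)\.(\d)(?:\.(\d))?\s*hello/;

// Hypothetical helper: returns the captured digits as an array, or null on no match.
function extractDigits(input) {
  const match = partsPattern.exec(input);
  if (!match) return null;
  // Drop the full match, then drop the third group if it did not participate.
  return match.slice(1).filter((digit) => digit !== undefined);
}

console.log(extractDigits("1.2 hello"));
console.log(extractDigits("1.2.3 hello"));
```

With the non-capturing wrapper, the third digit lands in group 3 rather than group 4, and it is simply undefined when the input has only two parts.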
