Regex capture group multiple times and other groups - regex

I'm trying to write a regex that captures multiple groups of data.
Here is some sample data:
sampledata=X
B : xyz=1 FAB1_1=03 FAB2_1=01
A : xyz=1 FAB1_1=03 FAB2_1=01
I need to capture the X, which appears only once, and FAB1_1=03, FAB2_1=01, ... (every string that starts with FAB).
I can capture all the "FAB" strings like this:
/(FAB[0-9]_[0-9]=[0-9]*)/sg
But I could not also capture X with this expression:
/sampledata=(?<samplegroup>[0-9A-Z]).*(FAB[0-9]_[0-9]=[0-9]*)/sg
This regex returns only one match, containing X and the last "FAB" group.

You can use
(?:sampledata=(\S+)|(?!^)\G)(?:(?!FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)
See the regex demo
The regex is based on the \G operator that matches either the start of string or the end of the previous successful match. We restrict it to match only in the latter case with a negative lookahead (?!^).
So:
(?:sampledata=(\S+)|(?!^)\G) - match a literal sampledata= and then match and capture into Group 1 one or more non-whitespace symbols -OR- match the end of the previous successful match
(?:(?!FAB[0-9]_[0-9]=).)* - match any text that is not FABn_n= (this is a tempered greedy token)
(FAB[0-9]_[0-9]) - Capture group 2, matching and capturing FAB followed by a digit, then a _, and one more digit
= - literal =
([0-9]*) - Capture group 3, matching and capturing zero or more digits
If you have only one sampledata= block, you can safely unroll the tempered greedy token (demo) as
(?:sampledata=(\S+)|(?!^)\G)[^F]*(?:F(?!AB[0-9]_[0-9]=)[^F]*)*?(FAB[0-9]_[0-9])=([0-9]*)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That way, the expression will be more efficient.
If you have several sampledata blocks, enhance the tempered greedy token:
(?:sampledata=(\S+)|(?!^)\G)(?:(?!sampledata=|FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)
See another demo
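Python's built-in re module has no \G anchor, so the patterns above cannot be used there as they are. As a rough sketch of a different, two-pass approach (not the \G technique itself), one can first isolate each sampledata block and then collect the FAB pairs inside it. The names BLOCK, FAB and extract are made up for this illustration:

```python
import re

# Sample input taken from the question
text = """sampledata=X
B : xyz=1 FAB1_1=03 FAB2_1=01
A : xyz=1 FAB1_1=03 FAB2_1=01"""

# Pass 1: a sampledata value plus everything up to the next "sampledata="
BLOCK = re.compile(r"sampledata=(\S+)((?:(?!sampledata=).)*)", re.DOTALL)
# Pass 2: each FABn_n=digits pair inside one block
FAB = re.compile(r"(FAB[0-9]_[0-9])=([0-9]*)")

def extract(data):
    """Return a list of (sample value, [(FAB name, FAB value), ...])."""
    return [(m.group(1), FAB.findall(m.group(2))) for m in BLOCK.finditer(data)]

result = extract(text)
print(result)
```

Each tuple pairs one sampledata value with all FAB name/value pairs found before the next sampledata= (or the end of the input), which is the same grouping the multi-block pattern above aims for.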

Related

Problem with regular expression with 2 capture group, one is optional

I'm struggling to write the correct regex to match the data below. I want to capture the "Focus+Terminal" and its optional parameter "NYET". How can I re-write my incorrect regex?
user:\/\/(.*)(?:=(.*+))?
I also tried and failed:
user:\/\/(.*)=?(?:(.*+))?
Sample Data
* user://Focus+Terminal=NYET
* user://Focus+Terminal
You can use
user:\/\/(.*?)(?:=(.*))?$
See the regex demo.
Details:
user:\/\/ - a user:// string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?:=(.*))? - an optional non-capturing group that matches a = and then captures into Group 2 any zero or more chars other than line break chars as many as possible
$ - end of string.
As an alternative, you could use a negated character class that excludes a newline and an equals sign for the first capture group.
user:\/\/([^=\n]*)(?:=(.*))?
Explanation
user:\/\/ Match user://
([^=\n]*) Capture group 1, match zero or more chars other than = or a newline
(?:=(.*))? Optionally match = and capture the rest of the line in group 2
Regex demo
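To see how the optional second group behaves, here is a small Python sketch of the first suggested pattern. The helper name parse is made up for the illustration, and the forward slashes are left unescaped because Python does not use / as a regex delimiter:

```python
import re

# Lazy first group, optional "=value" part, anchored at the end of the line
PATTERN = re.compile(r"user://(.*?)(?:=(.*))?$")

def parse(line):
    """Return (name, parameter); parameter is None when there is no '='."""
    m = PATTERN.search(line)
    return m.groups() if m else None

print(parse("* user://Focus+Terminal=NYET"))
print(parse("* user://Focus+Terminal"))
```

Without the trailing $, the lazy (.*?) would be satisfied with an empty match, which is why the anchor matters in this version.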

Regex that checks for the validity of options, disallowing duplicates

I'm trying to find a regex to check for the validity of options that are supplied with a command.
Say that -a, -b and -c are valid options. They may be combined, for example as -ac or -abc. Order doesn't matter, so -ba is also valid.
I thought this regex would do the trick:
^-[abc]{1,3}$
But it has a downside. This regex also accepts duplicates, i.e. -abb.
How do I modify this regex to disallow duplicates?
You may use this regex with a capture group and a negative lookahead:
^-((?!.*\1)[abc]){1,3}$
RegEx Demo
RegEx Details:
^: Start
-: Match a -
(: Start capture group #1
(?!.*\1): Negative lookahead to make sure the character most recently captured in group #1 does not occur again further on in the input
[abc]: Match a or b or c
){1,3}: End capture group #1. Repeat this group 1 to 3 times
$: End
You could list all the alternatives, but if the character class is long, you can instead use a capture group and a backreference to check that no char occurs twice to the right of the hyphen.
^-(?![abc]*?([abc])[abc]*?\1)[abc]{1,3}$
^ Start of string
- Match a hyphen
(?! Negative lookahead, assert that at the right is not
[abc]*?([abc])[abc]*?\1 Match optional chars a, b or c and then capture 1 char. Then check whether the captured char occurs again to the right
) Close lookahead
[abc]{1,3} Match 1-3 times a b or c
$ End of string
Regex demo
Or a shorter version that uses non-whitespace chars (\S) in the lookahead, as the character class at the end already restricts the match to 1-3 of the chars a, b and c.
^-(?!\S*(\S)\S*\1)[abc]{1,3}$
Regex demo
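As a quick check of the lookahead-with-backreference variant, here is a minimal Python sketch. The helper name is_valid and the list of test options are assumptions made for the illustration:

```python
import re

# Reject any option string in which one of a, b, c occurs twice
OPTIONS = re.compile(r"^-(?![abc]*?([abc])[abc]*?\1)[abc]{1,3}$")

def is_valid(option):
    """True when the option uses only a, b, c, each at most once."""
    return OPTIONS.match(option) is not None

for opt in ["-a", "-ac", "-abc", "-ba", "-abb", "-aa", "-abcd"]:
    print(opt, is_valid(opt))
```

The first four options are accepted and the last three are rejected: -abb and -aa repeat a character, and -abcd contains a character outside the class.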

Do not match if nothing exists between optional parenthesis

I'm attempting to parse group names from /etc/security/login-access.conf. We have a mixed environment of LDAP & AD machines. AD groups are enclosed in parentheses ().
I have the following regex that extracts only the group name. The only problem is that there is routinely a 'null' group, for which the regex returns an empty match plus the ) character:
Current regex:
/(?<=\+\s:\s[#\(])(.*?)(?=[\)]?\s:)/
Sample /etc/security/login-access.conf:
+ : #ldapgroup1 : ALL
+ : #ldapgroup2 : ALL
+ : (#adgroup1) : ALL
+ : (#adgroup2) : ALL
+ : () : ALL # <---This is the problematic entry.
I'm not sure if or how to tune the regex to ignore an entry that contains nothing between the parentheses. Any help is appreciated.
Since your regex engine appears to have capture groups, I would just express your pattern as:
\+ : (\(#\S+\)|#\S+) : \S+
Demo
Here I use an alternation to cleanly match either the parenthesized or the non-parenthesized variant of the group names.
Might not be the most efficient, and it is definitely ugly, but it works:
(?<=\+\s:\s#|\(#)([a-zA-Z0-9_-]+)(?=[\)]?\s:)
If you are using Perl, you can use a branch reset group:
\+\h:\h(?|#([\w-]+)|\(#([\w-]+)\))\h:
The pattern matches:
\+\h:\h Match + and a colon between horizontal whitespace chars
(?| Branch reset group
#([\w-]+) Match # and capture 1+ word chars or a hyphen in group 1
| Or
\(#([\w-]+)\) Match (#, capture 1+ word chars or hyphens in what would otherwise be group 2 (available as group 1 thanks to the branch reset group), and match )
)\h: Close the branch reset group, then match a horizontal whitespace char and a colon
Regex demo
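Python's re module has neither branch reset groups nor \h, so the Perl pattern cannot be copied over directly. A rough equivalent, sketched below, keeps two ordinary capture groups and picks whichever one matched. Using [ \t] in place of \h is an assumption (it covers only spaces and tabs), and the names GROUP and group_names are made up for the example:

```python
import re

# Sample lines from the question, including the empty "()" entry
conf = """\
+ : #ldapgroup1 : ALL
+ : #ldapgroup2 : ALL
+ : (#adgroup1) : ALL
+ : (#adgroup2) : ALL
+ : () : ALL
"""

# Group 1 holds an LDAP name, group 2 an AD name inside parentheses
GROUP = re.compile(r"\+[ \t]:[ \t](?:#([\w-]+)|\(#([\w-]+)\))[ \t]:")

def group_names(text):
    """Return the group names, skipping entries with empty parentheses."""
    return [m.group(1) or m.group(2) for m in GROUP.finditer(text)]

names = group_names(conf)
print(names)
```

Because both alternatives require a # followed by at least one name character, the "+ : () : ALL" line produces no match at all.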

Regex to get value from <key, value> by asserting conditions on the value

I have a regex which takes the value from the given key as below
Regex .*key="([^"]*)".* InputValue key="abcd-qwer-qaa-xyz-vwxc"
output abcd-qwer-qaa-xyz-vwxc
On top of this, I need to validate that the value starts with abcd- and contains -xyz somewhere after that.
Thus, the inputs and outputs have to be as follows:
I tried the pattern below, which does not work as expected:
.*key="([^"]*)"?(/Babcd|-xyz).*
The key value pair is part of the large string as below:
object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}
I think that matching the key is what picks up the value, which is why I used .*key="([^"]*)".*
Note:
It's a dashboard. You can refer to this link and search for Regex: /"([^"]+)"/. That regex is applied to the query result, which is the string I referred to above. It works with the .*key="([^"]*)".* regex above. I'm trying to alter that regex group itself. I hope this helps.
Can anyone guide me or suggest something, please? That would be helpful. Thanks!
Looks like you could do with:
\bkey="(abcd(?=.*-xyz\b)(?:-[a-z]+){4})"
See the demo online
\bkey=" - A word-boundary and literally match 'key="'
( - Open 1st capture group.
abcd - Literally match 'abcd'.
(?=.*-xyz\b) - Positive lookahead for zero or more characters (other than newline) followed by a literal '-xyz' and a word boundary.
(?: - Open non-capturing group.
-[a-z]+ - Match a hyphen followed by at least one lowercase letter.
){4} - Close non-capture group and match it 4 times.
) - Close 1st capture group.
" - Match a literal double quote.
I'm not 100% sure you'd only want to allow lowercase letters, so you can adjust that part if need be. The whole pattern validates the input value, whereas you can use capture group one to grab your key.
Update after edited question with new information:
Prometheus uses the RE2 engine in all regular expressions. Therefore the above suggestion won't work, because RE2 does not support lookarounds. A less restrictive but possible answer for OP could be:
\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"
See the online demo
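The lookaround-free pattern can be tried outside Prometheus as well. The sketch below uses Python's re module purely for illustration. It is not RE2, but since the pattern contains no lookarounds it should behave the same way on this input. The helper name key_value is hypothetical:

```python
import re

# No lookarounds: "abcd", optional "-word" parts, a mandatory "-xyz", more parts
KEY = re.compile(r'\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"')

line = 'object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}'

def key_value(text):
    """Return the key's value when it starts with abcd and has a -xyz part."""
    m = KEY.search(text)
    return m.group(1) if m else None

print(key_value(line))
print(key_value('key="abcd-qwer-qaa-vwxc"'))
```

The second call returns None because the value has no -xyz part, so the regex validates and extracts in one step.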
Will this work?
Pattern
\bkey="(abcd-[^"]*\bxyz\b[^"]*)"
Demo
You could use the following regular expression to verify the string has the desired format and to match the portion of the string that is of interest.
(?<=\bkey=")(?=.*-xyz(?=-|$))abcd(?:-[a-z]+)+(?=")
Start your engine!
Note there are no capture groups.
The regex engine performs the following operations.
(?<=\bkey=")  : positive lookbehind asserts the current position
              : in the string is preceded by 'key="'
(?=           : begin positive lookahead
  .*-xyz      : match 0+ characters, then '-xyz'
  (?=-|$)     : positive lookahead asserts the current position is
              : followed by '-' or is at the end of the string
)             : end positive lookahead
abcd          : match 'abcd'
(?:           : begin non-capture group
  -[a-z]+     : match '-' followed by 1+ lowercase letters
)+            : end non-capture group and execute it 1+ times
(?=")         : positive lookahead asserts the current position is
              : followed by '"'

regex how to match a capture group more than once

I have the following regex:
\{(\w+)(?:\{(\w+))+\}+\}
I need it to match any of the following
{a{b}}
{a{b{c}}}
{a{b{c{d...}}}}
But when I use the regex on, for example, the last one, it only matches two groups, a and c; it doesn't match b or any other words that might be in between.
How do I get the group to match each single one like:
group #1: a
group #2: b
group #3: c
group #4: d
group #4: etc...
or like
group #1: a
group #2: [b, c, d, etc...]
Also, how do I make it match only when there are the same number of { on the left as there are } on the right?
Thanks for the help,
David
In .NET, a regex can 1) check balanced groups and 2) store a capture collection for each capturing group in a group stack.
With the following regex, you can extract all the texts inside each {...} only if the whole string, starting with { and ending with }, contains a balanced number of open/close curly braces:
^{(?:(?<c>[^{}]+)|(?<o>{)|(?<-o>}))*(?(o)(?!))}$
See the regex demo.
Details:
^ - start of string
{ - an open brace
(?: - start of a group of alternatives:
(?<c>[^{}]+) - 1+ chars other than { and } captured into "c" group
| - or
(?<o>{) - { is matched and a value is pushed to the Group "o" stack
| - or
(?<-o>}) - a } is matched and a value is popped from Group "o" stack
)* - end of the alternation group, repeated 0+ times
(?(o)(?!)) - a conditional construct that fails the match if the Group "o" stack is not empty
} - a close }
$ - end of string.
C# demo:
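using System.Linq;
using System.Text.RegularExpressions;
// For each match, take the capture collection of the "c" group
var pattern = "^{(?:(?<c>[^{}]+)|(?<o>{)|(?<-o>}))*(?(o)(?!))}$";
var result = Regex.Matches("{a{bb{ccc{dd}}}}", pattern)
    .Cast<Match>().Select(p => p.Groups["c"].Captures)
    .ToList();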
Output for {a{bb{ccc{dd}}}} is [a, bb, ccc, dd] while for {{a{bb{ccc{dd}}}} (a { is added at the beginning), results are empty.
For regex flavours supporting recursion (PCRE, Ruby) you may employ the following generic pattern:
^({\w+(?1)?})$
It lets you check whether the input matches the defined pattern, but it does not capture the desired groups. See the Matching Balanced Constructs section in http://www.regular-expressions.info/recurse.html for details.
To capture the groups, we can turn the pattern-checking regex into a positive lookahead that is checked only once, at the start of the string ((?:^(?=({\w+(?1)?})$)|\G(?!\A))), and then capture all the "words" using a global search:
(?:^(?=({\w+(?1)?})$)|\G(?!\A)){(\w+)
The a, b, c, etc. are now in the second capture group.
Regex demo: https://regex101.com/r/2wsR10/2. PHP demo: https://ideone.com/UKTfcm.
Explanation:
(?: - start of alternation group
[first alternative]:
^ - start of string
(?= - start of positive lookahead
({\w+(?1)?}) - the generic pattern from above
$ - end of string
) - end of positive lookahead
| - or
[second alternative]:
\G - end of previous match
(?!\A) - ensure the previous \G does not match the start of the input if the first alternative failed
) - end of alternation group
{ - opening brace literally
(\w+) - a "word" captured in the second group.
Ruby has different syntax for recursion and the regex would be:
(?:^(?=({\w+\g<1>?})$)|\G(?!\A)){(\w+)
Demo: http://rubular.com/r/jOJRhwJvR4
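Python's built-in re module supports neither recursion nor .NET-style balancing groups, so none of the patterns above carries over directly. For the specific shape in the question (a chain of "{word" openers followed only by closing braces), a minimal sketch of a different approach is to match the overall shape with a plain regex and compare the brace counts in code. The names SHAPE and nested_words are made up for this illustration:

```python
import re

# One or more "{word" openers followed by a run of closing braces
SHAPE = re.compile(r"((?:\{\w+)+)(\}+)")

def nested_words(text):
    """Return the nested words, or [] when the braces are not balanced."""
    m = SHAPE.fullmatch(text)
    # The number of openers must equal the number of closers
    if m is None or m.group(1).count("{") != len(m.group(2)):
        return []
    return re.findall(r"\{(\w+)", text)

print(nested_words("{a{bb{ccc{dd}}}}"))
print(nested_words("{{a{bb{ccc{dd}}}}"))
```

This only handles the single-chain nesting shown in the question. Sibling groups such as {a{b}{c}} would need a real parser or an engine with recursion.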