I have the following regex:
\{(\w+)(?:\{(\w+))+\}+\}
I need it to match any of the following
{a{b}}
{a{b{c}}}
{a{b{c{d...}}}}
But when I use the regex on, for example, the last one, it only matches two groups, a and c; it doesn't match b or any other words that might be in between.
How do I get the group to match each single one like:
group #1: a
group #2: b
group #3: c
group #4: d
group #5: etc...
or like
group #1: a
group #2: [b, c, d, etc...]
Also, how do I make sure there are the same number of { on the left as there are } on the right, and otherwise don't match?
Thanks for the help,
David
In .NET, a regex can 1) check balanced groups and 2) store a capture collection for each capturing group in a group stack.
With the following regex, you may extract all the texts inside each {...} only if the whole string starting with { and ending with } contains a balanced amount of those open/close curly braces:
^{(?:(?<c>[^{}]+)|(?<o>{)|(?<-o>}))*(?(o)(?!))}$
See the regex demo.
Details:
^ - start of string
{ - an open brace
(?: - start of a group of alternatives:
(?<c>[^{}]+) - 1+ chars other than { and } captured into "c" group
| - or
(?<o>{) - { is matched and a value is pushed to the Group "o" stack
| - or
(?<-o>}) - a } is matched and a value is popped from Group "o" stack
)* - end of the alternation group, repeated 0+ times
(?(o)(?!)) - a conditional construct checking if Group "o" stack is empty
} - a close }
$ - end of string.
C# demo:
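// requires: using System.Linq; using System.Text.RegularExpressions;
var pattern = "^{(?:(?<c>[^{}]+)|(?<o>{)|(?<-o>}))*(?(o)(?!))}$";
var result = Regex.Matches("{a{bb{ccc{dd}}}}", pattern)
    .Cast<Match>()
    // flatten every capture stored on the "c" group stack into its text
    .SelectMany(m => m.Groups["c"].Captures.Cast<Capture>())
    .Select(c => c.Value)
    .ToList();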
Output for {a{bb{ccc{dd}}}} is [a, bb, ccc, dd] while for {{a{bb{ccc{dd}}}} (a { is added at the beginning), results are empty.
For regex flavours supporting recursion (PCRE, Ruby) you may employ the following generic pattern:
^({\w+(?1)?})$
It lets you check whether the input matches the defined pattern, but it does not capture the desired groups. See the Matching Balanced Constructs section in http://www.regular-expressions.info/recurse.html for details.
In order to capture the groups we may convert the pattern checking regex into a positive lookahead which would be checked only once at the start of string ((?:^(?=({\w+(?1)?})$)|\G(?!\A))) and then just capture all "words" using global search:
(?:^(?=({\w+(?1)?})$)|\G(?!\A)){(\w+)
The a, b, c, etc. are now in the second capture group.
Regex demo: https://regex101.com/r/2wsR10/2. PHP demo: https://ideone.com/UKTfcm.
Explanation:
(?: - start of alternation group
[first alternative]:
^ - start of string
(?= - start of positive lookahead
({\w+(?1)?}) - the generic pattern from above
$ - end of string
) - end of positive lookahead
| - or
[second alternative]:
\G - end of previous match
(?!\A) - ensure the previous \G does not match the start of the input if the first alternative failed
) - end of alternation group
{ - opening brace literally
(\w+) - a "word" captured in the second group.
Ruby has different syntax for recursion and the regex would be:
(?:^(?=({\w+\g<1>?})$)|\G(?!\A)){(\w+)
Demo: http://rubular.com/r/jOJRhwJvR4
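Some flavours have neither balancing groups nor recursion. JavaScript is one of them (choosing it here is an assumption, since the question does not name a flavour). A minimal sketch for that case: check the overall shape with one regex, then compare the number of openers with the number of closers. The helper name nestedWords is hypothetical.

```javascript
// Hypothetical helper: returns the nested words, or null when the braces do not balance.
function nestedWords(input) {
  // Shape check: one or more "{word" openers followed only by closing braces.
  const shape = /^((?:\{\w+)+)(\}+)$/.exec(input);
  if (shape === null) return null;
  // Every opener carries exactly one word, so the word count equals the opener count.
  const words = shape[1].match(/\w+/g);
  // Balanced only when each opener has exactly one closer.
  return words.length === shape[2].length ? words : null;
}

console.log(nestedWords("{a{bb{ccc{dd}}}}"));  // [ 'a', 'bb', 'ccc', 'dd' ]
console.log(nestedWords("{{a{bb{ccc{dd}}}}")); // null
```

This only covers the strictly nested {word{word...}} shape from the question, not arbitrary balanced text.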
Related
I'm attempting to parse group names from /etc/security/login-access.conf. We have a mixed environment of LDAP & AD machines. AD groups are enclosed in parentheses ().
I have the following regex that works to extract only the group name, however the only problem I am having with it is there is routinely a 'null' group and the regex returns a null & the ) characters:
Current regex:
/(?<=\+\s:\s[#\(])(.*?)(?=[\)]?\s:)/
Sample /etc/security/login-access.conf:
+ : #ldapgroup1 : ALL
+ : #ldapgroup2 : ALL
+ : (#adgroup1) : ALL
+ : (#adgroup2) : ALL
+ : () : ALL # <---This is the problematic entry.
I'm not sure if or how to tune the regex to ignore an entry that contains nothing between the parentheses. Any help is appreciated.
Since your regex engine appears to have capture groups, I would just express your pattern as:
\+ : (\(#\S+\)|#\S+) : \S+
Demo
Here I use an alternation to cleanly match either the parentheses or non parentheses variants of the LDAP group names.
Might not be the most efficient, definitely ugly but it works:
(?<=\+\s:\s#|\()([a-zA-Z0-9_-]+)(?=[\)]?\s:)
If you are using perl, you can use a branch reset group:
\+\h:\h(?|#([\w-]+)|\(#([\w-]+)\))\h:
The pattern matches:
\+\h:\h Match + and a colon between horizontal whitespace chars
(?| Branch reset group
#([\w-]+) Match # and capture 1+ word chars or a hyphen in group 1
| Or
\(#([\w-]+)\) Match (#, capture 1+ word chars or a hyphen in group 2 (which will be available in group 1 due to the branch reset group) and match )
)\h: Close branch reset group
Regex demo
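For engines without branch reset groups, the same alternation idea can be sketched with two ordinary capture groups, picking whichever one took part in the match. The example below assumes JavaScript and uses the sample lines from the question; the variable names are illustrative only.

```javascript
// Sample lines copied from the question.
const conf = [
  "+ : #ldapgroup1 : ALL",
  "+ : #ldapgroup2 : ALL",
  "+ : (#adgroup1) : ALL",
  "+ : (#adgroup2) : ALL",
  "+ : () : ALL",
].join("\n");

// Group 1 holds a bare #name, group 2 holds a parenthesised (#name).
const groupLine = /^\+\s:\s(?:#([\w-]+)|\(#([\w-]+)\))\s:/gm;

// Exactly one of the two groups participates in each match.
const groupNames = [...conf.matchAll(groupLine)].map((m) => m[1] ?? m[2]);

console.log(groupNames); // [ 'ldapgroup1', 'ldapgroup2', 'adgroup1', 'adgroup2' ]
```

The empty () entry never matches, because both alternatives require at least one name character after the #.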
I have a regex which takes the value from the given key as below
Regex: .*key="([^"]*)".*
Input value: key="abcd-qwer-qaa-xyz-vwxc"
Output: abcd-qwer-qaa-xyz-vwxc
But on top of this, I need to validate that the value starts with abcd- and contains -xyz somewhere after that.
Thus, the inputs and outputs have to be as follows:
I tried below which is not working as expected
.*key="([^"]*)"?(/Babcd|-xyz).*
The key value pair is part of the large string as below:
object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}
I think that by matching the key it takes the value, and that's why I used .*key="([^"]*)".*
Note:
It's a dashboard. You can refer to this link and search for Regex: /"([^"]+)"/. That regex is applied to the query result, which is the string I referred to. It works with the regex .*key="([^"]*)".* above. I'm trying to alter that regex group itself. Hope this helps.
Can anyone guide or suggest me on this please? That would be helpful. Thanks!
Looks like you could do with:
\bkey="(abcd(?=.*-xyz\b)(?:-[a-z]+){4})"
See the demo online
\bkey=" - A word-boundary and literally match 'key="'
( - Open 1st capture group.
abcd - Literally match 'abcd'.
(?=.*-xyz\b) - Positive lookahead for zero or more characters (except newline) followed by literally '-xyz' and a word-boundary.
(?: - Open non-capturing group.
-[a-z]+ - Match a hyphen followed by at least a single lowercase letter.
){4} - Close non-capture group and match it 4 times.
) - Close 1st capture group.
" - Match a literal double quote.
I'm not 100% sure you'd only want to allow lowercase letters, so you can adjust that part if need be. The whole pattern validates the input value, whereas you could use capture group one to grab your key.
Update after edited question with new information:
Prometheus uses the RE2 engine in all regular expressions. Therefore the above suggestion won't work, because RE2 does not support lookarounds. A less restrictive but possible answer for OP could be:
\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"
See the online demo
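The lookaround-free pattern uses only groups, quantifiers, \w and \b, so it can be tried in any engine. Here is a small sketch in JavaScript (chosen only for convenience, it is not RE2) run against the sample string from the question. The helper name extractKey is hypothetical.

```javascript
// Lookaround-free pattern from the answer above.
const keyPattern = /\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"/;

// Hypothetical helper: returns the key value, or null when validation fails.
function extractKey(text) {
  const m = keyPattern.exec(text);
  return m === null ? null : m[1];
}

const sample = 'object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}';
console.log(extractKey(sample));                // abcd-eest-wd-xyz-bnn
console.log(extractKey('key="abcd-qwer-qaa"')); // null (no -xyz segment)
```

Note that \w also accepts digits and underscores, which is part of why this version is less restrictive than the first suggestion.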
Will this work?
Pattern
\bkey="(abcd-[^"]*\bxyz\b[^"]*)"
Demo
You could use the following regular expression to verify the string has the desired format and to match the portion of the string that is of interest.
(?<=\bkey=")(?=.*-xyz(?=-|$))abcd(?:-[a-z]+)+(?=")
Start your engine!
Note there are no capture groups.
The regex engine performs the following operations.
(?<=\bkey=") : positive lookbehind asserts the current
position in the string is preceded by 'key="'
(?= : begin positive lookahead
.*-xyz : match 0+ characters, then '-xyz'
(?=-|$) : positive lookahead asserts the current position is
: followed by '-' or is at the end of the string
) : end positive lookahead
abcd : match 'abcd'
(?: : begin non-capture group
-[a-z]+ : match '-' followed by 1+ characters in the class
)+ : end non-capture group and execute it 1+ times
(?=") : positive lookahead asserts the current position is
: followed by '"'
My regex is something like (A)(([+-]\d{1,2}[YMD])*), which matches as expected on inputs like A+3M, A-3Y+5M+3D, etc.
But I want to capture every repetition of the sub-pattern ([+-]\d{1,2}[YMD])*
For the following example, A-3M+2D, I can see only 4 groups: A-3M+2D (group 0), A (group 1), -3M+2D (group 2), +2D (group 3).
Is there a way I can get the -3M as a separate group?
Repeated capturing groups usually capture only the last iteration. This is true for Kotlin, as well as Java, as the languages do not have any method that would keep track of each capturing group stack.
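The same "last iteration wins" behaviour can be observed in JavaScript, used here purely as an illustration (it is not the asker's language). A quick sketch with the regex from the question:

```javascript
// The regex from the question, run against the asker's example input.
const repeated = /(A)(([+-]\d{1,2}[YMD])*)/.exec("A-3M+2D");

// Index 0 is the whole match; group 3 was overwritten on each iteration,
// so only the final +2D survives and -3M is not available on its own.
console.log(Array.from(repeated)); // [ 'A-3M+2D', 'A', '-3M+2D', '+2D' ]
```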
What you may do as a workaround, is to first validate the whole string against a certain pattern the string should match, and then either extract or split the string into parts.
For the current scenario, you may use
val text = "A-3M+2D"
if (text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex())) {
    val results = text.split("(?=[-+])".toRegex())
    println(results)
}
// => [A, -3M, +2D]
See the Kotlin demo
Here,
text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex()) makes sure the whole string matches A and then 0 or more occurrences of + or -, 1 or 2 digits followed with Y, M or D
.split("(?=[-+])".toRegex()) splits the text at each empty position right before a - or +.
Pattern details
^ - implicit in .matches() - start of string
A - an A substring
(?: - start of a non-capturing group:
[+-] - a character class matching + or -
\d{1,2} - one to two digits
[YMD] - a character class that matches Y or M or D
)* - end of the non-capturing group, repeat 0 or more times (due to * quantifier)
\z - implicit in matches() - end of string.
When splitting, we just need to find locations before - or +, hence we use a positive lookahead, (?=[-+]), that matches a position that is immediately followed with + or -. It is a non-consuming pattern, the + or - matched are not added to the match value.
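The validate-then-extract variant of this workaround (instead of validate-then-split) could look like the sketch below. It is written in JavaScript as an assumption for illustration, and the helper name parseOffsets is hypothetical.

```javascript
// Hypothetical helper: returns every signed offset, or null if the whole string is malformed.
function parseOffsets(text) {
  // Step 1: validate the complete string against the full pattern.
  if (!/^A(?:[+-]\d{1,2}[YMD])*$/.test(text)) return null;
  // Step 2: extract each repetition; match() returns null when there are none.
  return text.match(/[+-]\d{1,2}[YMD]/g) ?? [];
}

console.log(parseOffsets("A-3Y+5M+3D")); // [ '-3Y', '+5M', '+3D' ]
console.log(parseOffsets("A-3X"));       // null
```

Because the first step already guarantees the format, the extraction regex in the second step can stay simple.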
Another approach with a single regex
You may also use a \G based regex to check the string format first at the start of the string, and only start matching consecutive substrings if that check is a success:
val regex = """(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$))[^-+]+""".toRegex()
println(regex.findAll("A-3M+2D").map{it.value}.toList())
// => [A, -3M, +2D]
See another Kotlin demo and the regex demo.
Details
(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$)) - either the end of the previous successful match and then + or - (see \G(?!^)[+-]) or (|) start of string that is followed with A and then 0 or more occurrences of +/-, 1 or 2 digits and then Y, M or D till the end of the string (see ^(?=A(?:[+-]\d{1,2}[YMD])*$))
[^-+]+ - 1 or more chars other than - and +. We need not be too careful here since the lookahead did the heavy lifting at the start of string.
I'm trying to capture a text into 3 groups. I have managed to capture 2 groups, but I'm having an issue with the 3rd group.
This is the text :
<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3]
Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending
itemsCount=1
I'm using the following regex:
(?=- )(.*?)(?= - )|(?=])(.*?)(?= -)
My 3rd group should be : "After sending itemsCount=1"
any suggestions?
Your original expression is fine, just missing a $:
(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)
Demo
and maybe we would slightly modify that to an expression similar to:
(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)
Demo
You have 2 capturing groups. You don't get the match for the third part because the positive lookahead in the first alternation does not consider the end of the string. You might solve that by using an alternation that either looks for a space, hyphen and space, or asserts the end of the string
(?=[-\]] )(.*?)(?= - |$)
                     ^^
If those matches are ok, you could simplify that pattern by making use of a character class to match either - or ] like [-\]] and omit the alternation and the group as you now have only the matches.
Your pattern then might look like (also capturing the leading hyphen like the first 2 matches)
(?=[-\]] ).*?(?= - |$)
Regex demo
If this is your string and you want to have 3 capturing groups, you might use:
^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$
^ Start of string
.*? Match any char except a newline non greedy
\[\d+\] match [ 1+ digits ]
([^-]+)- Capture group 1, match 1+ times not -, then match -
([^-]+)- Capture group 2, match 1+ times not -, then match -
\s* Match 0+ whitespace chars
([^-]+) Capture group 3, match 1+ times not -
$ End of string
Regex demo
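As a quick check of the three-group pattern, here is a sketch in JavaScript run against the log line from the question, written on a single line. Trimming the captured values is an addition for readability, since groups 1 and 2 include the spaces around the hyphens.

```javascript
const line = "<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1";

// Three capture groups, separated by the two hyphens that follow "[3]".
const m = /^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$/.exec(line);

// Drop index 0 (the whole match) and strip the surrounding whitespace.
const parts = m === null ? [] : m.slice(1).map((s) => s.trim());

console.log(parts);
// [ 'Drivers.KafkaInvoker', 'KafkaInvoker.SendMessages', 'After sending itemsCount=1' ]
```

The pattern relies on the message part containing no further hyphens; a hyphen inside the third field would make the match fail.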
For example, to create the desired object from the comments, you could first collect all the matches from match[0] in an array.
After you have all the values, assemble the object using the keys and the values.
var output = {};
var regex = /(?=[-\]] ).*?(?= - |$)/g;
var str = `<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1`;
var match;
var values = [];
var keys = ['Thread', 'Class', 'Message'];
while ((match = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (match.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    values.push(match[0]);
}
keys.forEach((key, index) => output[key] = values[index]);
console.log(output);
I'm trying to write a regex which captures multiple groups of data.
Here is some example data:
sampledata=X
B : xyz=1 FAB1_1=03 FAB2_1=01
A : xyz=1 FAB1_1=03 FAB2_1=01
I need to capture the X, which should appear once, and FAB1_1=03, FAB2_1=01, ...: all the strings which start with FAB.
So, I could capture all "FAB" like this :
/(FAB[0-9]_[0-9]=[0-9]*)/sg
But I could not include the capture of X using this expression :
/sampledata=(?<samplegroup>[0-9A-Z]).*(FAB[0-9]_[0-9]=[0-9]*)/sg
This regex only returns one group with X and the last match of the "FAB" group.
You can use
(?:sampledata=(\S+)|(?!^)\G)(?:(?!FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)
See the regex demo
The regex is based on the \G operator that matches either the start of string or the end of the previous successful match. We restrict it to match only in the latter case with a negative lookahead (?!^).
So:
(?:sampledata=(\S+)|(?!^)\G) - match a literal sampledata= and then match and capture into Group 1 one or more non-whitespace symbols -OR- match the end of the previous successful match
(?:(?!FAB[0-9]_[0-9]=).)* - match any text that is not FABn_n= (this is a tempered greedy token)
(FAB[0-9]_[0-9]) - Capture group 2, matching and capturing FAB followed with a digit, then a _, and one more digit
= - literal =
([0-9]*) - Capture group 3, matching and capturing zero or more digits
If you have 1 sampledata= block, you can safely unroll the tempered greedy token (demo) as
(?:sampledata=(\S+)|(?!^)\G)[^F]*(?:F(?!AB[0-9]_[0-9]=)[^F]*)*?(FAB[0-9]_[0-9])=([0-9]*)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That way, the expression will be more efficient.
If you have several sampledata blocks, enhance the tempered greedy token:
(?:sampledata=(\S+)|(?!^)\G)(?:(?!sampledata=|FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)
See another demo
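Not every engine supports \G (JavaScript, for instance, does not). For a single sampledata= block, a two-pass sketch gives the same pieces of data: one regex for the header value, then a global search for the FAB entries that follow it. This swaps the \G technique for plain sequential matching, and the variable names are illustrative.

```javascript
const data = `sampledata=X
B : xyz=1 FAB1_1=03 FAB2_1=01
A : xyz=1 FAB1_1=03 FAB2_1=01`;

// Pass 1: locate the header and capture its value.
const header = /sampledata=(\S+)/.exec(data);

// Pass 2: collect every FABn_n=digits pair that appears after the header.
const fabPairs = header === null
  ? []
  : [...data.slice(header.index).matchAll(/(FAB[0-9]_[0-9])=([0-9]*)/g)]
      .map((m) => [m[1], m[2]]);

console.log(header[1]); // X
console.log(fabPairs);
// [ [ 'FAB1_1', '03' ], [ 'FAB2_1', '01' ], [ 'FAB1_1', '03' ], [ 'FAB2_1', '01' ] ]
```

With several sampledata blocks, the input would first have to be split per block, which this sketch does not attempt.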