Regex using negative lookahead missing first character of group2 - regex

I need to get the LDAP group names from this example string:
"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) AND "common.retired" = False
I'm using this regex to match the parts of the string that contain an LDAP group:
("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")
but group 2 captures the name without its first character.
https://regex101.com/r/2Aby6K/1

A few notes about the pattern you tried
The first character is missing because .+? must consume at least one character before the capture group starts, so that character ends up outside group 2.
Note that "?.=.?" matches an optional ", then any single character (the dot), an equals sign, an optional single character, and then ". The dots match any character, not just whitespace.
The part (.*?)")(?!"user\.ldap\.groups\.name") uses a non-greedy .*?, which matches as few characters as possible to reach a " that is not directly followed by "user.ldap.groups.name". See an example of an incorrect match.
What you might do instead is use a negated character class:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"
In parts
"user\.ldap\.groups\.name" Match "user.ldap.groups.name" literally
\s*=\s* Match = between 0+ whitespace chars on the left and right
"( Match " and start capturing group
[^"]+ Match any char except " 1+ times
)" Close group and match "
Regex demo
Or if you want to include the negative lookahead:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"(?!"user\.ldap\.groups\.name")
Regex demo
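A minimal Python sketch of the difference, assuming the built-in re module and the sample string from the question. The names TEXT, ORIGINAL, FIXED, ldap_group_names and original_group2 are made up for this illustration.

```python
import re

# Sample string from the question.
TEXT = (
    '"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" '
    'OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) '
    'AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) '
    'AND "common.retired" = False'
)

# Pattern from the question: .+? eats one character before group 2 starts.
ORIGINAL = re.compile(
    r'("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")'
)

# Pattern from the answer: a negated character class captures the whole name.
FIXED = re.compile(r'"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"')


def ldap_group_names(text):
    """Return every group name captured by the negated character class."""
    return FIXED.findall(text)


def original_group2(text):
    """Return group 2 of every match of the pattern from the question."""
    return [m.group(2) for m in ORIGINAL.finditer(text)]


print(ldap_group_names(TEXT))
print(original_group2(TEXT))
```

The first list holds the full names, while the second shows each name with its first character cut off.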

Related

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Demo
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from end position of the previous match
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
See the regex demo. Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string by "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
Demo
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)\n"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
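For readers not using Ruby, the same two-step idea can be sketched in Python. This is only an illustration under assumptions: it uses the built-in re module, swaps the lookbehinds for ordinary capture groups, and the function name unix_group_names is invented here.

```python
import re


def unix_group_names(id_output):
    """Return the names in parentheses that follow 'groups=' in `id` output."""
    # Step 1: keep only the rest of the line after 'groups='.
    tail = re.search(r'\bgroups=(.*)', id_output)
    if tail is None:
        return []
    # Step 2: collect the content of every pair of parentheses in that tail.
    return re.findall(r'\(([^)\r\n]+)\)', tail.group(1))


sample = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
print(unix_group_names(sample))
```

Splitting the work in two keeps each expression short, and neither step relies on \G or \K, which Python's re does not provide.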

How do I group optional strings together

I need a regular expression which validates
"Optional str1 as string = ''",
but also
"str2 as string"
and also
"str3 as boolean, Optional dtm as date = Now"
So when "Optional" is used, there must be an "=" sign.
But "Optional" itself is optional.
This is what I have tried:
(Optional\s|)(.*)(\s=\s|)(.*)
and this is not right. It validates too much.
Any hints?
You can use the anchor ^ to assert the start of the string.
Then use an alternation |: either match the line if it contains Optional followed later by an equals sign surrounded by whitespace, or use a negative lookahead (if supported) to match the whole line only if it does not contain Optional.
^(?:.*\bOptional\b.*\s=\s.*|(?!.*\bOptional\b).+)
Explanation
^ Start of string
(?: Non capture group
.*\bOptional\b.*\s=\s.* Match the whole line if it contains Optional and an equals sign between whitespace chars
| Or
(?!.*\bOptional\b).+ Match the whole line if it does not contain Optional
) Close non capture group
Regex demo
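A quick way to try the pattern, sketched in Python with the built-in re module. The helper name is_valid_declaration is hypothetical, and the last input is an extra negative case that is not in the question.

```python
import re

# Pattern from the answer above, unchanged.
DECLARATION = re.compile(r'^(?:.*\bOptional\b.*\s=\s.*|(?!.*\bOptional\b).+)')


def is_valid_declaration(line):
    """True if the line has no 'Optional', or has 'Optional' followed by ' = '."""
    return DECLARATION.match(line) is not None


for line in (
    "Optional str1 as string = ''",
    "str2 as string",
    "str3 as boolean, Optional dtm as date = Now",
    "Optional str1 as string",  # extra case: Optional without an equals sign
):
    print(line, "->", is_valid_declaration(line))
```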

Is there a regex to grab all spaces that separate key value pairs

Is there a regex to extract all spaces that separate key+value pairs while ignoring those delimited by double quotes?
Sample:
key1=value1 key1=value1 spaces="some spaces in text" nested1="key2=value2 key2=value2 key2=value2" nested2="key2=value2, key2=value2, key2=value2" quoted="his name is \"no body\""
This is what I have so far: (?<!,) (?=\w+=), but of course it doesn't work.
[^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s=]+)\K[ \t]+
PCRE demo
Because of \K, only the space delimiters are matched, so there is nothing to write back.
The match can be replaced directly with a new delimiter.
([^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s=]+))([ \t]+)
Python demo
You can write back \1 or \2 if needed, or replace \2 with a new delimiter.
Note: the part of the above expressions that matches the field could benefit from being wrapped in an atomic group (?>...), but that is not strictly necessary, as the field structure is fairly concise.
There are other options to guarantee integrity as well, such as matching every character with the \G anchor, if available.
Let me know if you need that approach.
There are many ways to go here.
Here is another option:
".*?(?<!\\)"(*SKIP)(*F)| +
See the online demo
Please do let me know if it actually does what is required as I'm unsure. Anyways, here is a breakdown:
" - A literal double quote.
.*? - Anything but newline zero or more times but lazy.
(?<!\\) - A negative lookbehind for \.
" - A literal double quote.
(*SKIP)(*F) - Consume all characters of matches, force a failure and continue matching.
| - Alternation.
+ - One or more space characters.
If you are using Python, you'll need the third-party regex module from PyPI, because the built-in re module does not support (*SKIP)(*F).
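If installing an extra module is not an option, a common workaround in plain re is to match the quoted strings first and capture only the spaces that are left over. This is a different technique from (*SKIP)(*F), sketched here under the assumption that quotes are escaped with a backslash. The names SAMPLE and replace_unquoted_spaces are hypothetical.

```python
import re

# Sample line from the question (raw string keeps the backslashes intact).
SAMPLE = (
    r'key1=value1 key1=value1 spaces="some spaces in text" '
    r'nested1="key2=value2 key2=value2 key2=value2" '
    r'nested2="key2=value2, key2=value2, key2=value2" '
    r'quoted="his name is \"no body\""'
)

# First alternative: a whole quoted string, allowing backslash escapes.
# Second alternative: a run of spaces outside quotes, captured in group 1.
QUOTED_OR_SPACES = re.compile(r'"(?:\\.|[^"\\])*"|( +)')


def replace_unquoted_spaces(text, delimiter):
    """Replace runs of spaces outside double quotes; keep quoted text as is."""
    def swap(match):
        # Group 1 is only set when the spaces alternative matched.
        return delimiter if match.group(1) else match.group(0)
    return QUOTED_OR_SPACES.sub(swap, text)


print(replace_unquoted_spaces(SAMPLE, ";"))
```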
You could do that with the following PCRE-compatible regular expression.
\G[^" \n]*(?:(?<!\\)"(?:[^\n"]|(?<=\\)")*(?<!\\)"[^" \n]*)*\K +
Start your engine!
\G : assert position at the end of the previous match
or the start of the string for the first match
[^" \n]* : match 0+ chars other than those in char class
(?: : begin non-capture group
(?<!\\) : use negative lookbehind to assert next char is not
preceded by a backslash
" : match double-quote
(?: : begin non-capture group
[^"\n] : match a char other than those in char class
| : or
(?<=\\) : use positive lookbehind to assert next char is
preceded by a backslash
" : match double-quote
) :end non-capture group
* : match non-capture group 0+ times
(?<!\\) : use negative lookbehind to assert next char is not
preceded by a backslash
" : match double-quote
[^" \n]* : match 0+ chars other than those in char class
) : end non-capture group
* : match non-capture group 0+ times
\K : forget everything matched so far and reset start of match
\ + : match 1+ spaces

Seeking help on Regular expression

How can I extract this string from the text using a regex?
text: {abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654
Expected output: test-name-test-name-w2
Note: I tried "([^\s]*)" and the output is test-name-test-name-w2-a, but I need the output shown above.
You can try with this regex
.*\"(.*)-.*\".*
The link to regex101 is test
You could extend the negated character class to also exclude - and ". Then use a repeating pattern using the same character class preceded with a -
The value is in the first capturing group.
"([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+"
" Match a " char
( Capture group 1
[^\s-"]+ Match 1+ times any char except - " or a whitespace char
(?: Non capturing group
-[^\s-"]+ Match - and then 1+ times any char except - " or a whitespace char
)* Close non capturing group, repeat 0+ times
) Close capture group
-[^\s-"]+ Match - and then 1+ times any char except - " or a whitespace char
" Match a " char
Regex101 demo
(On regex101 at the FLAVOR panel you can switch between PCRE and Golang)
Update
To match where the word test is present and not for example test1 you could use a negative lookahead (?![^"\s]*\btest\w) to assert no presence of test followed by a word character.
"(?![^"\s]*\btest\w)([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+"
Regex demo
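The first pattern of this answer can be tried in Python roughly as follows. One assumption to flag: Python's re reports [^\s-"] as a bad character range, so in this sketch the hyphen is moved to the end of the class, where it is always literal. The function name value_without_last_part is made up.

```python
import re

# Same idea as the pattern above, with the hyphen placed last in each class.
PREFIX = re.compile(r'"([^\s"-]+(?:-[^\s"-]+)*)-[^\s"-]+"')


def value_without_last_part(text):
    """Return the quoted value minus its last hyphen-separated part, or None."""
    match = PREFIX.search(text)
    return match.group(1) if match else None


text = '{abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654'
print(value_without_last_part(text))
```

The repeated group is greedy, so it first takes every hyphenated part and then gives back exactly one so that the final -[^\s"-]+" can match.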

Regular expression to match - inside a pattern

I have to match every - inside the following pattern:
"word-word": # expected: one - found
"word-word" # expected: no - found, because the trailing : is missing
"word-word-word": # expected: two - found
"word-word # expected: no - found, because the pattern must end with ":
To match all the hyphens between " and ":, you could try it like this, using a positive lookbehind and a positive lookahead. The hyphen will be in capture group 1:
(?<="(?:\w+-)*)\w+(-)(?=.*?":)
If you want to replace the hyphen, you could capture the word in group 1 and match the hyphen.
Then as the replacement use $1 followed by your replacement:
(?<="(?:\w+-)*)(\w+)-(?=.*?":)
Explanation
(?<= Positive lookbehind that asserts that what is on the left is
"(?:\w+-)* Match ", then repeat zero or more times one or more word characters followed by a hyphen.
) Close lookbehind
(\w+)- Match in a capturing group one or more word characters, then a dash
(?= Positive lookahead that asserts what is on the right side
.*?": Match zero or more characters non greedy followed by ":
) Close lookahead
Check the "Context" tab to see the replacement in .NET Regex Tester.
I don't know C#, but this JavaScript example might be translatable:
result = '"word-word":'.replace(/^[^-]+((-)[^-]+)((-)[^-]+)?:$/, '$2$4');
You would have to check whether result is different from original.
If no, nothing was found, nothing was replaced.
Explanation:
String start, then something not "-"
followed by "-" and more characters not "-"
optionally followed by "-" and more characters not "-"
ending with ":"
Then you want the content of capture groups 2 and 4 (the second and fourth opening parentheses).
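Engines without variable-length lookbehind, such as Python's built-in re, cannot run the lookbehind patterns above. A different approach, sketched here as an assumption rather than a translation of either answer, is to match the whole "word-word": token and handle the hyphens inside the match. The names KEY, replace_key_hyphens and count_key_hyphens are invented.

```python
import re

# A quoted run of hyphen-joined words that is immediately followed by a colon.
KEY = re.compile(r'"\w+(?:-\w+)+":')


def replace_key_hyphens(text, replacement="_"):
    """Replace hyphens only inside tokens of the form "word-word":."""
    return KEY.sub(lambda m: m.group(0).replace("-", replacement), text)


def count_key_hyphens(text):
    """Count the hyphens that sit inside tokens of the form "word-word":."""
    return sum(m.group(0).count("-") for m in KEY.finditer(text))


for line in ('"word-word":', '"word-word"', '"word-word-word":', '"word-word'):
    print(line, "->", count_key_hyphens(line))
```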