How do I group optional strings together - regex

I need a regular expression which validates
"Optional str1 as string = ''",
but also
"str2 as string"
and also
"str3 as boolean, Optional dtm as date = Now"
So when "Optional" is used there must be a "=" sign,
but "Optional" itself is optional.
This is what I have tried:
(Optional\s|)(.*)(\s=\s|)(.*)
and this is not right. It validates too much.
Any hints?

You can use an anchor ^ to assert the start of the string.
Then use an alternation | with two branches: either match the whole line if it contains Optional followed later by an equals sign, or use a negative lookahead (if supported) to match the whole line only if it does not contain Optional.
^(?:.*\bOptional\b.*\s=\s.*|(?!.*\bOptional\b).+)
Explanation
^ Start of string
(?: Non capture group
.*\bOptional\b.*\s=\s.* Match the whole line if it contains Optional and an equals sign between whitespace chars
| Or
(?!.*\bOptional\b).+ Match the whole line if it does not contain Optional
) Close non capture group
Regex demo
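A minimal Ruby sketch of how the pattern could be exercised against the sample inputs. The constant and method names are hypothetical, and \A is used in place of ^ because in Ruby ^ matches at the start of every line rather than only at the start of the string.

```ruby
# Hypothetical names; the pattern is the one from the answer,
# with \A (start of string) in place of ^ (start of line in Ruby).
PARAM_RE = /\A(?:.*\bOptional\b.*\s=\s.*|(?!.*\bOptional\b).+)/

# True when the declaration either contains no "Optional" at all,
# or contains "Optional" followed later by an equals sign between
# whitespace characters.
def valid_declaration?(text)
  PARAM_RE.match?(text)
end

puts valid_declaration?("Optional str1 as string = ''")  # true
puts valid_declaration?("str2 as string")                # true
puts valid_declaration?("Optional str1 as string")       # false
```

Note that this only checks that an equals sign appears somewhere after Optional on the line; it does not check each comma-separated parameter on its own.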

Related

Regex - All before an underscore, and all between second underscore and the last period?

How do I get everything before the first underscore, and everything between the last underscore and the period in the file extension?
So far, I have everything before the first underscore, not sure what to do after that.
.+?(?=_)
EXAMPLES:
111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf
222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf
DESIRED RESULTS:
111111_END TLD 6-01-20 THR LEWISHS
222222_G URS TO 7.25 2-28-19 SA COOPSHS
You can match the following regular expression that contains no capture groups.
^[^_]*|(?!.*_).*(?=\.)
Demo
This expression can be broken down as follows.
^ # match the beginning of the string
[^_]* # match zero or more characters other than an underscore
| # or
(?! # begin negative lookahead
.*_ # match zero or more characters followed by an underscore
) # end negative lookahead
.* # match zero or more characters greedily
(?= # begin positive lookahead
\. # match a period
) # end positive lookahead
.*_ means to match zero or more characters greedily, followed by an underscore. To match greedily (the default) means to match as many characters as possible. Here that includes all underscores (if there are any) before the last one. Similarly, .* followed by (?=\.) means to match zero or more characters, possibly including periods, up to the last period.
Had I written .*?_ (incorrectly) it would match zero or more characters lazily, followed by an underscore. That means it would match as few characters as possible before matching an underscore; that is, it would match zero or more characters up to, but not including, the first underscore.
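As a rough illustration, the expression could be applied in Ruby with String#scan. The helper name is hypothetical. Because the second alternative can also succeed with an empty match just before the final period, this sketch drops empty strings; that filtering step is an addition, not part of the answer.

```ruby
# Pattern from the answer; it has no capture groups, so scan returns
# the matched substrings themselves.
PARTS_RE = /^[^_]*|(?!.*_).*(?=\.)/

# Hypothetical helper: return the non-empty matches in order.
def key_parts(filename)
  filename.scan(PARTS_RE).reject(&:empty?)
end

name = "111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf"
p key_parts(name)               # ["111111", "END TLD 6-01-20 THR LEWISHS"]
puts key_parts(name).join("_")  # 111111_END TLD 6-01-20 THR LEWISHS
```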
If instead of capturing the two parts of the string of interest you wanted to remove the two parts of the string you don't want (as suggested by the desired results of your example), you could substitute matches of the following regular expression with empty strings.
_.*_|\.[^.]*$
Demo
This regular expression reads, "Match an underscore followed by zero or more characters followed by an underscore, or match a period followed by zero or more characters that are not periods, followed by the end of the string".
You could use 2 capture groups:
^([^_\n]+_).*\b([^\s_]*_.*)(?=\.)
^ Start of string
([^_\n]+_) Capture group 1, match 1+ chars other than _ or a newline, followed by _
.*\b Match as much of the line as possible, then assert a word boundary
([^\s_]*_.*) Capture group 2, match 0+ chars other than _ or a whitespace char, then match _ and the rest of the line
(?=\.) Positive lookahead, assert a . to the right
See a regex demo.
Another option could be using a non greedy version to get to the first _ and make sure that there are no following underscores and then match the last dot:
^([^_\n]+_).*?(\S*_[^_\n]+)\.[^.\n]+$
See another regex demo.
Looks like you're very close. You could eliminate the names between the underscores by finding this
(_.+?_)
and replacing the returned value with a single underscore.
I am assuming that you did not intend your second result to include the name MIKE.
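A small Ruby sketch of that replacement, for illustration only. The helper name is hypothetical, and the second sub, which strips the final extension so the output lines up with the desired results, is an added assumption rather than part of the suggestion above.

```ruby
# Hypothetical helper.
# Step 1: replace the first "_..._" run (matched lazily) with a single "_".
# Step 2 (assumption): drop the last period and everything after it.
def strip_name(filename)
  filename.sub(/_.+?_/, "_").sub(/\.[^.]*\z/, "")
end

puts strip_name("111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf")
# 111111_END TLD 6-01-20 THR LEWISHS
puts strip_name("222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf")
# 222222_G URS TO 7.25 2-28-19 SA COOPSHS
```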

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Demo
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from end position of the previous match
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
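The \G idea also carries over to Ruby, whose regex engine supports \G in String#scan. The sketch below is an illustration with hypothetical names; it writes the start-of-string guard as \G(?!\A) instead of (?<!^)\G, which is a substitution made here and assumed to behave the same for this input.

```ruby
ID_OUTPUT = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# \G(?!\A) anchors an attempt at the end of the previous match (but never
# at the very start of the string), so the first match must begin at
# "groups=" and later matches continue from where the last one ended.
GROUP_RE = /(?:\bgroups=|\G(?!\A))[^(]*\(([^)]+)\)/

# With one capture group, scan returns an array of one-element arrays;
# flatten turns that into a plain list of names.
def group_names(id_output)
  id_output.scan(GROUP_RE).flatten
end

p group_names(ID_OUTPUT)  # ["kawsay", "sudo", "video", "gpio"]
```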
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
See the regex demo. Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string by "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
Demo
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)\n"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.

Regex to validate cookie string (Key value paired)

So far I tried this regex but no luck.
([^=;]+=[^=;]+(;(?!$)|$))+
Valid Strings:
something=value1;another=value2
something=value1 ; anothe=value2
Invalid Strings:
something=value1 ;;;name=test
some=value=3;key=val
somekey=somevalue;
You might use an optional repeating group to get the matches.
If you don't want to cross newline boundaries, you might add \n or \r\n to the negated character class.
^[^=;\n]+=[^=;\n]+(?:;[^=;\n]+=[^=;\n]+)*$
Explanation
^ Start of string
[^=;\n]+=[^=;\n]+ Match the key and value using a negated character class
(?: Non capture group
;[^=;\n]+=[^=;\n]+ Match a semicolon followed by the same key=value pattern
)* Close group and repeat 0+ times
$ End string
Regex demo
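For illustration, here is a minimal Ruby sketch that runs the pattern against the sample strings from the question. The names are hypothetical, and \A and \z are used in place of ^ and $ because in Ruby ^ and $ match at line boundaries rather than only at the ends of the string.

```ruby
# Pattern from the answer, anchored to the whole string with \A and \z.
COOKIE_RE = /\A[^=;\n]+=[^=;\n]+(?:;[^=;\n]+=[^=;\n]+)*\z/

# Hypothetical helper: true when the string is one or more key=value
# pairs separated by single semicolons, with no trailing semicolon.
def valid_cookie_string?(text)
  COOKIE_RE.match?(text)
end

puts valid_cookie_string?("something=value1;another=value2")  # true
puts valid_cookie_string?("some=value=3;key=val")             # false
puts valid_cookie_string?("somekey=somevalue;")               # false
```

Whitespace around the semicolon is accepted because the negated character classes allow spaces inside keys and values.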

Regex: exclude string from matched pattern

Input string:
hrStorageDescr{hrStorageDescr="devfs: dev file system, mounted on: /.mount/dev"}
Regex to match value of hrStorageDescr only:
.*hrStorageDescr="(.*?)",.*
How to write this regex in order to preserve matching function, but exclude everything in the value, if devfs string is matched?
You could match hrStorageDescr preceded by a word boundary \b
Then match =" and use a negative lookahead (?!devfs\b) to assert that what is directly to the right is not devfs followed by a word boundary
If that assertion succeeds, capture the value in a group using a negated character class that matches any char except a ", and close the group before matching the closing double quote: ([^"]+)
A leading .* will match the last occurrence of the pattern, while .*? will match the first. If you want all occurrences, you could omit that part, assuming you are allowed to collect all matches instead of a single match.
.*?\bhrStorageDescr="(?!devfs\b)([^"]+)"
Regex demo
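A short Ruby sketch of how this pattern might be used. The helper name and the second, non-devfs input line are hypothetical and only serve to show both outcomes.

```ruby
# Pattern from the answer: capture the quoted value unless it starts
# with the word "devfs".
DESCR_RE = /.*?\bhrStorageDescr="(?!devfs\b)([^"]+)"/

# Hypothetical helper: return capture group 1, or nil when the value
# starts with "devfs" or the label is missing.
def storage_descr(line)
  line[DESCR_RE, 1]
end

devfs_line = 'hrStorageDescr{hrStorageDescr="devfs: dev file system, mounted on: /.mount/dev"}'
other_line = 'hrStorageDescr{hrStorageDescr="/dev/sda1, mounted on: /"}'  # made-up example

p storage_descr(devfs_line)  # nil
p storage_descr(other_line)  # "/dev/sda1, mounted on: /"
```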

Regex using negative lookahead missing first character of group2

I need to get the LDAP group names from this example string:
"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) AND "common.retired" = False
I'm using this regex to match the parts of the string that contains an LDAP group
("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")
but it is matching in group2 the name without the first character.
https://regex101.com/r/2Aby6K/1
A few notes about the pattern you tried
The reason it misses the first character is that this part .+? requires at least one character, so it consumes the first character of the name before the capture group starts
Note that this part "?.=.?" matches an optional ", then any char, an equals sign, an optional any char (both dots match any char, and the second one is optional), and then "
This part (.*?)")(?!"user\.ldap\.groups\.name") uses a non greedy dot .*? which matches as few chars as possible to reach a " that is not directly followed by "user.ldap.groups.name. See an example of an incorrect match.
What you might do is use a negated character class
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"
In parts
"user\.ldap\.groups\.name" Match "user.ldap.groups.name" literally (the dots are escaped)
\s*=\s* Match = between 0+ whitespace chars on the left and right
"( Match " and start capturing group
[^"]+ Match any char except " 1+ times
)" Close group and match "
Regex demo
Or if you want to include the negative lookahead:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"(?!"user\.ldap\.groups\.name")
Regex demo
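To illustrate, the negated character class version could be run over the example string in Ruby with String#scan. The constant and method names are hypothetical.

```ruby
QUERY = '"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) AND "common.retired" = False'

# Pattern from the answer: group 1 holds everything between the quotes
# that follow the key and the equals sign.
LDAP_GROUP_RE = /"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"/

# Hypothetical helper: collect every captured group name in order.
def ldap_group_names(query)
  query.scan(LDAP_GROUP_RE).flatten
end

p ldap_group_names(QUERY)
# ["M-Role13", "M Role1", "M.Group-Role16", "Admin Role"]
```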