Regular expression to match - inside a pattern - regex

I have to match all - inside the following pattern
"word-word": #expected result find one -
"word-word" #expected result no - find because the : is missing in the end pattern
"word-word-word": #expected result find two -
"word-word #expected result no - find because the end pattern is ":

To match all the hyphens between " and ":, you could use a positive lookbehind and a positive lookahead (this relies on variable-length lookbehind, which .NET supports). The hyphen will be in capture group 1:
(?<="(?:\w+-)*)\w+(-)(?=.*?":)
If you want to replace the hyphen, you could capture the word in group 1 and match the hyphen.
Then as the replacement use $1 followed by your replacement:
(?<="(?:\w+-)*)(\w+)-(?=.*?":)
Explanation
(?<= Positive lookbehind that asserts that what is on the left is
"(?:\w+-)* Match ", then repeat zero or more times one or more word characters followed by a hyphen.
) Close lookbehind
(\w+)- Match in a capturing group one or more word characters, then a dash
(?= Positive lookahead that asserts that what is on the right is
.*?": Match zero or more characters non greedy followed by ":
) Close lookahead
Check the "Context" tab to see the replacement in .NET Regex Tester.

I don't know how to do this in C#, but a JavaScript example might be translatable:
result = '"word-word":'.replace(/^[^-]+((-)[^-]+)((-)[^-]+)?:$/, '$2$4');
You would have to check whether result differs from the original string.
If it does not, nothing was found and nothing was replaced.
Explanation:
String start, then something not "-"
followed by "-" and more characters not "-"
optionally followed by "-" and more characters not "-"
ending with ":"
Then you want the contents of the second and fourth capture groups ($2 and $4).
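Python's built-in re module rejects variable-width lookbehind, so the lookbehind pattern above will not compile there. Below is a minimal sketch of a different technique for the same task, assuming the words consist only of \w characters: match each whole "word-word": token and rewrite its hyphens in a callback. The function names and the underscore replacement are illustrative choices, not part of the answers above.

```python
import re

# Matches a quoted run of hyphen-separated words that is followed by a colon.
TOKEN = re.compile(r'"\w+(?:-\w+)+":')

def replace_hyphens(text, replacement="_"):
    """Replace hyphens only inside tokens shaped like "word-word":."""
    return TOKEN.sub(lambda m: m.group(0).replace("-", replacement), text)

def count_hyphens(text):
    """Count the hyphens that sit inside matching tokens."""
    return sum(m.group(0).count("-") for m in TOKEN.finditer(text))
```

Inputs without the closing ": are left untouched, because the token pattern never matches them.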

Related

Find match within a first match

I have the following string
abc123+InterestingValue+def456
I want to get the InterestingValue only, I am using this regex
\+.*\+
but the output still includes the + characters
Is there a way to search for a string between the + characters, then search again for anything that is not a + character?
Use lookarounds.
(?<=\+)[^+]*(?=\+)
You can use a positive lookbehind and a positive lookahead. Basically, a positive lookbehind tells the engine "this must appear immediately before the match", and a positive lookahead tells the engine "this must appear immediately after the match". Neither of them consumes any characters, so the text they check is not part of the match.
A positive lookbehind is a group beginning with ?<= and a positive lookahead is a group beginning with ?=. Adding these to your existing expression would look like this:
(?<=\+).*(?=\+)
If it should be the first match, you can use a capture group with an anchor:
^[^+]*\+([^+]+)\+
^ Start of string
[^+]* Optionally match any char except + using a negated character class
\+ Match + literally
([^+]+) Capture group 1, match 1+ chars other than +
\+ Match + literally
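As a rough illustration, the patterns from these answers can be tried in Python, whose re module supports the fixed-width lookbehind used here. The second test string (a+b+c+d) is an assumption added to show how the greedy .* and the negated class [^+]* differ once more than two + signs are present.

```python
import re

s = "abc123+InterestingValue+def456"

# Lookarounds: the + signs are checked but not included in the match.
between = re.search(r"(?<=\+)[^+]*(?=\+)", s).group()

# Anchored capture group: the value between the first two + signs.
first = re.match(r"^[^+]*\+([^+]+)\+", s).group(1)

# With more than two + signs, the greedy .* runs up to the last +,
# while the negated class [^+]* stops at the next one.
multi = "a+b+c+d"
greedy = re.search(r"(?<=\+).*(?=\+)", multi).group()
negated = re.search(r"(?<=\+)[^+]*(?=\+)", multi).group()
```

For the original input both approaches give InterestingValue; the greedy and negated variants only diverge on inputs like the second one.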

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from end position of the previous match
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
See the regex demo. Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string by "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)\n"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
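The two-step idea carries over to engines without \G or \K. Here is a minimal Python sketch of it, with one deliberate change: it uses capture groups instead of lookbehinds, which is an assumption made for simplicity rather than something the answer above prescribes. The function name group_names is hypothetical.

```python
import re

s = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

def group_names(id_output):
    """Return the names listed after 'groups=' in the output of `id`."""
    # Step 1: keep only the rest of the line that follows 'groups='.
    tail = re.search(r"\bgroups=(.*)", id_output)
    if tail is None:
        return []
    # Step 2: inside that tail, take the text between each pair of parentheses.
    return re.findall(r"\(([^)\r\n]+)\)", tail.group(1))

names = group_names(s)
```

Splitting the work this way keeps each pattern short, and an input with no groups= field simply yields an empty list.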

Regex using negative lookahead missing first character of group2

I need to get the LDAP group names from this example string:
"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) AND "common.retired" = False
I'm using this regex to match the parts of the string that contains an LDAP group
("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")
but it is matching in group2 the name without the first character.
https://regex101.com/r/2Aby6K/1
A few notes about the pattern you tried
The first character is missing because .+? sits outside the inner capture group and must consume at least one character, so the first character of the name is matched there instead of in group 2
In the part "?.=.?" the pattern matches an optional ", then any single character (the first dot), an equals sign, an optional single character (the second dot) and then "
The part (.*?)")(?!"user\.ldap\.groups\.name") uses a non-greedy .*?, which matches as few characters as possible but can still run past a closing " until it finds a " that is not directly followed by "user.ldap.groups.name, and that can produce an incorrect match.
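The first note can be checked with a quick experiment. This sketch runs the pattern from the question, unchanged, against a single clause of the input in Python; the variable names are illustrative.

```python
import re

# The pattern from the question, unchanged.
original = r'("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")'
clause = '"user.ldap.groups.name" = "M-Role13"'

m = re.search(original, clause)
# .+? has to take one character (the "M"), so group 2 starts one character late.
truncated = m.group(2)
```

Here group 1 still holds the whole clause, while group 2 holds the name without its first character.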
What you might do is use a negated character class
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"
In parts
"user\.ldap\.groups\.name" Match "user.ldap.groups.name" literally, including the quotes
\s*=\s* Match = between 0+ whitespace chars on the left and right
"( Match " and start capturing group
[^"]+ Match any char except " 1+ times
)" Close group and match "
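As a sketch of how the negated character class behaves on the full example string, here it is applied with Python's re.findall, which returns the contents of capture group 1 for each match. The constant name GROUP_NAME is an illustrative choice.

```python
import re

text = (
    '"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" '
    'OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) '
    'AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) '
    'AND "common.retired" = False'
)

# [^"]+ cannot cross a closing quote, so each name is captured whole.
GROUP_NAME = re.compile(r'"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"')

names = GROUP_NAME.findall(text)
```

The \s* on both sides of the = is what lets the same pattern handle the clauses with and without spaces around the equals sign.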
Or if you want to include the negative lookahead:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"(?!"user\.ldap\.groups\.name")

REGEX input validation

I am trying to put together REGEX expression to validate the following format:
"XXX/XXX","XXX/XXX","XXX/XXX"
where X can be a letter, a digit, a dash, or an underscore. What I have so far is
"(.*?)(\/)(.*?)"(?:,|$)/g
but it does not seem to work
Update: there could be any number of "XXX/XXX" strings, comma-separated, not just 3
You can try the following regex:
"([\w-]+)\/([\w-]+)"
Edit: regex explained:
([\w-]+) in the square brackets we say we want to match \w: matches any word character (equal to [a-zA-Z0-9_]). After this, we have "-", which just adds literally the symbol "-" to the matching symbols.
"+" says we want one or more symbols from the previous block: [\w-]
\/ matches the symbol "/" directly. It should be escaped in the regex, that's why it is preceded by "\"
([\w-]+) exactly like point 1, matches the same thing since the two parts are identical.
() - those brackets mark capturing group, which you can later use in your code to get the value it surrounds and matches.
Example:
Full match: 1X-/-XX
Group 1: 1X-
Group 2: -XX
This will do the job:
"[-\w]+/[-\w]+"(?:,"[-\w]+/[-\w]+")*
Explanation:
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
(?: # non capture group
, # a comma
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
)* # end group, may appear 0 or more times
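For validation the pattern has to cover the entire input, not just a part of it. A minimal Python sketch of that, assuming re.fullmatch as the anchoring mechanism and the re.ASCII flag so that \w means [a-zA-Z0-9_] as in the explanation above (both are choices made for this example); the helper name is_valid is hypothetical.

```python
import re

# One "X/X" pair followed by zero or more comma-separated pairs.
PAIRS = re.compile(r'"[-\w]+/[-\w]+"(?:,"[-\w]+/[-\w]+")*', re.ASCII)

def is_valid(value):
    """True only if the whole string is a comma-separated list of "X/X" pairs."""
    return PAIRS.fullmatch(value) is not None
```

Because fullmatch must consume the whole string, a trailing comma or stray characters between pairs make the check fail, and any number of pairs is accepted.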
Here, we could start with a simple expression with quantifiers:
("[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+")(,|$)
where we collect the allowed characters in a character class, repeated one or more times on each side of the slash, and at the end match either a comma or the end of the string.

Can't find upper case letter in URL using Regex

I have the following regex:
(href[\s]?=[\s]?)(\"[^"]*\/*[^"]*\")
using the following Test String:
href="http://mysite.io/Plan-documents"
I get two capturing groups. One with the href= and the other is everything past that. Now I want to only display matches where there is an uppercase letter anywhere in the second capture group. I tried:
(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")
to try to have this regex return only URLs that contain an uppercase letter. No luck. Even if I modify the test string to:
href="http://mysite.io/plan-documents"
I still get a match. I only want to match the href string if there is at least one uppercase letter in the string after the href=.
Thanks.
You don't get the right matches because everything between the double quotes in your second capturing group uses the quantifier *, which matches 0 or more times.
First the engine matches 0+ times [A-Z]*. It is not present but it is ok, because of the 0+ times quantifier. Then the next part [^"]* will match until right before it encounters the next "
The following \/* is not there but is also ok because of the 0+ times quantifier followed by [^"]* which is also ok.
What you might do instead is first match not an uppercase until you match an uppercase and then match until the closing double quotes.
(href\s?=\s?)("[^A-Z\s"]*[A-Z][^\s"]*")
Explanation
(href\s?=\s?) Capture group, match href= with an optional whitespace char on either side of the =
(" Start capture group and match "
[^A-Z\s"]* Match 0+ times any char that is not an uppercase letter, a whitespace char or " (excluding " keeps the match inside the quotes)
[A-Z] Match 1 uppercase char
[^\s"]* Match 0+ times not a whitespace char or "
") Match " and close capture group
Regex demo
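A short Python sketch of the "no uppercase, then one uppercase, then the rest" idea. One assumption differs from a plain [^A-Z\s]* class: the double quote is also excluded from the first character class, so the match cannot run past the closing quote of the attribute. The helper name has_uppercase_href is hypothetical.

```python
import re

# Group 1: href= with optional whitespace; group 2: the quoted value,
# which must contain at least one uppercase letter before its closing quote.
HREF_WITH_UPPER = re.compile(r'(href\s?=\s?)("[^A-Z\s"]*[A-Z][^\s"]*")')

def has_uppercase_href(html):
    """True if some href value in the text contains an uppercase letter."""
    return HREF_WITH_UPPER.search(html) is not None
```

The last test case below shows why the quote is excluded: an uppercase letter that appears only after the attribute value should not count.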
Without using groups, you could use:
href\s?=\s?"[^A-Z\s"]*[A-Z][^\s"]*"