Exclude specific name in regex - regex

I have multiple directories with names like app1.6.11, app1.7.12, app1.8.34, test1, test2.
I want a regex that matches all the directories which start with app, but excludes app1.8.34.
I have tried:
^(app.+)[^(app1.8.34)]

If you want to match a literal dot, you should escape it as \.; otherwise it matches any character.
You could use a negative lookahead:
^app(?!1\.8\.34).+$
That would match
^ # The beginning of the string
app # Match app
(?! # Negative lookahead that asserts what follows is not
1\.8\.34 # Match 1.8.34
) # Close negative lookahead
.+ # Match any character one or more times
$ # End of the string
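As a minimal sketch of how the lookahead behaves, the snippet below filters the directory names from the question. Python's re module is used only for illustration (the question does not name a language), and the variable names are hypothetical. The second pattern, with $ inside the lookahead, is an assumption about what is wanted if a name such as app1.8.345 should still be accepted.

```python
import re

names = ["app1.6.11", "app1.7.12", "app1.8.34", "test1", "test2"]

# Pattern from the answer: "app" not immediately followed by "1.8.34".
pattern = re.compile(r"^app(?!1\.8\.34).+$")
matched = [n for n in names if pattern.match(n)]
print(matched)  # ['app1.6.11', 'app1.7.12']

# Assumed variant: exclude only the exact name app1.8.34.
strict = re.compile(r"^app(?!1\.8\.34$).+$")
print(bool(pattern.match("app1.8.345")))  # False
print(bool(strict.match("app1.8.345")))   # True
```

Without the $ in the lookahead, any name that merely begins with app1.8.34 is rejected as well.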

Related

Regex to capture optional characters

I want to pull out a base string (Wax) or (noWax) from a longer string, along with potentially any data before and after if the string is Wax. I'm having trouble getting the last item in my list below (noWax) to match.
Can anyone flex their regex muscles? I'm fairly new to regex so advice on optimization is welcome as long as all matches below are found.
What I'm working with in Regex101:
/(?<Wax>Wax(?:Only|-?\d+))/mg
Original string | Need to extract in a capturing group
Loc3_341001_WaxOnly_S212 | WaxOnly
Loc4_34412-a_Wax4_S231 | Wax4
Loc3a_231121-a_Wax-4-S451 | Wax-4
Loc3_34112_noWax_S311 | noWax
Here is one way to do so, using a conditional:
(?<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))
See the online demo.
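(no)?: Optional capture group (group 2, because the named group Wax is group 1).
(?(2): If. Test whether capture group 2 ((no)) took part in the match. If it did, match nothing more.
|: Or (else).
(?:Only|-?\d+): Match Only, or an optional - followed by one or more digits.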
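The conditional can be tried out with a short script. This is a sketch in Python, which is an assumption since the question only mentions Regex101; Python writes named groups as (?P<name>...), while the (?(2)...|...) conditional syntax is the same. The sample strings are the four from the question.

```python
import re

# Group 1 is the named group "Wax"; group 2 is the optional (no).
pattern = re.compile(r"(?P<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))")

samples = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]

found = [pattern.search(s).group("Wax") for s in samples]
print(found)  # ['WaxOnly', 'Wax4', 'Wax-4', 'noWax']
```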
I assume the following match is desired.
the match must include 'Wax'
'Wax' is to be preceded by '_' or by '_no'. If the latter, 'no' is included in the match.
'Wax' may be followed by:
'Only' followed by '_', in which case 'Only' is part of the match, or
one or more digits, followed by '_', in which case the digits are part of the match, or
'-' followed by one or more digits, followed by '-', in which case
'-' followed by one or more digits is part of the match.
If these assumptions are correct the string can be matched against the following regular expression:
(?<=_)(?:(?:no)?Wax(?:(?:Only|\d+)?(?=_)|\-\d+(?=-)))
The regular expression can be broken down as follows.
(?<=_) # positive lookbehind asserts previous character is '_'
(?: # begin non-capture group
(?:no)? # optionally match 'no'
Wax # match literal
(?: # begin non-capture group
(?:Only|\d+)? # optionally match 'Only' or >=1 digits
(?=_) # positive lookahead asserts next character is '_'
| # or
\-\d+ # match '-' followed by >= 1 digits
(?=-) # positive lookahead asserts next character is '-'
) # end non-capture group
) # end non-capture group
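A quick check of the lookaround-based expression against the four sample strings, sketched in Python (an assumption; the thread itself only references an online tester). No capture group is needed because the lookarounds keep the surrounding '_' and '-' out of the match.

```python
import re

pattern = re.compile(r"(?<=_)(?:(?:no)?Wax(?:(?:Only|\d+)?(?=_)|-\d+(?=-)))")

samples = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]

# group(0) is the whole match.
found = [pattern.search(s).group(0) for s in samples]
print(found)  # ['WaxOnly', 'Wax4', 'Wax-4', 'noWax']
```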

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from end position of the previous match
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
See the regex demo. Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
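Python's built-in re module has neither \G nor \K, so the patterns above cannot be used there as written. The sketch below emulates the same idea under that assumption: find groups= once, then repeatedly match the next number(name) item exactly where the previous match ended, which is what \G enforces. The function name group_names is hypothetical.

```python
import re

def group_names(id_output):
    """Return the names listed after 'groups=' in the output of `id`."""
    start = re.search(r"\bgroups=", id_output)
    if start is None:
        return []
    # One item: digits, "(", a name, ")", and an optional trailing comma.
    item = re.compile(r"\d+\((\w+)\),?")
    pos = start.end()
    names = []
    while True:
        # match(string, pos) only succeeds if an item starts exactly at pos,
        # mirroring the "continue from the previous match" role of \G.
        m = item.match(id_output, pos)
        if m is None:
            break
        names.append(m.group(1))
        pos = m.end()
    return names

line = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
print(group_names(line))  # ['kawsay', 'sudo', 'video', 'gpio']
```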
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string with "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
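For readers not using Ruby, the same expression appears to work unchanged with Python's re.findall, which returns whole matches when the pattern has no capture groups. This is an illustrative sketch; the variable names are hypothetical.

```python
import re

line = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# Text after "(" that is not followed later in the line by "groups=",
# taken lazily up to the next ")".
rgx = re.compile(r"(?<=\()(?!.*\bgroups=).*?(?=\))")
print(rgx.findall(line))  # ['kawsay', 'sudo', 'video', 'gpio']
```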
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
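The two-step approach translates directly to Python, shown here as a sketch for comparison (the names tail and names are hypothetical). Each step can be inspected on its own, which is what makes it easy to test.

```python
import re

line = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# Step 1: keep only what follows "groups=" (up to the end of the line).
m = re.search(r"(?<=\bgroups=).*", line)
tail = m.group(0) if m else ""
print(tail)  # 1001(kawsay),27(sudo),44(video),997(gpio)

# Step 2: collect the text after each "(" up to, but excluding, ")".
names = re.findall(r"(?<=\()[^)\r\n]+", tail)
print(names)  # ['kawsay', 'sudo', 'video', 'gpio']
```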

Regex to match all words starting with @ or # for .net

is there any regex to detect a word that starts with @ or # for .net?
I tried ^[@#] but it doesn't seem to work for special characters like @ or #
In your regex ^[@#] you match a @ or # only at the start of the string because of the ^. You could omit the ^ and add \w+ to match one or more word characters.
[@#]\w+
string pattern = @"[@#]\w+";
If you want to match more than \w+, you could use a character class that lists what you want to match, for example [@#][\w+$]+.
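A small illustration of dropping the ^ anchor, assuming the two leading characters in question are @ and #. It is written in Python rather than .NET purely as a sketch, and the sample sentence is made up.

```python
import re

text = "ping @alice about #release today"

# Anchored at the start of the string: nothing matches, because the
# string begins with "p", not with @ or #.
print(re.findall(r"^[@#]\w+", text))  # []

# Unanchored: every word that starts with @ or # is found.
tags = re.findall(r"[@#]\w+", text)
print(tags)  # ['@alice', '#release']
```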

Regular expression: match up to the 3rd whitespace

I have the following line:
Data 5 in:out:40 Files
I want to match everything up to the 3rd whitespace. So, in this case, I want to get back
Data 5 in:out:40
How about:
^(\S+\s+\S+\s+\S+)
Let's break this down:
^ # start from string beginning
( # match everything inside (begin)
\S+ # match all non-whitespace(s)
\s+ # whitespace(s)
\S+ # match all non-whitespace(s)
\s+ # whitespace(s)
\S+ # match all non-whitespace(s)
) # match everything inside (end)
You can test the regex in a debugger.
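As a quick check of the expression on the line from the question, here is a sketch in Python (the question does not say which language is used, so this is an assumption):

```python
import re

line = "Data 5 in:out:40 Files"

# Three runs of non-whitespace separated by whitespace, from the start.
m = re.match(r"^(\S+\s+\S+\s+\S+)", line)
first_three = m.group(1) if m else None
print(first_three)  # Data 5 in:out:40
```

A line with fewer than three whitespace-separated fields produces no match, so the None case is worth handling.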

Check the specific number of occurrences of a single character in a string with regex

I'm trying to create a regex pattern for my powershell code. I've never worked with regex before, so I'm a total noob.
The regex should check if there are two points in the string.
Examples that SHOULD work:
3.1.1
5.10.12
10.1.15
Examples that SHOULD NOT work:
3
3.1
5.10.12.1
The string must have two points in it, the number of digits doesn't matter.
I've tried something like this, but it doesn't really work and I think it's far from the right solution...
([\d]*.[\d]*.[\d])
In your current regex you should escape the dot as \.; otherwise the dot matches any character.
You could add anchors for the start ^ and the end $ of the string and update your regex to ^\d*\.\d*\.\d*$
That would also match ..4 and ..
Or if you want to match one or more digits, I think you could use ^\d+(?:\.\d+){2}$
That would match
^ # From the beginning of the string
\d+ # Match one or more digits
(?: # Non capturing group
\.\d+ # Match a dot and one or more digits
){2} # Close non capturing group and repeat 2 times
$ # The end of the string
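The stricter pattern can be checked against the examples from the question. The sketch below uses Python for illustration, although the question is about PowerShell; the helper name has_two_dots is hypothetical.

```python
import re

pattern = re.compile(r"^\d+(?:\.\d+){2}$")

def has_two_dots(version):
    """True if the string is exactly three dot-separated runs of digits."""
    return pattern.match(version) is not None

should_match = ["3.1.1", "5.10.12", "10.1.15"]
should_not_match = ["3", "3.1", "5.10.12.1"]

print([has_two_dots(v) for v in should_match])      # [True, True, True]
print([has_two_dots(v) for v in should_not_match])  # [False, False, False]
```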
Use a lookahead:
^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$
Broken down, this says:
^ # start of the line
\d # a single digit
(?= # start of lookahead
(?:[^.]*\.[^.]*){2} # not a dot, a dot, not a dot - twice
$ # anchor it to the end of the string
)
[\d.]* # only digits and dots, 0+ times
$ # the end of the string
See a demo on regex101.com.
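The lookahead variant can be exercised the same way. This is a sketch in Python, chosen only for illustration. One thing it makes visible: because [\d.]* does not require digits between the dots, an input such as 3.. is also accepted.

```python
import re

pattern = re.compile(r"^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$")

def accepts(s):
    """True if the lookahead-based pattern matches the whole string."""
    return pattern.match(s) is not None

print([accepts(s) for s in ["3.1.1", "5.10.12", "10.1.15"]])  # [True, True, True]
print([accepts(s) for s in ["3", "3.1", "5.10.12.1"]])        # [False, False, False]

# Empty segments pass too, since only the dot count is checked.
print(accepts("3.."))  # True
```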