Regex to fail if multiple matches found

Take the following regex:
P[0-9]{6}(\s|\.|,)
This is designed to check for a 6 digit number preceded by a "P" within a string - works fine for the most part.
The problem is that we need the regex to fail if more than one match is found. Is that possible?
i.e. make Text 4 in the following screenshot fail but still keep all the others failing / passing as shown:
(this RegEx is being executed in a SQL .net CLR)

If the regex engine used by this tool is indeed the .NET engine, then you can use
^(?:(?!P[0-9]{6}[\s.,]).)*P[0-9]{6}[\s.,](?:(?!P[0-9]{6}[\s.,]).)*$
If it's the native SQL engine, then you can't do it with a single regex match because those engines don't support lookaround assertions.
Explanation:
^ # Start of string
(?: # Start of group which matches...
(?!P[0-9]{6}[\s.,]) # unless it's the start of Pnnnnnn...
. # any character
)* # any number of times
P[0-9]{6}[\s.,] # Now match Pnnnnnn exactly once
(?:(?!P[0-9]{6}[\s.,]).)* # Match anything but Pnnnnnn
$ # until the end of the string
Test it live on regex101.com.
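As a rough check of the pattern above, here is a minimal Python sketch. It assumes only that Python's built-in re module accepts the same lookahead syntax; the function name has_exactly_one_code and the sample strings are made up for illustration.

```python
import re

# Same pattern as above: the Pnnnnnn token must occur exactly once in the string.
EXACTLY_ONE = re.compile(
    r"^(?:(?!P[0-9]{6}[\s.,]).)*"   # any characters that do not start a Pnnnnnn token
    r"P[0-9]{6}[\s.,]"              # the single allowed Pnnnnnn token
    r"(?:(?!P[0-9]{6}[\s.,]).)*$"   # no further Pnnnnnn token until the end
)


def has_exactly_one_code(text):
    """Return True if text contains the Pnnnnnn token exactly once."""
    return EXACTLY_ONE.search(text) is not None


if __name__ == "__main__":
    for sample in ["Ref P123456, done", "P123456 and P654321.", "no code here"]:
        print(repr(sample), has_exactly_one_code(sample))
```

Note that a trailing "P123456" with nothing after it does not count, because the pattern requires whitespace, a dot or a comma after the six digits.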

or use this pattern
^(?!(.*P[0-9]{6}[\s.,]){2})(.*P[0-9]{6}[\s.,].*)$
Basically, this checks that the pattern exists and does not occur twice.
^ Start of string
(?! Negative Look-Ahead
( Capturing Group \1
. Any character except line break
* (zero or more)(greedy)
P "P"
[0-9] Character Class [0-9]
{6} (repeated {6} times)
[\s.,] Character Class [\s.,]
) End of Capturing Group \1
{2} (repeated {2} times)
) End of Negative Look-Ahead
( Capturing Group \2
. Any character except line break
* (zero or more)(greedy)
P "P"
[0-9] Character Class [0-9]
{6} (repeated {6} times)
[\s.,] Character Class [\s.,]
. Any character except line break
* (zero or more)(greedy)
) End of Capturing Group \2
$ End of string
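Where the calling code can be changed (which may not be the case for the SQL CLR setup in the question), counting the matches is an alternative to encoding "exactly once" in the pattern itself. A small Python sketch comparing the lookahead pattern above with a plain count; both function names are hypothetical:

```python
import re

# The plain token, and the "exists but not twice" pattern from the answer above.
CODE = re.compile(r"P[0-9]{6}[\s.,]")
ONCE_ONLY = re.compile(r"^(?!(.*P[0-9]{6}[\s.,]){2})(.*P[0-9]{6}[\s.,].*)$")


def matches_once_by_regex(text):
    """Single-regex check using the negative lookahead."""
    return ONCE_ONLY.search(text) is not None


def matches_once_by_count(text):
    """Count non-overlapping occurrences of the token in host code."""
    return len(CODE.findall(text)) == 1


if __name__ == "__main__":
    for sample in ["Ref P123456, done", "P123456 and P654321.", "no code here"]:
        print(repr(sample), matches_once_by_regex(sample), matches_once_by_count(sample))
```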

Related

Regex to capture optional characters

I want to pull out a base string (Wax) or (noWax) from a longer string, along with potentially any data before and after if the string is Wax. I'm having trouble getting the last item in my list below (noWax) to match.
Can anyone flex their regex muscles? I'm fairly new to regex so advice on optimization is welcome as long as all matches below are found.
What I'm working with in Regex101:
/(?<Wax>Wax(?:Only|-?\d+))/mg
Original string                  Need to extract in a capturing group
Loc3_341001_WaxOnly_S212         WaxOnly
Loc4_34412-a_Wax4_S231           Wax4
Loc3a_231121-a_Wax-4-S451        Wax-4
Loc3_34112_noWax_S311            noWax
Here is one way to do so, using a conditional:
(?<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))
See the online demo.
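(no)?: Optional capture group (group 2 in PCRE numbering, since the named group Wax is group 1).
(?(2): If capture group 2 ((no)) took part in the match, match nothing more.
|: Otherwise,
(?:Only|-?\d+): match Only, or an optional - followed by one or more digits.
): End of the conditional.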
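To try the conditional on the four sample strings, here is a minimal Python sketch. Python's re module also supports (?(2)...|...), but it spells named groups (?P<Wax>...), so the pattern is adjusted accordingly; the helper name extract_wax is made up.

```python
import re

# Python spelling of the pattern above: (?P<Wax>...) instead of (?<Wax>...).
# Group 1 is the named group Wax, group 2 is the optional (no).
WAX = re.compile(r"(?P<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))")

SAMPLES = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]


def extract_wax(text):
    """Return the text captured by the Wax group, or None if nothing matches."""
    match = WAX.search(text)
    return match.group("Wax") if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", extract_wax(sample))
```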
I assume the following match is desired.
the match must include 'Wax'
'Wax' is to be preceded by '_' or by '_no'. If the latter, 'no' is included in the match.
'Wax' may be followed by:
'Only' followed by '_', in which case 'Only' is part of the match, or
one or more digits, followed by '_', in which case the digits are part of the match, or
'-' followed by one or more digits, followed by '-', in which case
'-' followed by one or more digits is part of the match.
If these assumptions are correct the string can be matched against the following regular expression:
(?<=_)(?:(?:no)?Wax(?:(?:Only|\d+)?(?=_)|\-\d+(?=-)))
The regular expression can be broken down as follows.
(?<=_) # positive lookbehind asserts previous character is '_'
(?: # begin non-capture group
(?:no)? # optionally match 'no'
Wax # match literal
(?: # begin non-capture group
(?:Only|\d+)? # optionally match 'Only' or >=1 digits
(?=_) # positive lookahead asserts next character is '_'
| # or
\-\d+ # match '-' followed by >= 1 digits
(?=-) # positive lookahead asserts next character is '-'
) # end non-capture group
) # end non-capture group
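The lookaround version can be checked the same way. A short Python sketch, assuming the pattern is used unchanged (the lookbehind is fixed-width, so Python's built-in re accepts it); the helper name find_wax_token is hypothetical:

```python
import re

# Pattern from the breakdown above, unchanged.
WAX_TOKEN = re.compile(r"(?<=_)(?:(?:no)?Wax(?:(?:Only|\d+)?(?=_)|\-\d+(?=-)))")


def find_wax_token(text):
    """Return the matched token (no capture group needed), or None."""
    match = WAX_TOKEN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in [
        "Loc3_341001_WaxOnly_S212",
        "Loc4_34412-a_Wax4_S231",
        "Loc3a_231121-a_Wax-4-S451",
        "Loc3_34112_noWax_S311",
    ]:
        print(sample, "->", find_wax_token(sample))
```

Because of the lookaheads, a token such as "Waxy" between underscores is rejected rather than partially matched.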

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from end position of the previous match
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
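Python's built-in re module has no \G, but the same "continue from the end of the previous match" idea can be sketched with Pattern.match(string, pos), which only matches at the given position. This is an emulation under that assumption, not the PCRE pattern itself; the function name group_names is made up.

```python
import re

START = re.compile(r"\bgroups=")          # first alternative: find "groups="
ITEM = re.compile(r"[^(]*\(([^)]+)\)")    # the part after the \G alternation


def group_names(id_output):
    """Collect parenthesised names that follow 'groups=' in id output."""
    start = START.search(id_output)
    if start is None:
        return []
    names = []
    pos = start.end()
    while True:
        # match() is anchored at pos, playing the role of \G.
        match = ITEM.match(id_output, pos)
        if match is None:
            break
        names.append(match.group(1))
        pos = match.end()
    return names


if __name__ == "__main__":
    sample = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
    print(group_names(sample))
```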
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
See the regex demo. Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string with "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
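The same expression should carry over to other engines that support lookarounds. As a sketch, here it is in Python, where re.findall plays the role of Ruby's String#scan; the variable names are made up.

```python
import re

id_output = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# After a '(', provided "groups=" does not appear later on the line,
# lazily match up to the next ')'.
AFTER_GROUPS = re.compile(r"(?<=\()(?!.*\bgroups=).*?(?=\))")

names = AFTER_GROUPS.findall(id_output)

if __name__ == "__main__":
    print(names)
```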
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)\n"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
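For comparison, the two-step approach might look like this in Python. It is a sketch using the same two expressions, with re.search standing in for String#[] and re.findall for String#scan; the function name group_names_two_step is hypothetical.

```python
import re


def group_names_two_step(id_output):
    """Step 1: take the text after 'groups='. Step 2: pull out the names."""
    tail = re.search(r"(?<=\bgroups=).*", id_output)
    if tail is None:
        return []
    # Everything after a '(' up to, but not including, the next ')'.
    return re.findall(r"(?<=\()[^)\r\n]+", tail.group(0))


if __name__ == "__main__":
    sample = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
    print(group_names_two_step(sample))
```

Splitting the work keeps each expression small enough to test on its own.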

Capture repetitive pattern only if the string/line starts with or contains a specific char before the group

Hello I am trying to capture a repetitive pattern x.x.x.x where x is a number (one digit or more) but to capture only if the whole line or string starts or contains a "#" before the group:
This is a test # 1.2.3.4 which 11.2.38.49 should pass 3.2.4.5 and capture the 3 groups
This # is a test 1.2.3.4 which 1.2.3.4 should pass 3.2.4.5 too
#This is a test 1.2.3.4 which 1.2.3.4 should pass 3.2.4.5 too
This is a test 1.2.3.4 which # 1.2.3.4 pass 3.2.4.5 but don't capture the first group
This test 1.2.3.4 1.2.3.4 should # pass but capture nothing
This test 1.2.3.4 1.2.3.4 should test fail
So far I have (\d*\.\d+)+. Is it even possible to find a regex for it?
https://regex101.com/r/G81BBj/1
There are different regex engines where some support different features than others.
If you want to match all occurrences of a dot-separated chunk, you could make use of a quantifier in a lookbehind assertion.
Match 1+ digits, then repeat matching a dot and 1+ digits one or more times, so that digits alone do not match.
(?<=#.*)\d+(?:\.\d+)+
(?<=#.*) Positive lookbehind, assert that there is an # on the left
\d+ Match 1+ digits
(?: Non capture group
\.\d+ Match a dot and 1+ digits
)+ Close group and repeat 1+ times to make sure there is at least 1 dot present
.NET regex demo
Another option could be when the engine supports the \G anchor which will assert either at the start of the string, or asserts the position at the end of the previous match.
(?:^[^\r\n#]*#|\G(?!^)).*?(\d+(?:\.\d+)+)
(?: Non capture group
^[^\r\n#]*# Match from the start of the string until the first #
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close group
.*? Match as few chars as possible
( Capture group 1
\d+(?:\.\d+)+ Match a dot-separated chunk
) Close group 1
If the engine does not support \G and instead treats \G as a literal G char, it will first match up to the first occurrence of an # and then continue until the first dot-separated chunk.
After matching the first dot-separated chunk, it tries for all the following positions to match the first part of the alternation, which can not match due to the ^ anchor. It tries the second part, but that will not match because there is no G char in the example data so eventually there will be only a single match.
If \K is also supported to reset the starting point of the reported match, for example in PCRE, you could omit the capturing group and get the match only:
(?:^[^\r\n#]*#|\G(?!^)).*?\K\d+(?:\.\d+)+
With any regex engine that supports unknown width patterns in lookbehind (.NET, ECMAScript 2018+, PyPI regex module in Python, JGSoft Software), you may use
(?<=#.*?)\d+(?:\.\d+)*
See the regex demo.
Details
(?<=#.*?) - a location that is immediately preceded with # char that may be followed with any 0 or more chars other than line break chars, as few as possible
\d+(?:\.\d+)* - one or more digits followed with 0 or more occurrences of a dot and then 1+ digits.
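For engines without variable-width lookbehind (Python's built-in re is one of them), the same result can be approximated in two steps: cut the line at the first "#", then collect the dot-separated chunks from what follows. A minimal sketch; this swaps the lookbehind for plain string splitting, and the function name chunks_after_hash is made up.

```python
import re

# A dot-separated chunk: digits, then at least one ".digits" group.
CHUNK = re.compile(r"\d+(?:\.\d+)+")


def chunks_after_hash(line):
    """Return all dot-separated chunks that appear after the first '#'."""
    _, hash_found, rest = line.partition("#")
    if not hash_found:
        return []
    return CHUNK.findall(rest)


if __name__ == "__main__":
    for line in [
        "This is a test # 1.2.3.4 which 11.2.38.49 should pass 3.2.4.5 and capture the 3 groups",
        "This is a test 1.2.3.4 which # 1.2.3.4 pass 3.2.4.5 but don't capture the first group",
        "This test 1.2.3.4 1.2.3.4 should # pass but capture nothing",
        "This test 1.2.3.4 1.2.3.4 should test fail",
    ]:
        print(chunks_after_hash(line))
```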

What is this regex doing?

/^([a-z]:)?\//i
I don't quite understand what the ? does in this regex. Here is how I would explain it from what I understood:
Match at the beginning, "group 1 is a to z and :", then outside it ? (which I don't get what it's doing), then \/ which makes it match /, and the option /i for "case insensitive".
I realize that this will return 0 or 1, but I'm not quite sure why, because of the ?
Is this to match directory path or something ?
If I test it:
$var = 'test' would get 0 while $var ='/test'; would get 1 but $var = 'test/' gets 0
So anything that begins with / will get 1 and anything else 0.
Can someone explain this regex in elementary terms to me?
See YAPE::Regex::Explain:
#!/usr/bin/perl
use strict; use warnings;
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/^([a-z]:)?\//i)->explain;
The regular expression:
(?i-msx:^([a-z]:)?/)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
  )?                       end of \1 (NOTE: because you're using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
It matches a lower- or upper case letter ([a-z] with the i modifier) positioned at the start of the input string (^) followed by a colon (:) all optionally (?), followed by a forward slash \/.
In short:
^ # match the beginning of the input
( # start capture group 1
[a-z] # match any character from the set {'A'..'Z', 'a'..'z'} (with the i-modifier!)
: # match the character ':'
)? # end capture group 1 and match it once or none at all
\/ # match the character '/'
? will match one or none of the preceding pattern.
? Match 1 or 0 times
See also: perldoc perlre
Explanation:
/.../i # case insensitive
^(...) # match at the beginning of the string
[a-z]: # one character between 'a' and 'z' followed by a colon
(...)? # zero or one time of the group, enclosed in ()
So in English: match anything which begins with a / (slash), or with some letter followed by a colon followed by a /. This looks like it matches pathnames across Unix and Windows, e.g.
it would match:
/home/user
and
C:/Applications
etc.
It looks like it is looking for a "rooted" path. It will successfully match any string that either starts with a forward slash (/test), or a drive letter followed by a colon, followed by a forward slash (c:/test).
Specifically, the question mark makes something optional. It applies to the part in parentheses, which is a letter followed by a colon.
Things that will match:
C:/
a:/
/
(That last item above is why the ? is there)
Things that will not match:
C:
a:
ab:/
a/
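The lists above can be checked quickly. A minimal Python sketch of the same pattern, with re.IGNORECASE standing in for the /i modifier; the function name is_rooted is made up.

```python
import re

# Optional drive letter plus colon, then a forward slash, at the start.
ROOTED = re.compile(r"^([a-z]:)?/", re.IGNORECASE)


def is_rooted(path):
    """Return True if path starts with '/' or with '<letter>:/'."""
    return ROOTED.search(path) is not None


if __name__ == "__main__":
    for path in ["C:/", "a:/", "/", "/test", "C:", "a:", "ab:/", "a/", "test", "test/"]:
        match = ROOTED.search(path)
        # group(1) is None when the optional drive part did not take part in the match.
        print(repr(path), bool(match), match.group(1) if match else None)
```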

Check the specific number of occurrences of a single character in a string with regex

I'm trying to create a regex pattern for my powershell code. I've never worked with regex before, so I'm a total noob.
The regex should check if there are two points in the string.
Examples that SHOULD work:
3.1.1
5.10.12
10.1.15
Examples that SHOULD NOT work:
3
3.1
5.10.12.1
The string must have two points in it, the number of digits doesn't matter.
I've tried something like this, but it doesn't really work and I think its far from the right solution...
([\d]*.[\d]*.[\d])
In your current regex you need to escape the dot as \.; otherwise the dot matches any character.
You could add anchors for the start ^ and the end $ of the string and update your regex to ^\d*\.\d*\.\d*$
That would also match ..4 and ..
Or if you want to match one or more digits, I think you could use ^\d+(?:\.\d+){2}$
That would match
^ # From the beginning of the string
\d+ # Match one or more digits
(?: # Non capturing group
\.\d+ # Match a dot and one or more digits
){2} # Close non capturing group and repeat 2 times
$ # The end of the string
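To see the anchored pattern against the examples from the question, here is a short Python sketch. The pattern uses only syntax common to most engines, so it should behave the same way with PowerShell's -match, though that is an assumption here; the function name has_three_parts is made up.

```python
import re

# Digits, then exactly two ".digits" groups, and nothing else.
THREE_PARTS = re.compile(r"^\d+(?:\.\d+){2}$")


def has_three_parts(version):
    """Return True for strings of the form n.n.n (digits only, two dots)."""
    return THREE_PARTS.search(version) is not None


if __name__ == "__main__":
    for version in ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "..4"]:
        print(repr(version), has_three_parts(version))
```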
Use a lookahead:
^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$
Broken down, this says:
^ # start of the line
\d # exactly one digit
(?= # start of lookahead
(?:[^.]*\.[^.]*){2} # not a dot, a dot, not a dot - twice
$ # anchor it to the end of the string
)
[\d.]* # only digits and dots, 0+ times
$ # the end of the string
See a demo on regex101.com.
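A quick way to compare this lookahead variant with the examples from the question is a small Python sketch; the function name has_two_dots is made up.

```python
import re

# One leading digit, a lookahead that demands exactly two dots up to the
# end of the string, then only digits and dots.
TWO_DOTS = re.compile(r"^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$")


def has_two_dots(text):
    """Return True if text starts with a digit, has exactly two dots, and is digits/dots only."""
    return TWO_DOTS.search(text) is not None


if __name__ == "__main__":
    for text in ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "3..", "3.a.1"]:
        print(repr(text), has_two_dots(text))
```

Note that this variant also accepts inputs with empty parts such as 3.., which the ^\d+(?:\.\d+){2}$ pattern rejects.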