error parsing regexp invalid or unsupported Perl syntax: `(?!` - regex

I'm validating phone numbers and email addresses with this regex, but I'm getting a Perl syntax error. Can anyone tell me what to use here?
^(?:(\d)(?!\1{2}))\d{4,15}$|([A-Za-z0-9]+@[A-za-z]+\.[A-Za-z]{2,3})
I'm validating international numbers of 4-15 digits and rejecting continuously repeated digits such as 1111111111111, 99999999999, or 77777777777; the same digit may not repeat more than 3 times. I'm also validating email. Everything else works, but to reject the repeated digits I have to use the Perl syntax (?!, and that is what triggers the error. Please help me solve this.

You're not using Perl; you're using RE2. While similar to Perl, it's not quite compatible.
Specifically, it can't handle the pattern you provided. That's what the message is saying. You'll need to provide something RE2 can handle.
The following is the relevant part:
^(?:(\d)(?!\1{2}))\d{4,15}$
In Perl, that checks for a string of 5-16 digits that's optionally followed by a line feed, with the caveat that the first three digits can't be the same.
This is equivalent[1] and will work in RE2 once the whitespace, added here for readability, is removed:
^
(?: 0 (?: 0 [1-9] | [1-9] [0-9] )
| 1 (?: 1 [02-9] | [02-9] [0-9] )
| 2 (?: 2 [0-13-9] | [0-13-9] [0-9] )
| 3 (?: 3 [0-24-9] | [0-24-9] [0-9] )
| 4 (?: 4 [0-35-9] | [0-35-9] [0-9] )
| 5 (?: 5 [0-46-9] | [0-46-9] [0-9] )
| 6 (?: 6 [0-57-9] | [0-57-9] [0-9] )
| 7 (?: 7 [0-68-9] | [0-68-9] [0-9] )
| 8 (?: 8 [0-79] | [0-79] [0-9] )
| 9 (?: 9 [0-8] | [0-8] [0-9] )
)
[0-9]{2,13}
\n?
\z
I don't know RE2, so there might be a better solution.
[1] Assuming \d was meant to match [0-9]. It actually matches a whole lot more.
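As a rough cross-check, the alternation above can be generated instead of typed by hand. The sketch below uses Python purely for illustration (Python's `re` is not RE2, and the function and variable names are made up for this example). It builds a lookahead-free pattern, without the whitespace and without the trailing line feed allowance, and keeps the lookahead version alongside it for comparison.

```python
import re

DIGITS = "0123456789"

def build_no_lookahead_pattern() -> str:
    """Expand 'the first three digits are not all identical' into plain alternation."""
    branches = []
    for d in DIGITS:
        others = "".join(c for c in DIGITS if c != d)
        # Either the first digit repeats once and the third differs,
        # or the second digit already differs and the third is free.
        branches.append(f"{d}(?:{d}[{others}]|[{others}][0-9])")
    # Three digits from the alternation plus 2-13 more gives 5-16 digits in total.
    return "(?:" + "|".join(branches) + ")[0-9]{2,13}"

NO_LOOKAHEAD = re.compile(build_no_lookahead_pattern())
# The original idea, with \d narrowed to [0-9]; not accepted by RE2.
WITH_LOOKAHEAD = re.compile(r"([0-9])(?!\1{2})[0-9]{4,15}")

def is_valid_number(text: str) -> bool:
    """True if the whole string is 5-16 digits whose first three are not identical."""
    return NO_LOOKAHEAD.fullmatch(text) is not None
```

Note that, like the regex in the question, this only inspects the first three digits; a run such as 1222222 later in the number is still accepted.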

Related

Matching known hosts warning in regex

How could I match the following where the IP address can change:
Warning: Permanently added '100.124.61.161' (RSA) to the list of known hosts.
Thanks in advance!
You can try the code below; tighten the capture group if you want to accept only specific text.
if($string =~ m/Warning: Permanently added '(.*?)' \(RSA\) to the list of known hosts\./)
{
print "Match Successful, IP address: $1\n";
}
else
{
print "String did not match\n";
}
A general regex for an IPv4 address (no port) would be
(?<!\d)(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])(?:\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){3}(?!\d)
Explained
(?<! \d )
(?:
\d # 0 - 9
| [1-9] \d # 10 - 99
| 1 \d{2} # 100 - 199
| 2 [0-4] \d # 200 - 249
| 25 [0-5] # 250 - 255
)
(?:
\.
(?:
\d
| [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
){3}
(?! \d )
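The two answers can be combined so that the capture group only accepts a well-formed IPv4 address. This is a minimal sketch in Python rather than Perl, chosen only for illustration; `extract_known_host_ip` is a hypothetical helper name. The surrounding quotes in the warning text take over the role of the lookbehind and lookahead guards.

```python
import re

# One octet, 0-255, with the alternatives in the same order as the pattern above.
OCTET = r"(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])"
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"

# re.ASCII restricts \d to 0-9 instead of every Unicode digit.
WARNING_RE = re.compile(
    rf"Warning: Permanently added '({IPV4})' \(RSA\) to the list of known hosts\.",
    re.ASCII,
)

def extract_known_host_ip(line: str):
    """Return the IP address from a known-hosts warning, or None if the line does not match."""
    match = WARNING_RE.search(line)
    return match.group(1) if match else None
```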

Using VBA Regular Expressions to identify strings that do not contain a specific word

Using VBA Regular Expressions, I'm trying to find cells in Excel that contain the string "CK" and that do not contain the string "AS", but my pattern keeps blocking any string starting with either A or S. I'm using the following expression:
pattern1 = "^[^AS](.*)CK(.*)"
I've seen negative lookahead being suggested, but none of the suggested expressions that I've seen (mostly for other programming languages) have worked with VBA-Excel.
Any hints?
Without assertions it could be done like this (but it's tough):
'the idea is that it should contain CK but it should not contain AS in the very beginning'
^(?:A[^SC].*CK|AC[^K].*CK|ACK|CK|[^A].*CK).*$
Formatted
^
(?:
A [^SC] .* CK
|
AC [^K] .* CK
|
ACK
|
CK
|
[^A] .* CK
)
.*
$
Or, AS not anywhere in the string
^(?:[^A]|A[^S])*(?:ACK|CK)(?:[^A]|A(?:[^S]|$))*$
Formatted
^
(?:
[^A]
|
A
[^S]
)*
(?:
ACK
|
CK
)
(?:
[^A]
|
A
(?: [^S] | $ )
)*
$
Try this pattern
^(?!AS).*CK.*
(?!AS) is a negative lookahead for AS, in which (?! ) is provided by VBA for negative lookaheads. Note that lookbehinds are not provided by VBA, by comparison. Read more:
https://www.regular-expressions.info/vbscript.html
https://www.regular-expressions.info/lookaround.html#lookahead

Regex for specifying a date format

I have defined the following regex for a specific date:
(0[1-9]|1[012]|[1-9])[\/-]
(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])[\/-]
(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|6[0-9]|7[0-9]|8[0-9]|9[0-9])
The first line defines the month, the second the day, and the third the year formats.
I am happy with the limits for days, months and years, but I don't know how to reject mixed formats like mm/dd-yyyy or mm-dd/yyyy.
Can someone please help?
You can match the first delimiter, then use a back reference to it.
# /(0[1-9]|1[012]|[1-9])([\/-])(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])\2(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|[1-9][0-9])/
( 0 [1-9] | 1 [012] | [1-9] ) # (1), Month
( [/-] ) # (2), Delimiter / or -
( # (3 start), Day
0 [1-9]
| 1 [0-9]
| 2 [0-9]
| 3 [0]
| [1-9]
) # (3 end)
\2 # Delimiter backreference
( # (4 start), Year
18 [0-9]+
| 19 [0-9]+
| 20 [0-9]+
| 0 [1-9]
| [1-9] [0-9]
) # (4 end)
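To see the backreference reject mixed delimiters, here is a small sketch in Python (used only for illustration; `is_consistent_date` is a made-up name). The groups and limits are copied from the pattern above, and `fullmatch` is an added assumption that the whole string must be a date.

```python
import re

DATE_RE = re.compile(
    r"(0[1-9]|1[012]|[1-9])"                            # (1) month
    r"([/-])"                                           # (2) delimiter, / or -
    r"(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])"                # (3) day
    r"\2"                                               # same delimiter as group 2
    r"(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|[1-9][0-9])"   # (4) year
)

def is_consistent_date(text: str) -> bool:
    """True if text is a date whose two delimiters are identical."""
    return DATE_RE.fullmatch(text) is not None
```

Because `\2` must repeat whatever group 2 captured, 12/25/1999 and 12-25-1999 are accepted while 12/25-1999 and 12-25/1999 are not.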

Can I use the lookahead assertion with an or operator in regex?

I'm writing a program to find who a book was printed for. I am given an imprint line and have to extract the names. Note that imprint lines do not contain a fixed number of people: a book can be printed for one person or for several.
Here is an example of an imprint line:
"[[London] : Finished in Ianuarie 1587, and the 29 of the Queenes Maiesties reigne, with the full continuation of the former yeares, for Iohn Harison, George Bishop, Rafe Newberie, Henrie Denham, and Thomas VVoodcocke. At London printed [by Henry Denham] in Aldersgate street at the signe of the Starre,"
I have a regex that will match "Iohn Harison, George Bishop, Rafe Newberie, Henrie Denham, and Thomas Woodcock. At London" in the above line.
The problem is that, as written, the regex also matches the start of the next sentence, because that sentence begins with a capital letter, which the name regex accepts. I also cannot simply stop at a period, because there can be a list of initials: J.D., K.G., & V.X.
The string name will basically match any format a name can be in.
name will match: ( John | John Day | John Wayne Day| John-Day | J.D. | John | J. | J.D | .J.D. | mcJohn Day) and each name must contain a capital letter, and a name can be composed of multiple names.
Here is the current code:
string line = imprint_line;
string name("(\\s[a-z]*[A-Z\\.]+[a-z\\.:-]*)+");
regex reg("[Ff]or"+name+"((,|,?\\sand|\\s&)?"+name+")*");
smatch matches;
if (regex_search(line, matches, reg))
printedFor = matches[0];
I want to change reg to look ahead for , or and or & or , and
I was trying something like this:
regex reg("[Ff]or"+name+"(?=(,|,?\\sand|,?\\s&))"+name+")*");
but this returns a regex error. Is there some way I can do this?
Thanks in advance for all the help.
This is your current regex cleaned up a bit.
I can't figure out why you need the lookahead though.
Can you explain better?
[Ff] or
(?: \s [a-z]* [A-Z.]+ [a-z.:-]* )+
(?:
(?: , | ,? \s and | \s & )?
(?: \s [a-z]* [A-Z.]+ [a-z.:-]* )+
)*
Here is the error you are getting
[Ff] or
(?:
\s [a-z]* [A-Z.]+ [a-z.:-]*
)+
(?= , | ,? \s and | ,? \s & )
(?:
\s [a-z]* [A-Z.]+ [a-z.:-]*
)+
= ) <-- Unbalanced ')'
*
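For comparison, here is the cleaned-up pattern with balanced groups, sketched in Python rather than C++ (an illustrative stand-in; `printed_for` is a hypothetical helper). Building the pattern from named pieces makes it easier to see that every opening parenthesis has a matching close.

```python
import re

# One or more name parts, each introduced by whitespace and containing a capital or a dot.
NAME = r"(?:\s[a-z]*[A-Z.]+[a-z.:-]*)+"
# Optional separator between names: a comma, "and" (with optional comma), or "&".
SEPARATOR = r"(?:,|,?\sand|\s&)?"
PRINTED_FOR = re.compile(rf"[Ff]or{NAME}(?:{SEPARATOR}{NAME})*")

def printed_for(line: str):
    """Return the 'for <names>' portion of an imprint line, or None if absent."""
    match = PRINTED_FOR.search(line)
    return match.group(0) if match else None
```

On a simple made-up line such as "printed for John Day, and Tom Lee in London", this returns "for John Day, and Tom Lee". It does not address the next-sentence problem described in the question.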

Perl Regular expression for IP address range

I have some internet traffic data to analyze. I need to analyze only those packets that are within a certain IP range, so I need to write an if statement. I suppose I need a regular expression for the test condition, but my knowledge of regexps is a little weak. Can someone tell me how I would construct a regular expression for that condition? An example range might look like this:
Group A
56.286.75.0/19
57.256.106.0/21
64.131.14.0/22
Group B
58.176.44.0/21
58.177.92.0/19
The if statement would be like
if("IP in A" || "IP in B") {
do something
}
else { do something else }
So I would need an equivalent regexp for the "IP in A" and "IP in B" conditions.
I don't think that regexps provide much advantage for this problem.
Instead, use the Net::Netmask module. The "match" method should do what you want.
I have to echo the disagreement with using a regex to check IP addresses...however, here is a way to pull IPs out of text:
qr{
(?<!\d) # No digit having come immediately before
(?: [1-9]? \d # any one or two-digit number (0 - 99)
| 1 \d \d # OR any three-digit number starting with 1
| 2 (?: [0-4] \d # OR 200 - 249
| 5 [0-5] # OR 250 - 255
)
)
(?: \. # followed by a dot
(?: [1-9]? \d # 0 - 255 reprise...
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
){3} # that group exactly 3 times
(?!\d) # no digit following immediately after
}x
;
Given that general pattern, we can construct an IP parser. For the given "ranges", though, I wouldn't do anything less than the following:
A => qr{
(?<! \d )
(?: 56\.186\. 75
| 57\.256\.106
| 64\.131\. 14
)
\.
(?: [1-9]? \d
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
(?! \d )
}x
B => qr{
(?<! \d )
58 \.
(?: 176\.44
| 177\.92
)
\.
(?: [1-9]? \d
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
(?! \d )
}x
I'm doing something like:
use NetAddr::IP;
my @group_a = map NetAddr::IP->new($_), @group_a_masks;
...
my $addr = NetAddr::IP->new( $ip_addr_in );
if ( grep $_->contains( $addr ), @group_a ) {
print "group a";
}
I chose NetAddr::IP over Net::Netmask for IPv6 support.
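The same library-based approach can be sketched in Python with the standard `ipaddress` module, shown here only as an illustration alongside the Perl modules. Two assumptions: the Group B ranges from the question have host bits set, so `strict=False` is used to normalise them (58.176.44.0/21 becomes 58.176.40.0/21), and Group A is left out because 56.286.75.0 and 57.256.106.0 are not valid addresses. `in_group` is a made-up helper name.

```python
import ipaddress

# Group B from the question; strict=False masks off the host bits.
GROUP_B = [
    ipaddress.ip_network(cidr, strict=False)
    for cidr in ("58.176.44.0/21", "58.177.92.0/19")
]

def in_group(ip_text: str, networks) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip_text)
    return any(addr in net for net in networks)
```

A /21 spans eight values of the third octet and a /19 spans thirty-two, so a regex that pins the third octet to a single value matches far fewer addresses than the range actually contains.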
Martin is right, use Net::Netmask. If you really want to use a regex though...
$prefix = "192.168.1.0/25";
$ip1 = "192.168.1.1";
$ip2 = "192.168.1.129";
$prefix =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\/([0-9]+)/$mask=(2**32-1)<<(32-$5); $1<<24|$2<<16|$3<<8|$4/e;
$ip1 =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/$1<<24|$2<<16|$3<<8|$4/e;
$ip2 =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/$1<<24|$2<<16|$3<<8|$4/e;
if (($prefix & $mask) == ($ip1 & $mask)) {
print "ip1 matches\n";
}
if (($prefix & $mask) == ($ip2 & $mask)) {
print "ip2 matches\n";
}
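The arithmetic in this answer does not depend on the regex substitutions: each dotted quad becomes a 32-bit integer, and both sides are compared under the prefix mask. A minimal Python sketch of the same idea follows (illustrative only, with hypothetical function names and no input validation):

```python
def ip_to_int(ip_text: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit integer."""
    a, b, c, d = (int(part) for part in ip_text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_prefix(ip_text: str, cidr: str) -> bool:
    """True if ip_text lies inside the CIDR prefix, e.g. '192.168.1.0/25'."""
    base, bits = cidr.split("/")
    # Keep only the top `bits` bits; the final & trims the shift back to 32 bits.
    mask = (0xFFFFFFFF << (32 - int(bits))) & 0xFFFFFFFF
    return (ip_to_int(base) & mask) == (ip_to_int(ip_text) & mask)
```

With the values from the answer, 192.168.1.1 is inside 192.168.1.0/25 and 192.168.1.129 is not.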