Excluding specific values from pattern match [duplicate] - regex

The following regex captures IP addresses as well as DNS hostnames.
What I'd like is to add some IPs to ignore, such as 1.0.0.0 and 0.0.0.0 for example. I tried some negative lookahead without success.
[\w-]+(\.[\w-]+)+
for example :
www.google.com 255.255.255.255 1.0.0.0 stackoverflow.com 0.0.0.0
should match 3 out of 5 in that line
Any tips would be great.
edit : I tried this, which somewhat works but also filters out other values such as 1.1.1.1 for example
(?![1\.0\.0\.0]|[0\.0\.0\.0])[\w-]+(\.[\w-]+)+

To find IPs and domains while ignoring the IPs 1.0.0.0 and 0.0.0.0,
validating IPv4 octets, requiring that a domain contain at least one letter,
and wrapping everything in whitespace boundaries, use this:
(?<!\S)(?!0{0,2}[01](?:\.0{1,3}){3})(?:(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])(?:\.(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){3}|(?=\S*[a-zA-Z])[\w-]+(?:\.[\w-]+)+)(?!\S)
https://regex101.com/r/ZPQS5K/1
Expanded
(?<! \S )
(?! # Not 0.0.0.0 or 1.0.0.0
0{0,2} [01]
(?: \. 0{1,3} ){3}
)
(?:
(?: # IP address
0{0,2} \d
| 0? [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
(?:
\.
(?:
0{0,2} \d
| 0? [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
){3}
| # or
(?= \S* [a-zA-Z] ) # At least a letter
[\w-]+ # Domain
(?: \. [\w-]+ )+
)
(?! \S )
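As a quick check of the pattern above, here is a minimal Python sketch that runs it against the sample line from the question. The names PATTERN and find_hosts are made up for this illustration, and the choice of Python's re module is an assumption; the regex itself is copied from the answer and only split across string literals.

```python
import re

# The answer's pattern, split across literals for readability (no x flag needed).
PATTERN = re.compile(
    r"(?<!\S)"                          # whitespace boundary on the left
    r"(?!0{0,2}[01](?:\.0{1,3}){3})"    # not 0.0.0.0 or 1.0.0.0
    r"(?:"
    r"(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])"
    r"(?:\.(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){3}"  # IPv4 address
    r"|(?=\S*[a-zA-Z])[\w-]+(?:\.[\w-]+)+"                      # or a domain with a letter
    r")"
    r"(?!\S)"                           # whitespace boundary on the right
)

def find_hosts(line):
    """Return every IP or hostname in line that the pattern accepts."""
    return PATTERN.findall(line)

sample = "www.google.com 255.255.255.255 1.0.0.0 stackoverflow.com 0.0.0.0"
print(find_hosts(sample))
# ['www.google.com', '255.255.255.255', 'stackoverflow.com']
```

Because the pattern has no capturing groups, findall returns the whole match for each hit, which is the 3 out of 5 the question asked for.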

Related

RegEx for validating a URL with optional ports

I am trying to get this regex dialed-in to validate whether a URL begins with https and if a port is supplied the only valid values are 443 or 5443. This regex is pretty close but not quite there.
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5{0,1}443)?(.)*
How do I solve this problem?
This is a mainstream URL validator that tests whether the URL sits between whitespace boundaries.
It only allows the https scheme and the port numbers 5443 or 443.
(?<!\S)https://(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::5?443)?(?:/[^\s]*)?(?!\S)
Readable version
(?<! \S )
https ://
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : 5? 443 )?
(?: / [^\s]* )?
(?! \S )
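For anyone who wants to try this validator, here is a rough Python sketch. It assumes the userinfo delimiter is @, assumes case-insensitive matching (the character classes only list a-z), and uses the made-up names HTTPS_URL and is_allowed_url. It has only been reasoned through on short inputs; the nested quantifiers in the host part can backtrack heavily on long strings that fail to match.

```python
import re

HTTPS_URL = re.compile(
    r"(?<!\S)https://"
    r"(?:\S+(?::\S*)?@)?"                       # optional user:password@ (assumed @)
    r"(?:"
    r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"       # IPv4: first octet
    r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"  # IPv4: middle octets
    r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"  # IPv4: last octet
    r"|(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+"          # host label
    r"(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*"    # more labels
    r"(?:\.[a-z\u00a1-\uffff]{2,})"                                  # TLD
    r"|localhost"
    r")"
    r"(?::5?443)?"                              # only :443 or :5443
    r"(?:/[^\s]*)?"                             # optional path
    r"(?!\S)",
    re.IGNORECASE,
)

def is_allowed_url(url):
    """True if the whole string is an https URL with no port, 443, or 5443."""
    return HTTPS_URL.fullmatch(url) is not None

for candidate in ("https://example.com:443/path",
                  "https://example.com:8080/",
                  "http://example.com"):
    print(candidate, is_allowed_url(candidate))
```

Using fullmatch here means the lookarounds at both ends are redundant for single strings, but they still matter if you switch to search or finditer over running text.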
You should append a / after the optional port group so that no extra digits are allowed between the port and the /. Try this regex:
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?\/\S*
Notice that I've also changed (:5{0,1}443)? to (:5?443)? and changed the final .* to \S* so the URL doesn't capture spaces, since spaces are not valid in a URL. You can also drop most of the groups in your regex unless you need them.
Regex Demo
Edit:
As you said in the comments, you also want to match the following URLs:
https://example.com
https:example.com
https:example.com:443
To do that, make the \/\S* part optional by placing a ? after it. The modified regex becomes this, which will match the URLs above:
^https:\/\/([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?(\/\S*)?
Demo with filepath part being optional
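One thing worth checking with the modified regex: it is anchored on the left only, so a prefix match can still succeed on a URL with a disallowed port. The Python sketch below shows the difference, assuming Python's re module; URL_RE, accepts_prefix, and accepts_whole are names invented for this example.

```python
import re

# The modified regex from above, unchanged.
URL_RE = re.compile(r"^https:\/\/([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?(\/\S*)?")

def accepts_prefix(url):
    """re.match only needs the pattern to match at the start of the string."""
    return URL_RE.match(url) is not None

def accepts_whole(url):
    """re.fullmatch requires the pattern to consume the entire string."""
    return URL_RE.fullmatch(url) is not None

bad = "https://example.com:8080"
print(accepts_prefix(bad))  # True: the match stops after "https://example.com"
print(accepts_whole(bad))   # False: ":8080" is left over
print(accepts_whole("https://example.com:443/index.html"))  # True
```

So if the goal is validation rather than extraction, either use a full match or add a $ at the end of the pattern.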
Your regex seems to work okay. You could also anchor it on the right, just for safety:
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(?::(5443|443))?$
I added a $ anchor to bound your original expression on the right and listed the ports as an alternation; the colon stays inside the optional group so that a URL without a port still matches. If you need more port numbers, add them to this capturing group:
(5443|443)
You can also remove unnecessary boundaries, if you wish.

Matching known hosts warning in regex

How could I match the following where the IP address can change:
Warning: Permanently added '100.124.61.161' (RSA) to the list of known hosts.
Thanks in advance!
You can try the code below; tighten the (.*?) capture if you want to restrict it to specific text.
if($string =~ m/Warning: Permanently added '(.*?)' \(RSA\) to the list of known hosts\./)
{
print "Match Successful, IP address: $1\n";
}
else
{
print "String did not match\n";
}
A general regex for an IPv4 address (no port) would be
(?<!\d)(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])(?:\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){3}(?!\d)
Explained
(?<! \d )
(?:
\d # 0 - 9
| [1-9] \d # 10 - 99
| 1 \d{2} # 100 - 199
| 2 [0-4] \d # 200 - 249
| 25 [0-5] # 250 - 255
)
(?:
\.
(?:
\d
| [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
){3}
(?! \d )
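The two answers can be combined: put the strict IPv4 pattern inside the quotes of the warning message, so that only a well-formed address is captured. A small Python sketch of that idea follows; OCTET, IPV4, KNOWN_HOST, and added_host are illustrative names, not anything from the original posts.

```python
import re

# One octet, 0 - 255, as in the explained regex above.
OCTET = r"(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])"
# Four octets, not directly preceded or followed by another digit.
IPV4 = rf"(?<!\d){OCTET}(?:\.{OCTET}){{3}}(?!\d)"

KNOWN_HOST = re.compile(
    rf"Warning: Permanently added '({IPV4})' \(RSA\) to the list of known hosts\."
)

def added_host(line):
    """Return the IPv4 address from a known-hosts warning, or None."""
    match = KNOWN_HOST.search(line)
    return match.group(1) if match else None

line = "Warning: Permanently added '100.124.61.161' (RSA) to the list of known hosts."
print(added_host(line))  # 100.124.61.161
```

A line whose quoted value is not a valid address, such as '999.1.1.1', gives None with this version, whereas the (.*?) capture would accept it.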

Regex for Networking, IPs, ranges, subnets, Cidr

Given an input like this, 56.1.2.3 56.1.2.4 255.255.255.254 56.1.2.7-9 56.5.1.1 to 56.5.1.7, I need a regex that can pick out what I have in brackets: [56.1.2.3] [56.1.2.4 255.255.255.254] [56.1.2.7-9] [56.5.1.1 to 56.5.1.7].
Here is what I have:
private static final String IP_Address = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\d";//56.1.2.3
private static final String IP_WithMask = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s+[255])(\\d{1,3}.){3}(\\d{1,3})"; //56.1.2.3 255.255.255.254
private static final String IP_CIDR = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s*/)(\\d{1,3})"; //56.1.2.3/24
private static final String IP_ADDRESS_Dash_Numeric_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*-)(\\d{1,3}))";// 56.1.2.3-4
private static final String IP_ADDRESS_Dash_ADDRESS_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*-\\s*)(\\d{1,3}.){3}(\\d{1,3}))";//56.1.2.3-56.1.2.5
private static final String IP_ADDRESS_To_Numeric_RANGE = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s*[T|t][O|o]\\s*)(\\d{1,3})";//56.1.2.3 to 255
private static final String IP_ADDRESS_To_ADDRESS_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*[T|t][O|o]\\s*)(\\d{1,3}.){3}(\\d{1,3}))";//56.1.2.3 to 56.1.3.5
The Problem is that my regex can't pick out the difference between a single IP and the case of an IP followed by a mask (56.1.2.3 255.x.x.x). Same problem exists for the other types too.
I tested the below regex, works on all your cases.
See the comments in the formatted regex.
The way to decipher results is to test groups 3-6 for the type of IP.
All the components are captured, even the segment start.
Regex:
(\d{1,3}(?:\.\d{1,3}){2}\.(\d{1,3}))(?:(?:-|\s+to\s+)(\d{1,3}(?![\d\.]))|(?:-|\s*to\s+)(\d{1,3}(?:\.\d{1,3}){3})|\s+(25\d(?:\.\d{1,3}){3})|\s*/(\d{1,3}))?
Formatted (with this app):
( # (1), IP
\d{1,3}
(?: \. \d{1,3} ){2}
\.
( \d{1,3} ) # (2), From segment
)
(?:
(?: - | \s+ to \s+ )
( # (3), Dash/To segment
\d{1,3}
(?! [\d\.] )
)
|
(?: - | \s* to \s+ )
( # (4), Dash/To range
\d{1,3}
(?: \. \d{1,3} ){3}
)
|
\s+
( # (5), Mask
25 \d
(?: \. \d{1,3} ){3}
)
|
\s* /
( # (6), CIDR prefix length
\d{1,3}
)
)?
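To show how testing groups 3-6 works in practice, here is a short sketch in Python rather than the Java from the question. The regex is the one above; RANGE_RE, KINDS, classify, and the kind labels are assumptions made for the example.

```python
import re

RANGE_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){2}\.(\d{1,3}))"        # (1) IP, (2) from segment
    r"(?:(?:-|\s+to\s+)(\d{1,3}(?![\d\.]))"        # (3) dash/to segment
    r"|(?:-|\s*to\s+)(\d{1,3}(?:\.\d{1,3}){3})"    # (4) dash/to full address
    r"|\s+(25\d(?:\.\d{1,3}){3})"                  # (5) mask
    r"|\s*/(\d{1,3}))?"                            # (6) CIDR prefix length
)

# Hypothetical labels for whichever optional group took part in the match.
KINDS = {3: "segment range", 4: "address range", 5: "mask", 6: "cidr"}

def classify(text):
    """Return (matched text, kind) for every address expression in text."""
    results = []
    for m in RANGE_RE.finditer(text):
        kind = next(
            (name for idx, name in KINDS.items() if m.group(idx) is not None),
            "single",
        )
        results.append((m.group(0), kind))
    return results

data = "56.1.2.3 56.1.2.4 255.255.255.254 56.1.2.7-9 56.5.1.1 to 56.5.1.7"
for item in classify(data):
    print(item)
```

At most one of groups 3-6 is set per match, so the first non-empty one identifies the type, and a match with none of them set is a plain single IP.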

Splitting string having special characters, words, numbers and URL

I have a .txt file which contains:
"'the url address i checked is: https://www.google.com/ for 2times and it's awesome!."
After parsing, the expected output should be:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
How do I split this list to get the output using the re module.
I came up with this pattern:
pattern = re.compile(r"\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")
but this is also splitting my URL.
Can any one please help?
Just pick a url regex from somewhere and make it first in the alternations.
An example only -
# (?!mailto:)(?:(?:https?|ftp)://)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:/[^\s]*)?|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]
(?! mailto: )
(?:
(?: https? | ftp )
://
)?
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : \d{2,5} )?
(?: / [^\s]* )?
| \d+
| [a-zA-Z]+ [a-zA-Z']*
| [^\w\s]
Outputs:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
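The ordering trick is easier to see with a much simpler URL alternative. The sketch below uses https?://\S+ as a deliberately simplified stand-in for the full URL regex above (an assumption for illustration only); TOKEN_RE and tokenize are made-up names.

```python
import re

# URL alternative first, so the URL is consumed before the word and
# punctuation alternatives get a chance to split it.
TOKEN_RE = re.compile(r"https?://\S+|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")

def tokenize(text):
    """Split text into URLs, numbers, words, and single punctuation marks."""
    return TOKEN_RE.findall(text)

text = "\"'the url address i checked is: https://www.google.com/ for 2times and it's awesome!.\""
print(tokenize(text))
```

The stand-in treats everything up to the next whitespace as part of the URL, so trailing punctuation glued to a URL would be swallowed; the stricter pattern above is the better choice when that matters.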

Perl Regular expression for IP address range

I have some internet traffic data to analyze. I need to analyze only those packets that are within a certain IP range, so I need to write an if statement. I suppose I need a regular expression for the test condition, but my knowledge of regexps is a little weak. Can someone tell me how I would construct a regular expression for that condition? An example range may look like
Group A
56.286.75.0/19
57.256.106.0/21
64.131.14.0/22
Group B
58.176.44.0/21
58.177.92.0/19
The if statement would be like
if("IP in A" || "IP in B") {
do something
}
else { do something else }
So I would need to make the equivalent regexps for the "IP in A" and "IP in B" conditions.
I don't think that regexps provide much advantage for this problem.
Instead, use the Net::Netmask module. The "match" method should do what you want.
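For comparison, the same non-regex approach can be sketched in Python with the standard-library ipaddress module, used here as a stand-in for Net::Netmask. GROUP_A, GROUP_B, and in_group are names invented for the example. Two of the Group A entries in the question (56.286.75.0/19 and 57.256.106.0/21) contain octets above 255, so they are left out, and strict=False is assumed because the listed addresses have host bits set for their prefix lengths.

```python
import ipaddress

# strict=False masks off host bits, e.g. 64.131.14.0/22 becomes 64.131.12.0/22.
GROUP_A = [ipaddress.ip_network("64.131.14.0/22", strict=False)]
GROUP_B = [
    ipaddress.ip_network(cidr, strict=False)
    for cidr in ("58.176.44.0/21", "58.177.92.0/19")
]

def in_group(ip, networks):
    """True if ip falls inside any network in the list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

ip = "58.176.45.10"
if in_group(ip, GROUP_A) or in_group(ip, GROUP_B):
    print("do something")
else:
    print("do something else")
```

The membership test is plain integer masking under the hood, which is why a prefix like /19 or /21 is awkward to express as a regex but trivial here.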
I have to echo the disagreement with using a regex to check IP addresses...however, here is a way to pull IPs out of text:
qr{
(?<!\d) # No digit having come immediately before
(?: [1-9] \d? # any one or two-digit number, 1 - 99
| 1 \d \d # OR any three-digit number starting with 1
| 2 (?: [0-4] \d # OR 200 - 249
| 5 [0-5] # OR 250 - 255
)
)
(?: \. # followed by a dot
(?: [1-9] \d? # 1-255 reprise...
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
){3} # that group exactly 3 times
(?!\d) # no digit following immediately after
}x
;
But given that general pattern, we can construct an IP parser. But for the given "ranges", I wouldn't do anything less than the following:
A => qr{
(?<! \d )
(?: 56\.186\. 75
| 57\.256\.106
| 64\.131\. 14
)
\.
(?: [1-9] \d?
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
(?! \d )
}x
B => qr{
(?<! \d )
58 \.
(?: 176\.44
| 177\.92
)
\.
(?: [1-9] \d?
| 1 \d \d
| 2 (?: [0-4] \d
| 5 [0-5]
)
)
(?! \d )
}x
I'm doing something like:
use NetAddr::IP;
my @group_a = map NetAddr::IP->new($_), @group_a_masks;
...
my $addr = NetAddr::IP->new( $ip_addr_in );
if ( grep $_->contains( $addr ), @group_a ) {
print "group a";
}
I chose NetAddr::IP over Net::Netmask for IPv6 support.
Martin is right, use Net::Netmask. If you really want to use a regex though...
$prefix = "192.168.1.0/25";
$ip1 = "192.168.1.1";
$ip2 = "192.168.1.129";
$prefix =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\/([0-9]+)/$mask=(2**32-1)<<(32-$5); $1<<24|$2<<16|$3<<8|$4/e;
$ip1 =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/$1<<24|$2<<16|$3<<8|$4/e;
$ip2 =~ s/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/$1<<24|$2<<16|$3<<8|$4/e;
if (($prefix & $mask) == ($ip1 & $mask)) {
print "ip1 matches\n";
}
if (($prefix & $mask) == ($ip2 & $mask)) {
print "ip2 matches\n";
}
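The arithmetic in that last snippet is easier to follow without the substitution trick. Here is a rough Python rendering of the same idea, with the helper names ip_to_int and in_prefix invented for the sketch: convert each dotted quad to a 32-bit integer, build the mask from the prefix length, and compare the masked values.

```python
def ip_to_int(ip):
    """Pack a dotted-quad IPv4 string into one 32-bit integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return a << 24 | b << 16 | c << 8 | d

def in_prefix(ip, cidr):
    """True if ip shares the network bits of cidr (e.g. '192.168.1.0/25')."""
    base, bits = cidr.split("/")
    # Keep the top `bits` bits set, and trim the mask back to 32 bits.
    mask = (0xFFFFFFFF << (32 - int(bits))) & 0xFFFFFFFF
    return (ip_to_int(ip) & mask) == (ip_to_int(base) & mask)

prefix = "192.168.1.0/25"
print(in_prefix("192.168.1.1", prefix))    # True: host 1 is in 0 - 127
print(in_prefix("192.168.1.129", prefix))  # False: host 129 is in 128 - 255
```

For /25 the mask is 0xFFFFFF80, so only the top bit of the last octet is compared, which is why .1 matches and .129 does not.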