Using \d{1,3} when creating regex to find IPs - regex

Why would one use {1,3} in \d{1,3} when catching an IP with grep? For example:
grep -Po 'inet addr:\K(?!127\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
\K drops inet addr: from the reported match, and (?!127\.), AFAIU, rejects any address that starts with 127 (the loopback in that case), but what is the {1,3} after \d for?
Clearly, we don't only want IP classes that start with 1 and end with 2 or 3, so the purpose there is unclear to me.
Note: inet addr: is part of the ifconfig Linux utility.

While writing the question I figured out the purpose: it means that each of the 4 octets (what I called "classes" above) has at least 1 and at most 3 digits.
Indeed, in IPv4 (I don't know about IPv6) each octet has at most 3 digits.
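
To see the quantifier in isolation, here is a minimal Python sketch. Python's re module has no \K, so a capturing group stands in for it, and the ifconfig-style sample text is made up for illustration.

```python
import re

# \d{1,3} means "between one and three digits", not "a digit from 1 to 3".
OCTET = re.compile(r"\d{1,3}")

# Python's re has no \K, so a capturing group plays its role here:
# only the part in parentheses is returned by findall().
INET = re.compile(r"inet addr:((?!127\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# Hypothetical ifconfig-style output, invented for this example.
sample = (
    "inet addr:127.0.0.1  Mask:255.0.0.0\n"
    "inet addr:192.168.1.17  Bcast:192.168.1.255  Mask:255.255.255.0\n"
)


def non_loopback_addresses(text):
    """Return every 'inet addr:' value that does not start with 127."""
    return INET.findall(text)


print(OCTET.fullmatch("7") is not None)     # one digit is accepted
print(OCTET.fullmatch("1234") is not None)  # four digits are rejected
print(non_loopback_addresses(sample))
```

The negative lookahead is checked right where the address would begin, so the loopback line is skipped without consuming any text.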

You have answered your question yourself. Note, however, that for general IPv4 the regex that should be used is the following:
'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b'
^^^^^^^^^^^^^^^^
that you could adapt to remove the localhost one.
In your case, grep will also match runs of digits that are not proper IPs (e.g. octets > 255).
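
A small Python sketch of the difference, assuming the whole string is supposed to be one address. It reuses the octet alternation from the answer but anchors it with fullmatch() instead of \b and (\.|$); the names LOOSE, STRICT and is_ipv4 are made up here.

```python
import re

# Loose pattern from the question: any one to three digits per octet.
LOOSE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# Stricter octet: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(text):
    """True when the whole string is four dot-separated octets in 0-255."""
    return STRICT.fullmatch(text) is not None


for candidate in ("192.168.1.17", "999.999.999.999", "256.1.1.1"):
    loose_ok = LOOSE.fullmatch(candidate) is not None
    print(candidate, loose_ok, is_ipv4(candidate))
```

The loose pattern accepts all three candidates, while the strict one only accepts the first.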

Related

How to make a regular expression for IPv6 with prefix?

I created this very complex regular expression (RegEx101) for IPv4 and IPv6:
((^\s*((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))\s*$)|(^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$))|(^\*$)
Below are three examples of data that can be checked by this regular expression.
2001:db8:abcd:0012:0000:0000:0000:0000 (ipv6)
0000:0000:2001:DB8:ABCD:12:: (condensed notation)
255.255.255.0 (ipv4)
but this regular expression does not work for IPv6 addresses with prefix.
For example:
2001:db8:abcd:0012::0/112
does not work.
How can this problem be fixed?
And if anybody in the future wants optional subnet masks for IPv4 as well as IPv6, here is a version that allows them:
/((^\s*((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))\s*(\/(\d|1\d|2\d|3[0-2]))?$)|(^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3}(\/(\d{1,2}|1[0-1]\d|12[0-8]))?)|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$))|(^\*$)/g
https://regex101.com/r/lS4Cjo/1
My understanding is that you want to optionally match a 'prefix' (a number at the end of the address which is always preceded by a forward slash) p such that 1≤p≤128. Let's try breaking this up.
Optionally match the following block:
Match a forward slash /
Either match a one- or two-digit number (0 to 99)
Or match a number between 100 and 119 (inclusive)
Or match a number between 120 and 128 (inclusive)
The above is equivalent to this regex: (\/(\d{1,2}|1[0-1]\d|12[0-8]))?.
https://regex101.com/r/5cBm5a/3
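
The breakdown above can be checked on its own with a short Python sketch. It tests only the suffix, not the full IPv6 pattern, and the helper name is hypothetical.

```python
import re

# The optional suffix from the breakdown, tested in isolation:
# "/" followed by 0-99, 100-119, or 120-128.
PREFIX = re.compile(r"(\/(\d{1,2}|1[0-1]\d|12[0-8]))?")


def has_valid_prefix_suffix(suffix):
    """True when suffix is empty or a slash plus a number from 0 to 128."""
    return PREFIX.fullmatch(suffix) is not None


for suffix in ("", "/0", "/64", "/112", "/128", "/129", "/1000"):
    print(repr(suffix), has_valid_prefix_suffix(suffix))
```

Note that \d{1,2} also lets /0 and zero-padded values such as /05 through, so an extra check is needed if those should be rejected.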

Why doesn't this regexp for IPv4 work?

So this is the regex I've made:
^(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5])))\.){3}(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5]))))$
I have used several sites to break it down and it seems that it should work, but it doesn't. The desired result is to match any IPv4 - four numbers between 0 and 255 delimited by dots.
As an example, 1.1.1.1 won't give you a match.
The purpose of this question is not to find out a regex for IPv4 address, but to find out why this one, which seems correct, is not.
The literal . is only part of the 200-255 section of the capture group: railroad diagram.
Here's (([01]?\d{1,2})|(2([0-4]\d)|(5[0-5]))\.) formatted differently to help you spot the reason:
(
    ([01]?\d{1,2})
    |
    (2([0-4]\d)|(5[0-5])) \.
)
You're matching 0-199 or 200-255 with a dot. The dot is conditional on matching 200-255.
Additionally, as @SebastianProske pointed out, 2([0-4]\d)|(5[0-5]) matches 200-249 or 50-55, not 200-255.
You can fix your regex by adding capturing groups, but ultimately I would recommend not reinventing the wheel and instead either A) using a pre-existing regex solution or B) parsing the IPv4 address by splitting on dots. The latter method is easier to read and understand.
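
Option B could look something like this in Python. It is only a sketch: the function name and the choice to allow leading zeros (as the regexes here do) are my assumptions.

```python
def parse_ipv4(text):
    """Return the four octets as ints, or None if text is not dotted-quad IPv4."""
    parts = text.split(".")
    if len(parts) != 4:
        return None
    octets = []
    for part in parts:
        # isascii() + isdigit() rejects empty strings, signs, spaces and
        # non-ASCII digits; the length cap mirrors \d{1,3}.
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return None
        value = int(part)
        if value > 255:
            return None
        octets.append(value)
    return octets


print(parse_ipv4("1.1.1.1"))
print(parse_ipv4("256.1.1.1"))
print(parse_ipv4("1.1.1"))
```

Each rule (four parts, digits only, value up to 255) is a separate line, which is what makes this easier to read than the alternation-heavy regex.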
To fix yours up, just account for the "decimal" (the dot) after each of the first three groups:
((2[0-4]\d|25[0-5]|[01]?\d{1,2})\.){3}(2[0-4]\d|25[0-5]|[01]?\d{1,2})
(*note that I reversed the order of the 2xx vs 1xx tests as well - prefer SPECIAL|...|NORMAL, or more restrictive first, when using alternations like this)
see it in action
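
A quick Python comparison of the question's pattern and the fixed one. The ^ and $ around the fixed pattern are my addition so that both are tested against the whole string.

```python
import re

# The question's pattern: inside the repeated group the alternation is
# "0-199" OR "200-255 followed by a dot", so the dot is only required
# after the 200-255 branch.
BROKEN = re.compile(
    r"^(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5])))\.){3}"
    r"(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5]))))$"
)

# The fixed pattern: the dot sits outside the alternation, so it is
# required after every one of the first three octets.
FIXED = re.compile(
    r"^((2[0-4]\d|25[0-5]|[01]?\d{1,2})\.){3}(2[0-4]\d|25[0-5]|[01]?\d{1,2})$"
)

for text in ("1.1.1.1", "200.200.200.1", "1111", "256.1.1.1"):
    print(text, BROKEN.match(text) is not None, FIXED.match(text) is not None)
```

The broken pattern rejects 1.1.1.1 but accepts 1111 (four dot-less octets in a row), which shows the dot is attached to the wrong branch.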

Regex: How can I match third IPv4 address?

I'm a regex noob and for the life of me I can't figure out how to match the third IPv4 address on a line that contains three IPv4 addresses.
The line in question:
ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN
The regex I have so far that matches one IP:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
If I put a {3} at the end of it, it breaks. I think it has something to do with the spaces between the addresses but I can't figure out how to handle that. I need to capture the third address.
https://regex101.com/r/mN3cR6/1
You just need to add the global modifier to the code, so that every address on the line is matched rather than only the first.
Your new code should be like this
/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/g
See this demo https://regex101.com/r/mN3cR6/2
Try
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?)+
This should match one, two, or three, or even more "IPs".
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
for exactly 3.
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?){3}
for a shorter formula with some possible errors.
Note that the basic idea is problematic too, as it matches "999.999.999.999" when it is definitely not a valid IP address.
The following should match the third ip
(?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s){2}([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
It's possible to be more compact depending on what language you're using. For instance, in Ruby
string.scan(/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/)[2]
would give you what you want (as a one-element array, because of the capture group). You could also collapse the repeated [0-9]{1,3}\. instances using non-capturing groups and counts.
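
For comparison, a rough Python version of both ideas: skipping two addresses with a non-capturing group, and indexing into the list of all matches. QUAD and the two function names are made up for this sketch.

```python
import re

LINE = "ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN"

# One dotted quad, written with a non-capturing group and a count instead of
# repeating [0-9]{1,3}\. three times.
QUAD = r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"

# Approach 1: skip two quads (each followed by whitespace), capture the third.
THIRD = re.compile(rf"(?:{QUAD}\s){{2}}({QUAD})")


def third_address(line):
    match = THIRD.search(line)
    return match.group(1) if match else None


# Approach 2: collect every quad and index into the list.
def third_address_by_index(line):
    found = re.findall(QUAD, line)
    return found[2] if len(found) > 2 else None


print(third_address(LINE))
print(third_address_by_index(LINE))
```

The first approach requires the three addresses to be adjacent; the second finds the third address wherever it appears on the line.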
The problem is that the regex needs to contain not only the IPs but also the spaces between the IPs.
So adding a space into the repeated group should do the trick:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ){3}
If you don't want that space in the final match, you can make it non-greedy, using ?? (or *?):
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ??){3}
Also note that your regex matches more than just valid IPs, e.g. 999.999.999.999 would match nicely.
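
Here is a small Python check of the three variants against the line from the question. This only illustrates the behaviour in Python's re module; other engines should agree, but I have not checked them all.

```python
import re

LINE = "ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN"
QUAD = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

# {3} directly on the group: the three addresses would have to touch.
no_space = re.search(rf"({QUAD}){{3}}", LINE)
# A mandatory space after each address, including the last one.
with_space = re.search(rf"({QUAD} ){{3}}", LINE)
# A lazy optional space: consumed only when the next repetition needs it.
lazy_space = re.search(rf"({QUAD} ??){{3}}", LINE)

print(no_space)  # None: nothing in the line has three quads back to back
print(repr(with_space.group(0)))
print(repr(lazy_space.group(0)))
print(lazy_space.group(1))
```

A repeated capturing group only keeps its last repetition, so group(1) of the lazy variant holds the third address.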
You are already matching all three IPs with that regex.
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
Match 1
214.25.48.547
Match 2
255.255.255.255
Match 3
16.48.75.46
You can test it here:
http://rubular.com/
The problem may be with how you are trying to access them.
In Ruby, your regex works perfectly:
regex = /([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/
"ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN".scan(regex)
=> [["214.25.48.547"], ["255.255.255.255"], ["16.48.75.46"]]

Replace multiple IPs in multiple files with sed HP-UX

Can anyone tell me how I can mass-replace IPs in multiple files with one command? What does this sed command do?
sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/x.x.x.x/g' *
Really need help here. Thanks!
This sed does:
s/pattern1/pattern2/g
Replaces pattern1 with pattern2
[0-9]\{1,3\} = 1 to 3 digits from 0-9
\. means a single dot .
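So in theory this replaces every IP in every file with the given text x.x.x.x.
* means all files in the current directory.
Note that, as written, sed prints the result to standard output and leaves the files themselves untouched. If you write the output back over the files, no original IPs are left, so be careful with this.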
PS: this is not 100% reliable. For example, the number 3452.343.13.34 (not an IP) will be changed to 3x.x.x.x.
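
The same substitution can be tried on plain strings in Python, which makes the 3x.x.x.x case easy to reproduce. The pattern is the one from the sed command, rewritten in Python syntax; the function name and sample strings are made up.

```python
import re

# Same pattern as the sed command; Python needs no backslashes on the braces.
IP_LIKE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def mask_addresses(text):
    """Replace every match with the literal x.x.x.x (like sed's trailing g)."""
    return IP_LIKE.sub("x.x.x.x", text)


print(mask_addresses("gateway 10.0.0.1, dns 8.8.8.8"))
print(mask_addresses("3452.343.13.34"))
```

In the second call the match cannot start at the leading 3 (345 is followed by a digit, not a dot), so it starts one character later and the 3 is left behind.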
sed "s/\([12]\{0,1\}[0-9]\{0,1\}[0-9]\.\)\{3\}[12]\{0,1\}[0-9]\{0,1\}[0-9]/x.x.x.x/g"
but:
If a digit comes directly before or after, it is ignored and the inner part is still treated as an IP.
Numbers greater than 255 and lower than 300 are still treated as part of an IP.
IPs whose octets are written with leading zeros are not matched (like 120.008.099.234).
If those things matter, a more complex sed has to be built (a cascaded one, I think), like
sed "s/.*/#&#/;s/\([^0-9.]\)\([012]\{0,1\}[0-9]\{0,1\}[0-9]\.\)\{3\}[12]\{0,1\}[0-9]\{0,1\}[0-9]\([^0-9.]\)/\1x.x.x.x\3/g;s/^#\(.*\)#$/\1/"
(numbers between 255 and 300 are still possible)
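
For engines that support lookarounds, the "no digit or dot on either side" rule can be written directly instead of with the # sentinels. This is a different technique from the sed above, sketched in Python; it uses the simple [0-9]{1,3} octet, so it does not check the 0-255 range at all.

```python
import re

# Lookarounds refuse a match that touches another digit or dot on either side.
GUARDED = re.compile(r"(?<![0-9.])[0-9]{1,3}(?:\.[0-9]{1,3}){3}(?![0-9.])")


def mask_standalone_addresses(text):
    """Replace only dotted quads that are not glued to more digits or dots."""
    return GUARDED.sub("x.x.x.x", text)


print(mask_standalone_addresses("host 10.0.0.1 up"))
print(mask_standalone_addresses("3452.343.13.34"))
print(mask_standalone_addresses("1.2.3.4.5"))
```

As with the [^0-9.] guards in the sed version, an address directly followed by a sentence-ending dot is left alone.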

How does this Squid regex filter rule work?

On our Squid server, the admin has put on a new regex rule:
^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
I know that it stands for an IP address, but it allows all URLs to go through; only pinging external addresses has stopped. Also, tunneling software like UltraSurf has stopped connecting to the server, and Skype is not connecting either.
Please explain how this works! Thanks.
I am not sure about your particular issue with the Squid server, but here is what the regex does:
[0-9]+ means "one or more digits", so the pattern matches a string that begins with one or more digits, followed by a dot, one or more digits, a dot, one or more digits, a dot, and one or more digits, then anything else. In essence, it is matching any IP address, such as 1.1.1.1, so it wouldn't filter anything out. It will also match things that are not even valid IP addresses, like 123456.123456.123456.123456 or 125.252.252.252asdf.
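
The pattern's behaviour on a few strings can be checked in Python. This only shows what the regex itself matches; which part of a request Squid applies the rule to is outside this sketch, and the sample strings are made up.

```python
import re

SQUID_RULE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def rule_matches(text):
    """True when text starts with four dot-separated runs of digits."""
    return SQUID_RULE.search(text) is not None


samples = (
    "1.1.1.1",
    "123456.123456.123456.123456",
    "125.252.252.252asdf",
    "www.example.com",
    "http://1.1.1.1/",
)
for text in samples:
    print(text, rule_matches(text))
```

Because of the leading ^, the last sample does not match: the digits have to be at the very start of the string being tested.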
Paolo has explained the meaning of the regex well! As mentioned, the regex currently being used is too weak (or should I say too permissive!).
If you want a much better Regex to match IP addresses, see this page.