Why doesn't this regexp for IPv4 work? - regex

So this is the regex I've made:
^(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5])))\.){3}(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5]))))$
I have used several sites to break it down and it seems that it should work, but it doesn't. The desired result is to match any IPv4 - four numbers between 0 and 255 delimited by dots.
As an example, 1.1.1.1 won't give you a match.
The purpose of this question is not to find a regex for IPv4 addresses, but to find out why this one, which seems correct, does not work.

The literal . is only part of the 200-255 branch of the capture group (a railroad diagram of the pattern shows this).
Here's (([01]?\d{1,2})|(2([0-4]\d)|(5[0-5]))\.) formatted differently to help you spot the reason:
(
([01]?\d{1,2})
|
(2([0-4]\d)|(5[0-5])) \.
)
You're matching either 0-199 on its own, or 200-255 followed by a dot. The dot is only consumed when the 200-255 branch matches.
Additionally, as @SebastianProske pointed out, 2([0-4]\d)|(5[0-5]) matches 200-249 or 50-55, not 200-255.
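To see both problems concretely, here is a minimal sketch using Python's re module (an assumption; the question does not name a regex engine). It runs the question's pattern and the ungrouped fragment against a few inputs.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = (
    r"^(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5])))\.){3}"
    r"(([01]?\d{1,2})|(2(([0-4]\d)|(5[0-5]))))$"
)

# The dot belongs only to the 2xx branch, so dotted low octets fail...
dotted_low = re.match(ORIGINAL, "1.1.1.1")
# ...while four undotted digits, or dotted 2xx octets, are accepted.
undotted = re.match(ORIGINAL, "1111")
dotted_high = re.match(ORIGINAL, "250.250.250.1")

# Without a group around the alternation, "|" splits the whole fragment
# into "2 followed by 00-49" or "50-55".
FRAGMENT = r"2([0-4]\d)|(5[0-5])"
matches_53 = re.fullmatch(FRAGMENT, "53")
matches_253 = re.fullmatch(FRAGMENT, "253")

print(bool(dotted_low), bool(undotted), bool(dotted_high))
print(bool(matches_53), bool(matches_253))
```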
You can fix your regex by adding groups, but ultimately I would recommend not reinventing the wheel: either A) use a pre-existing regex solution or B) parse the IPv4 address by splitting on dots. The latter is easier to read and understand.
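Option B could look something like the sketch below. The function name is_ipv4 is hypothetical, and the choice to accept leading zeros (as the question's regex does) is an assumption.

```python
def is_ipv4(text: str) -> bool:
    """Return True if text is four dot-separated decimal numbers in 0..255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Require 1-3 ASCII digits; this also rejects empty parts and signs.
        if not (part.isascii() and part.isdigit() and len(part) <= 3):
            return False
        if int(part) > 255:
            return False
    return True


print(is_ipv4("1.1.1.1"))
print(is_ipv4("256.1.1.1"))
```

Each rule (four parts, digits only, range check) is a separate line, which is what makes this easier to audit than the equivalent regex.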

To fix yours up, just account for the "decimal" (the dot) after each of the first three groups:
((2[0-4]\d|25[0-5]|[01]?\d{1,2})\.){3}(2[0-4]\d|25[0-5]|[01]?\d{1,2})
(*Note that I reversed the order of the 2xx vs 1xx tests as well: prefer SPECIAL|...|NORMAL, i.e. the more restrictive alternative first, when using alternations like this.)
see it in action
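A quick way to check the corrected pattern is sketched below in Python (an assumed engine). The names OCTET, IPV4 and looks_like_ipv4 are hypothetical, non-capturing groups replace the answer's capturing ones, and fullmatch stands in for the ^ and $ anchors.

```python
import re

# One octet, most restrictive alternatives first, as suggested above.
OCTET = r"(?:2[0-4]\d|25[0-5]|[01]?\d{1,2})"
# Three "octet + dot" groups, then a final octet without a dot.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}", re.ASCII)


def looks_like_ipv4(text: str) -> bool:
    # fullmatch requires the whole string to match.
    return IPV4.fullmatch(text) is not None


for sample in ["1.1.1.1", "255.255.255.255", "256.1.1.1", "1.1.1", "1111"]:
    print(sample, looks_like_ipv4(sample))
```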

Related

Excluding 3dots additional to other characters with regex in a string

I have such an http-url detector regex:
(?:http|https)(?::\/{2}[\w]+)(?:[\/|\.]?)(?:[^\s<"]*)
It works pretty well for the following url representation:
http://www.acer.com/clearfi/download/
What modification can I make to extract
http://schemas.microsoft.com/office/word/2003/wordml2450
from
Huanghhttp://schemas.microsoft.com/office/word/2003/wordml2450...)()()()()()
?
You can modify it to capture:
the http or https scheme,
followed by the subdomain part,
followed by as many groups as possible of:
one dot or slash,
followed by a run of characters that are not a dot, whitespace, " or <.
(?:http|https)(?::\/{2}[\w]+)([\/|\.][^\s<"\.]+)*
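As a rough check, the sketch below runs this idea in Python (an assumed engine). It keeps the ":" after the scheme from the question's own regex; first_url is a hypothetical helper name.

```python
import re

# Scheme, "://" plus word characters, then repeated "dot-or-slash + run" groups.
# The run excludes dots, so a trailing "..." cannot be absorbed.
URL = re.compile(r'(?:http|https)(?::\/{2}[\w]+)([\/|\.][^\s<"\.]+)*')


def first_url(text: str):
    """Return the first URL-like match in text, or None."""
    match = URL.search(text)
    return match.group(0) if match else None


noisy = "Huanghhttp://schemas.microsoft.com/office/word/2003/wordml2450...)()()()()()"
print(first_url(noisy))
print(first_url("http://www.acer.com/clearfi/download/"))
```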
I made capturing groups to visualize the results
I've changed your expression here and there: (.*)(https?:\/{2}[\w]+[\/|\.]?[^\s<"]*)(\.{3}.*) and take only the second capturing group from it. See example here: https://regex101.com/r/0viPC5/2
This expression probably can be simplified further but I don't know your exact input and search criteria so let's stick with what you already wrote.

Regex: split number into optional first group of up to three then last group of up to three

I have two 1-6 digit numbers separated by a slash. I want these split up into groups of at most 3 digits, taking from the right.
For example:
0/1 -> [,0,,1]
1234/3 -> [1,234,,3]
12345/1234 -> [12,345,1,234]
123456/789123 -> [123,456,789,123]
I need to use a regular expression to do this because I want to do this for a location in NGINX. It would be possible to do this in application logic, but for performance reasons that is not what I am asking about.
Similar question which solves part of this was here using a negative lookahead: Regular expression to match last number in a string
What regex can achieve this split?
UPDATE:
This regex comes close to what I want (https://regex101.com/r/bQtNdK/3):
(?<prefix1>\d{0,3}?)(?<threes1>\d{0,3})\/(?<prefix2>\d{0,3}?)(?=\d)(?<threes2>\d{0,3})
It fails to match if the number after the slash is more than 3 digits long.
UPDATE2:
Now this regex works for most combinations (https://regex101.com/r/bQtNdK/5):
(?<prefix1>\d{0,3}?)(?<threes1>\d{1,3})\/(?<prefix2>\d{0,3})(?<threes2>\d{3})
I don't understand why this starts to fail if I use the same regex for prefix2/threes2 as for prefix1/threes1 (i.e. make prefix2 also lazy). Any ideas how to solve this? So close...
I don't know that it's possible without a regex engine that can remember all intermediate matches of a group that matched an arbitrary number of times (.NET can do this; I'm not sure about others). PCRE will apparently only remember the last match for each group; otherwise you could use something like this: (?<prefix1>\d{0,2})(?:(?<threes1>\d{3})*)\/(?<prefix2>\d{0,2})(?<threes2>\d{3})*\s
This regex seems to be correct now (regex101):
(?<prefix1>\d{0,3}?)(?<suffix1>\d{1,3})\/(?<prefix2>\d{0,3}?)(?<suffix2>\d{1,3})\/
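The sketch below tries this pattern in Python, which spells named groups (?P<name>...) instead of (?<name>...); the input is assumed to end with a slash. It also suggests why a lazy prefix2 failed earlier: with nothing after the last group to force backtracking, the lazy prefix stays empty and the suffix takes the first three digits. split_groups is a hypothetical helper.

```python
import re

# Final pattern: the trailing "/" forces the lazy prefix2 to grow when needed.
ANCHORED = re.compile(
    r"(?P<prefix1>\d{0,3}?)(?P<suffix1>\d{1,3})/"
    r"(?P<prefix2>\d{0,3}?)(?P<suffix2>\d{1,3})/"
)
# Same pattern without the trailing slash, for comparison.
UNANCHORED = re.compile(
    r"(?P<prefix1>\d{0,3}?)(?P<suffix1>\d{1,3})/"
    r"(?P<prefix2>\d{0,3}?)(?P<suffix2>\d{1,3})"
)


def split_groups(pattern, text):
    """Return the four groups as a list, or None if there is no match."""
    match = pattern.match(text)
    return list(match.groups()) if match else None


print(split_groups(ANCHORED, "12345/1234/"))
print(split_groups(UNANCHORED, "12345/1234"))
```

In the unanchored case the second number comes back as an empty prefix plus "123", with the final "4" left unmatched.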

Regex: How can I match third IPv4 address?

I'm a regex noob and for the life of me I can't figure out how to match the third IPv4 address on a line that contains three IPv4 addresses.
The line in question:
ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN
The regex I have so far that matches one IP:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
If I put a {3} at the end of it, it breaks. I think it has something to do with the spaces between the addresses but I can't figure out how to handle that. I need to capture the third address.
https://regex101.com/r/mN3cR6/1
You just need to add the global modifier to the pattern (and keep the dots escaped, so that they match only a literal dot).
Your new code should look like this:
/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/g
See this demo https://regex101.com/r/mN3cR6/2
Try
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?)+
This should match one, two, or three, or even more "IPs".
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
for exactly 3.
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?){3}
for a shorter pattern, at the cost of some possible errors: because the whitespace is optional, consecutive addresses need not be separated at all.
Note that the basic idea is problematic too, as it matches "999.999.999.999" when it is definitely not a valid IP address.
The following should match the third ip
(?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s){2}([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
It's possible to be more compact depending on what language you're using. For instance, in Ruby
string.scan(/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/)[2]
would give you what you want. You could also collapse the multiple [0-9]{1,3}\. instances using non-capturing groups and counts.
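For comparison, here is a sketch of both ideas in Python (an assumed language; the answer itself uses Ruby). The names IP and THIRD are hypothetical, and the repeated octets are collapsed with non-capturing groups and counts.

```python
import re

# One dotted quad: three "digits + dot" groups, then a final digit run.
IP = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
# Skip two addresses (each followed by whitespace), capture the third.
THIRD = re.compile(rf"(?:{IP}\s){{2}}({IP})")

line = "ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN"

match = THIRD.search(line)
third_by_group = match.group(1) if match else None

# Alternative: collect every address and index into the list.
third_by_index = re.findall(IP, line)[2]

print(third_by_group, third_by_index)
```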
The problem is that the regex needs to contain not only the IPs but also the spaces between them.
So adding a space into the repeated group should do the trick:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ){3}
If you don't want that space in the final match, make it non-greedy using ?? (or *?):
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ??){3}
Also note that your regex matches more than just valid IPs; e.g. 999.999.999.999 would match.
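If validity matters, one option is to let the regex find candidates and validate each one separately. The sketch below assumes Python and uses its standard ipaddress module; valid_ipv4_strings is a hypothetical helper name. Note that the first address in the sample line, 214.25.48.547, is itself out of range.

```python
import ipaddress
import re

# Loose candidate pattern: any four dot-separated runs of 1-3 digits.
CANDIDATE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def valid_ipv4_strings(line: str) -> list:
    """Return the candidates in line that parse as real IPv4 addresses."""
    found = []
    for text in CANDIDATE.findall(line):
        try:
            # Raises AddressValueError (a ValueError) for octets above 255.
            ipaddress.IPv4Address(text)
        except ValueError:
            continue
        found.append(text)
    return found


line = "ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN"
print(valid_ipv4_strings(line))
```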
You are already matching all three IPs with that regex.
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
Match 1
214.25.48.547
Match 2
255.255.255.255
Match 3
16.48.75.46
You can test it here:
http://rubular.com/
The problem may be with how you are trying to access them.
In Ruby, your regex works perfectly:
regex = /([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/
"ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN".scan(regex)
=> [["214.25.48.547"], ["255.255.255.255"], ["16.48.75.46"]]

Mod Rewrite RegEx To Match Only If Previous Subset Matched

I am trying to make what I think is a simple regex for use with mod_rewrite.
I've tried various expressions, many of which I thought were promising, but all of which ultimately failed for one reason or another. They all also seem to fail once I add start/end string delimiters.
For example, ^user/(\d{1,10})(?=/)$ was one I tried, but among other things, it seems to group the trailing slash, and I only want to group the digits. I think I need to use a positive lookbehind, but I'm having difficulty because it's looking behind at a group.
What I am trying to match is strings that 1) begin with "user/" and 2) possibly end with (\d{1,10})/ (1 to 10 digits followed by a single slash)
Should Match:
user/
user/123/
user/1234567890/
Should not match:
user
user//
user/-4/
user/35.5/
user/123
user/123//
user/123/5/
user/12345678901/
Edit: Sorry about the formatting; I do not understand how to format anything via this markdown. Those examples are preceded by 4 spaces which I thought should make a code block, but obviously I thought wrong.
^user/(?:([0-9]{1,10})/)?$ should work just fine.
Try this: ^user(?=/)(/\d{1,10})?/$ Edit: if you want to capture only the digits, use ^user(?=/)(?:/(\d{1,10}))?/$
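The first suggestion can be checked against the question's examples with a short script. This sketch uses Python's re module as a stand-in for the PCRE engine behind mod_rewrite (an approximation); user_id is a hypothetical helper.

```python
import re

# "user/" followed by an optional "1-10 digits + slash"; only digits are captured.
USER_PATH = re.compile(r"^user/(?:([0-9]{1,10})/)?$")

should_match = ["user/", "user/123/", "user/1234567890/"]
should_not_match = [
    "user", "user//", "user/-4/", "user/35.5/",
    "user/123", "user/123//", "user/123/5/", "user/12345678901/",
]


def user_id(path: str):
    """Return (matched, digits); digits is None when the id part is absent."""
    match = USER_PATH.match(path)
    return (match is not None, match.group(1) if match else None)


for path in should_match + should_not_match:
    print(path, user_id(path))
```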

Validating an IP with regex

I need to validate an IP range that is in format 000000000 to 255255255 without any delimiters between the 3 groups of numbers.
Each of the three groups that the final IP consists of should be 000 (yes, 0 padded) to 255.
As this is my 1st stackoverflow entry, please be lenient if I did not follow etiquette correctly.
^([01]\d{2}|2[0-4]\d|25[0-5]){3}$
Which breaks down in the following parts:
000-199
200-249
250-255
If you decide you want 4 octets instead of 3, just change the last {3} to {4}. Also, you should be aware of IPv6 too.
I would personally not use regex for this. I think it's easier to ensure that the string consists of 9 digits, split up the string into 3 groups of 3-digit numbers, and then check that each number is between 0 and 255, inclusive.
If you really insist on regex, then you could use something like this:
"([0-1][0-9][0-9]|2[0-4][0-9]|25[0-5]){3}"
The expression comprises an alternation of three terms: the first matches 000-199, the second 200-249, the third 250-255. The {3} requires the match exactly three times.
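Both approaches from this answer can be sketched side by side in Python (an assumed language). The function names are hypothetical; the regex is the one given above, with fullmatch requiring the whole string to match.

```python
import re

# Three zero-padded groups: 000-199, 200-249 or 250-255.
PADDED = re.compile(r"([01]\d{2}|2[0-4]\d|25[0-5]){3}", re.ASCII)


def valid_by_regex(text: str) -> bool:
    return PADDED.fullmatch(text) is not None


def valid_by_split(text: str) -> bool:
    # Exactly nine ASCII digits, then three 3-digit numbers in 0..255.
    if len(text) != 9 or not (text.isascii() and text.isdigit()):
        return False
    return all(int(text[i:i + 3]) <= 255 for i in range(0, 9, 3))


for sample in ["000000000", "255255255", "192168001", "256000000", "25525525"]:
    print(sample, valid_by_regex(sample), valid_by_split(sample))
```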
This is a pretty common question. Here is a nice intro page on regexps, that has this case as an example. It includes the periods, but you can edit those out easily enough.
To match only valid IP addresses, use
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}
instead of
([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}
because many regex engines take the first alternative that matches in an alternation.
You can test your regex engine with: 10.48.0.200
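The effect of alternation order shows up with an unanchored search. The sketch below assumes Python's backtracking re engine and dotted versions of the two patterns; STRICT_FIRST and LOOSE_FIRST are hypothetical names.

```python
import re

OCTET_STRICT = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
OCTET_LOOSE = r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])"

STRICT_FIRST = OCTET_STRICT + r"(\." + OCTET_STRICT + r"){3}"
LOOSE_FIRST = OCTET_LOOSE + r"(\." + OCTET_LOOSE + r"){3}"

text = "10.48.0.200"

# Unanchored: the first alternative that succeeds wins, so order matters.
strict_hit = re.search(STRICT_FIRST, text).group(0)
loose_hit = re.search(LOOSE_FIRST, text).group(0)

# Anchored: the engine backtracks into later alternatives to reach the end.
anchored = re.fullmatch(LOOSE_FIRST, text)

print(strict_hit, loose_hit, anchored is not None)
```

With the loose alternative first, the search stops after "20" in the last octet; anchoring the pattern hides the difference because backtracking then tries the remaining alternatives.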
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
I use this regex to search for all IP addresses in my project's code.