regex optional string in the middle followed by negative lookahead - regex

I have the following entries for 3 allowed IPs in a config file:
logging host 10.1.1.1
logging host ipv4 10.1.1.2
logging host 10.1.1.3
ipv4 is an optional string. I need to make sure that there are no entries with an unallowed IP. For example, if there is a line:
logging host 10.1.1.4
then the file is invalid because 10.1.1.4 is not one of the three allowed IPs. I have come up with a Java regex to check for the existence of any unallowed IP:
^logging host (ipv4\s)?(?!10.1.1.1|10.1.1.2|10.1.1.3)
It works only when the optional string ipv4 is absent, not when it is present, as in the second entry: "logging host ipv4 10.1.1.2". On the first attempt the regex engine greedily matches up to "logging host ipv4 ", and the remaining string 10.1.1.2 is one of the options in the negative lookahead, so that attempt fails. The engine then backtracks and matches only up to "logging host ", since ipv4 is optional. The remaining string is now "ipv4 10.1.1.2", which is not one of the options in the negative lookahead, so the whole line is reported as an unallowed IP, which is wrong.
What am I missing??

You get a partial match because you are not matching anything after the lookahead.
For example, in logging host 10.1.1.1 the lookahead, evaluated right after "logging host ", sees one of the values listed in it. There are no other options to explore, so the match fails.
In logging host ipv4 10.1.1.2 the ipv4 part is matched first. The lookahead then sees one of the listed values and fails. This time the engine can backtrack, because the ipv4 part is optional. So it gets a match from the position before ipv4, and the match is just "logging host ".
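You could shorten the pattern for the specific IP numbers to 10\.1\.1\.[123] (with the dots escaped) and match the address itself after the lookahead, so that backtracking out of the optional group can no longer produce a match.
For example: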
^logging host (ipv4\s)?(?!10\.1\.1\.[123])\d{1,3}(?:\.\d{1,3}){3}$
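To see the backtracking in action, here is a small sketch. It uses Python's `re` purely for illustration (the question is about Java, whose engine should treat these constructs the same way). The names `BUGGY`, `FIXED`, `STRICT` and `flagged` are made up for this example. `STRICT` is not from the answer: it is an assumed variant that adds `$` inside the lookahead so that an address such as 10.1.1.10 is not mistaken for the allowed 10.1.1.1.

```python
import re

# The pattern from the question: nothing is consumed after the lookahead,
# so the engine can backtrack out of the optional "ipv4 " group.
BUGGY = re.compile(r"^logging host (ipv4\s)?(?!10.1.1.1|10.1.1.2|10.1.1.3)")

# The pattern from the answer: the address must be matched after the lookahead.
FIXED = re.compile(
    r"^logging host (ipv4\s)?(?!10\.1\.1\.[123])\d{1,3}(?:\.\d{1,3}){3}$"
)

# Assumed hardening (not in the answer): anchor the lookahead with "$"
# so that 10.1.1.10 does not pass as 10.1.1.1.
STRICT = re.compile(
    r"^logging host (ipv4\s)?(?!10\.1\.1\.[123]$)\d{1,3}(?:\.\d{1,3}){3}$"
)


def flagged(pattern, line):
    """Return True if the pattern reports the line as an unallowed entry."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for line in (
        "logging host 10.1.1.1",
        "logging host ipv4 10.1.1.2",
        "logging host 10.1.1.4",
        "logging host 10.1.1.10",
    ):
        print(line, flagged(BUGGY, line), flagged(FIXED, line), flagged(STRICT, line))
```

The allowed line with ipv4 is flagged only by `BUGGY`, which is the false positive described in the question.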

Thanks very much to 'The fourth bird' for leading me to the answer with his important hints.
To summarize, I need to ensure that the config file does not contain any unallowed logging host entries. The following are the allowed host entries in the config file:
logging host 10.1.1.1
logging host 10.1.1.2
logging host ipv6 EFD7:DEA8:AEE4::11:3
The tricky bit here is that using an optional group for ipv6 did not solve the problem, because the engine backtracks at the optional group:
^logging host (ipv6\s)?(?!10.1.1.1|10.1.1.2|ipv6 EFD7:DEA8:AEE4::11:3)
The first solution below uses an atomic group to stop the backtracking; the second solution is much simpler:
^logging (?>host ipv6|host)?\s(?!10.1.1.1|10.1.1.2|EFD7:DEA8:AEE4::11:3)
^logging host\s(?!10.1.1.1|10.1.1.2|ipv6 EFD7:DEA8:AEE4::11:3)
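A rough sketch of the second, simpler idea applied to a whole config file, again in Python for illustration only. The names `ALLOWED`, `UNALLOWED` and `unallowed_lines` are hypothetical. Escaping the entries with `re.escape` and anchoring the lookahead with `$` are additions of this sketch, not part of the patterns above.

```python
import re

# Everything that may legitimately follow "logging host " on a line.
ALLOWED = ["10.1.1.1", "10.1.1.2", "ipv6 EFD7:DEA8:AEE4::11:3"]

# The "ipv6 " keyword lives inside the lookahead, so there is no optional
# group for the engine to backtrack out of.
UNALLOWED = re.compile(
    r"^logging host\s(?!(?:" + "|".join(re.escape(a) for a in ALLOWED) + r")$).*$",
    re.MULTILINE,
)


def unallowed_lines(config_text):
    """Return every 'logging host' line whose target is not in ALLOWED."""
    return [m.group(0) for m in UNALLOWED.finditer(config_text)]


if __name__ == "__main__":
    sample = (
        "logging host 10.1.1.1\n"
        "logging host ipv6 EFD7:DEA8:AEE4::11:3\n"
        "logging host 10.1.1.4\n"
    )
    print(unallowed_lines(sample))
```

An empty result means the file contains only allowed logging hosts.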

Related

Regex for finding domains in a sentence but not IP addresses

I am trying to write a regular expression that will match domains in a sentence.
I found this post, which was very useful and helped me create the following to match domains, but unfortunately it also matches IP addresses, which I do not want:
((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\._-]{1,61}|[a-z0-9-]{1,30})
I want to update my expression so that the following can still be found: in a sentence, between brackets, etc.:
www.example.com
subdomain.example.com
subdomain.example.co.uk
But not:
192.168.0.0
127.0.0.1
Is there a way to do this?
We could use a simple lookahead that excludes combinations of numbers and dots only: (?![\d.]+)
(?![\d.]+)((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\._-]{1,61}|[a-z0-9-]{1,30})
The answer from #wp78de is correct; however, it would not detect domains that start with digits, e.g. 123reg.com.
So remove the first group in the regex, like this:
((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\._-]{1,61}|[a-z0-9-]{1,30})
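To check how the lookahead variant behaves on the examples from the question, here is a small Python sketch. `DOMAIN`, `WITH_LOOKAHEAD` and `first_domain` are made-up names, and the domain pattern is copied unchanged from the question.

```python
import re

# Domain pattern from the question, unchanged.
DOMAIN = (
    r"((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}"
    r"\.(xn--)?([a-z0-9\._-]{1,61}|[a-z0-9-]{1,30})"
)

# The same pattern with the (?![\d.]+) lookahead in front: a match may not
# start at a digit or a dot.
WITH_LOOKAHEAD = re.compile(r"(?![\d.]+)" + DOMAIN)


def first_domain(text):
    """Return the first match of the lookahead variant, or None."""
    m = WITH_LOOKAHEAD.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in ("www.example.com", "192.168.0.0", "127.0.0.1", "123reg.com"):
        print(text, "->", first_domain(text))
```

The IP addresses are rejected, but 123reg.com is only partially matched (from the first letter onward), which is the limitation pointed out above.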

How to make a regular expression for IPv6 with prefix?

I created this very complex regular expression (RegEx101) for IPv4 and IPv6:
((^\s*((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))\s*$)|(^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$))|(^\*$)
Below are three examples of data that can be checked by this regular expression.
2001:db8:abcd:0012:0000:0000:0000:0000 (ipv6)
0000:0000:2001:DB8:ABCD:12:: (condensed notation)
255.255.255.0 (ipv4)
However, this regular expression does not work for IPv6 addresses with a prefix.
For example:
2001:db8:abcd:0012::0/112
does not work.
How can this problem be fixed?
And if anybody in the future wants optional subnet masks for IPv4 as well as IPv6:
/((^\s*((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))\s*(\/(\d|1\d|2\d|3[0-2]))?$)|(^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3}(\/(\d{1,2}|1[0-1]\d|12[0-8]))?)|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$))|(^\*$)/g
https://regex101.com/r/lS4Cjo/1
My understanding is that you want to optionally match a 'prefix' p (a number at the end of the address, always preceded by a forward slash) such that 1≤p≤128. Let's try breaking this up.
Optionally match the following block:
Match a forward slash /
Either match a one- or two-digit number
Or match a number between 100 and 119 (inclusive)
Or match a number between 120 and 128 (inclusive)
The above is equivalent to this regex: (\/(\d{1,2}|1[0-1]\d|12[0-8]))?. Note that \d{1,2} also accepts 0, so the regex actually admits 0≤p≤128.
https://regex101.com/r/5cBm5a/3
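The prefix fragment on its own can be sanity-checked in isolation. This is a minimal Python sketch with made-up names (`PREFIX`, `valid_prefix_suffix`); it tests only the optional "/p" suffix, not a full address.

```python
import re

# The optional prefix block from the answer: "/" followed by 0-99, 100-119 or 120-128.
PREFIX = re.compile(r"(\/(\d{1,2}|1[0-1]\d|12[0-8]))?")


def valid_prefix_suffix(suffix):
    """Return True if the whole string is empty or a '/p' prefix with p <= 128."""
    return PREFIX.fullmatch(suffix) is not None


if __name__ == "__main__":
    for suffix in ("", "/0", "/64", "/112", "/128", "/129", "/200"):
        print(repr(suffix), valid_prefix_suffix(suffix))
```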

IIS Web.config Regex forward slash restraint

I'm using web.config to rewrite URLs in IIS 8.5
This is my regex:
match url="^((?:[a-z]{2}\/{1}){1,2})?listen$"
This will successfully match the following:
en/gb/listen
en/listen
listen
However, the part that I can't get to work is restricting the forward slashes in each optional group to a single character:
\/{1}
Interestingly this example does work on https://regex101.com/r/VNwejt/1
Any help would be appreciated.
You may restrict the whole pattern using a negative lookahead at the start:
^(?!.*//)<PATTERN_GOES_HERE>
See the regex demo.
The (?!.*//) lookahead fails the match if there is a // substring anywhere on a line of text.
However, in this case the lookahead is redundant, because your consuming pattern already does not allow two consecutive slashes anywhere in the string: ^(?!.*//)((?:[a-z]{2}/){1,2})?listen$ accepts exactly the same URLs as the pattern without the lookahead. Check the other options in your configuration file.
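A quick way to convince yourself that the lookahead changes nothing is to run both patterns over a few URLs. The sketch below uses Python's `re`, not the .NET engine that IIS uses, so treat it only as an illustration of the regex logic. `PLAIN`, `GUARDED` and `verdicts` are made-up names.

```python
import re

# The question's pattern (with \/{1} simplified to /) and the same pattern
# guarded by the (?!.*//) lookahead.
PLAIN = re.compile(r"^((?:[a-z]{2}/){1,2})?listen$")
GUARDED = re.compile(r"^(?!.*//)((?:[a-z]{2}/){1,2})?listen$")


def verdicts(url):
    """Return (plain_matches, guarded_matches) for one URL path."""
    return PLAIN.search(url) is not None, GUARDED.search(url) is not None


if __name__ == "__main__":
    for url in ("en/gb/listen", "en/listen", "listen", "en//listen", "en/gb//listen"):
        print(url, verdicts(url))
```

Both patterns give the same verdict for every URL here, including the ones with a double slash.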

Regex: How can I match third IPv4 address?

I'm a regex noob and for the life of me I can't figure out how to match the third IPv4 address on a line that contains three IPv4 addresses.
The line in question:
ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN
The regex I have so far that matches one IP:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
If I put a {3} at the end of it, it breaks. I think it has something to do with the spaces between the addresses but I can't figure out how to handle that. I need to capture the third address.
https://regex101.com/r/mN3cR6/1
You just need to add the global modifier (g) to the regex.
Your new regex should look like this:
/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/g
See this demo https://regex101.com/r/mN3cR6/2
Try
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?)+
This should match one, two, or three, or even more "IPs".
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
for exactly 3.
Or
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s?){3}
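for a shorter formula with some possible errors (for example, because \s? is optional, addresses that run together without a space are accepted too, and the capture group keeps only the last repetition).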
Note that the basic idea is problematic too, as it matches "999.999.999.999" when it is definitely not a valid IP address.
The following should match the third ip
(?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s){2}([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
It's possible to be more compact depending on what language you're using. For instance, in Ruby
string.scan(/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/)[2]
would give you what you want. You could also collapse the repeated [0-9]{1,3}\. instances using non-capturing groups and counts.
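For comparison, here is roughly the same thing in Python, with the repeated octets collapsed as suggested. This is only a sketch; `IP`, `THIRD_IP`, `third_ip` and `third_ip_by_index` are names invented for the example.

```python
import re

# One dotted quad, with the repeated "\.[0-9]{1,3}" collapsed into a counted
# non-capturing group.
IP = r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"

# Skip two addresses (each followed by whitespace), then capture the third.
THIRD_IP = re.compile(r"(?:" + IP + r"\s){2}(" + IP + r")")


def third_ip(line):
    """Return the third of three consecutive addresses, or None."""
    m = THIRD_IP.search(line)
    return m.group(1) if m else None


def third_ip_by_index(line):
    """Alternative: collect every address on the line and pick index 2."""
    found = re.findall(IP, line)
    return found[2] if len(found) > 2 else None


if __name__ == "__main__":
    line = "ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN"
    print(third_ip(line), third_ip_by_index(line))
```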
The problem is that the regex needs to contain not only the IPs but also the spaces between them.
So adding a space to the repeated group should do the trick:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ){3}
If you don't want that space in the final match, you can make it non-greedy, using ?? (or *?):
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ??){3}
Also note that your regex matches more than just valid IPs, e.g. 999.999.999.999 would match nicely.
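If the octet range matters, one option is to let the regex find candidates and validate them separately. The sketch below assumes Python and its standard `ipaddress` module, which is not something the thread itself uses; `is_valid_ipv4` is a made-up helper name.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True only if text is a well-formed IPv4 address (octets 0-255)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False


if __name__ == "__main__":
    for candidate in ("214.25.48.547", "255.255.255.255", "16.48.75.46", "999.999.999.999"):
        print(candidate, is_valid_ipv4(candidate))
```

Note that the first address in the sample line, 214.25.48.547, is itself out of range.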
You are already matching all three IPs with that regex.
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
Match 1
214.25.48.547
Match 2
255.255.255.255
Match 3
16.48.75.46
You can test it here:
http://rubular.com/
The problem may be with how you are trying to access them.
In Ruby, your regex works perfectly:
regex = /([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/
"ip route 214.25.48.547 255.255.255.255 16.48.75.46 name Chicago-VPN".scan(regex)
=> [["214.25.48.547"], ["255.255.255.255"], ["16.48.75.46"]]

How does this Squid regex filter rule work?

On our Squid server, the admin has put on a new regex rule:
^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
I know that it stands for an IP address, but it lets all URLs through; only pinging external addresses has stopped. Tunneling software like UltraSurf has also stopped connecting to the server, and Skype is not connecting either.
Please explain how this works! Thanks.
I am not sure about your particular issue with the Squid server, but here is what the regex does:
[0-9]+ means "any digit, one or more times", so the regex matches a string that begins with one or more digits, followed by a dot, one or more digits, a dot, one or more digits, a dot, and one or more digits, then anything else. In essence, it matches any string that starts with something shaped like an IPv4 address, so it does not single out any particular address. It matches 1.1.1.1, but it also matches things that are not valid IP addresses, like 123456.123456.123456.123456 or 125.252.252.252asdf.
Paolo has explained the meaning of the regex well! As mentioned, the regex currently being used is too weak (or, more precisely, too permissive).
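The behaviour described above is easy to reproduce. This is a small Python sketch of the regex logic only; it says nothing about how Squid applies the rule, and `RULE` and `rule_matches` are made-up names.

```python
import re

# The rule from the question: anchored at the start, unbounded digit runs,
# nothing required after the fourth group of digits.
RULE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def rule_matches(text):
    """Return True if the rule matches at the start of text."""
    return RULE.search(text) is not None


if __name__ == "__main__":
    for text in (
        "1.1.1.1",
        "123456.123456.123456.123456",
        "125.252.252.252asdf",
        "www.example.com",
    ):
        print(text, rule_matches(text))
```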
If you want a much better Regex to match IP addresses, see this page.