I'm writing a regex for Google Analytics and I need to block any IP from 156.21.x.x. I don't care about the last two octets, just the first two. I would like to keep the regex to as few characters as possible, as Google only allows 255 characters and my regex is already pretty large.
Not sure what flavor of regex or what language you're using, but this will work on most regex engines:
156\.21\.\d{1,3}\.\d{1,3}
Of course, this will match invalid IPs like 156.21.777.888, but if the list you're parsing doesn't contain invalid IP addresses, then you should be OK. Or:
156\.21(\.\d{1,3}){2}
If you are running short on space, this would work, though you would match non-IP addresses as well. If you can assume Google will give you valid IP addresses, this is your shortest option:
^156\.21\.
Matches things like: 156.21.1.1 156.21.1000.1000 156.21.ABC
But does not match http://156.21.1.1 ehlo 156.21.1000.1000
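A quick way to check the anchored prefix pattern against the samples above is a short Python sketch. Python's `re` module is an assumption here; Google Analytics uses its own regex engine, so details may differ there.

```python
import re

# Shortest pattern from the answer above: anchored at the start of the string.
PREFIX = re.compile(r"^156\.21\.")

samples = [
    "156.21.1.1",
    "156.21.1000.1000",
    "156.21.ABC",
    "http://156.21.1.1",
    "ehlo 156.21.1000.1000",
]

# Map each sample to whether the pattern finds a match.
results = {s: bool(PREFIX.search(s)) for s in samples}

for s, matched in results.items():
    print(f"{s!r}: {'match' if matched else 'no match'}")
```

The first three samples match because they begin with 156.21.; the last two do not, because the ^ anchor rejects anything in front of the prefix.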
The following regex would match (almost) valid IPv4 addresses that start with 156.21:
(156\.21(?:\.[\d]{1,3}){2})
Related
I am using Trellix DLP solution and have IP Address classification to block outgoing IP Address information.
My regex is \b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b
However, this also blocks documents that have four-level numbered lists, like:
1.blah
1.1 blah blah
1.1.1 blah blah blah
1.1.1.1 blah blah blah blah (DLP thinks this is an IP address and blocks the document)
Is there any way to bypass this?
Regexes sometimes feel like magic, but unfortunately they are not. A regex cannot distinguish between an IP address and a numbered footnote or article.
You can try to add some sort of intelligence, so to speak, to the regex, but you'll always end up with false positives or negatives. This sort of intelligence comes from inspecting the preceding or following characters.
If you go this way, start with a regular expression that matches only valid IP addresses (your regex can match 300.1.2.3, which is not valid).
Also determine which IP addresses you are trying to block. If you only need to block private IP addresses, you are less likely to get a false positive with a regex that matches only private IP addresses.
If you need to catch any IP address, then try to avoid matches that have 4 or more spaces before the match (or fewer than 4 right after the beginning of a line). This is to try to avoid numbered titles.
(?<!^\h)(?<!^\h\h)(?<!^\h\h\h)(?<!\h\h\h\h)\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b
Note: use the m (multiline) modifier. If you cannot specify flags, try writing the regex like this:
(?m)(?<!^\h)(?<!^\h\h)(?<!^\h\h\h)(?<!\h\h\h\h)\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b
NOTE: if your tool does not support \h, replace it with [\t\p{Zs}] or [ \t]
You have a very basic demo here. Please keep reading before using that in production :-)
Of course, since a negative lookbehind usually cannot be variable-length (except in some specific programming languages and tools), the more indentation cases you add as negative lookbehinds, the more likely you are to skip those numbered articles and avoid a false positive.
The tool must also support negative lookbehinds, of course.
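To see the indentation lookbehinds in action, here is a minimal Python sketch. It assumes Python's `re` module, which has no \h, so [ \t] is used instead; the sample text is made up for illustration, and a DLP tool's engine may behave differently.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
PATTERN = re.compile(
    r"(?<!^[ \t])(?<!^[ \t]{2})(?<!^[ \t]{3})(?<![ \t]{4})"
    r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b",
    re.MULTILINE,  # the "m" modifier, so ^ also matches at each line start
)

# Hypothetical sample: one real address in prose, two indented headings.
text = (
    "Contact the server at 10.20.30.40 for details.\n"
    "  1.1.1.1 indented heading\n"
    "        2.3.4.5 deeply indented heading\n"
)

found = [m.group(0) for m in PATTERN.finditer(text)]
print(found)
```

Only the address inside the sentence is reported; both indented headings are skipped. A heading that starts at column 0 with no indentation at all would still match, since none of the lookbehinds apply to it.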
You could even combine both cases: a regex that matches 172.x.x.x and 192.x.x.x private addresses without any extra constraints (leaving out 10.x.x.x private addresses, because their first number is low enough to also appear in numbered lists), or any other valid IP address with extra constraints (like the spaces).
Are there any more false positives that you detected? Try to establish similar rules for them. For example, you could match footnotes like these: <<See 1.2.3.4>> or *1.2.3.4. Try to add exceptions for IP-address-like strings that start with * or end with >>, for example.
To sum up: "You cannot", but if you insist on trying...
1. Add extra 'logic' to the regex according to the false positives you find.
2. Check whether the tool lacks needed regex features (like positive/negative lookbehinds).
3. The logic may be very specific to the document in your example. If there are other documents with different formats, a generic solution for every kind of document may not be possible.
4. Even if you have just a single type of document to inspect, you may still get false positives/negatives, in which case go to step 1 and repeat.
I'm trying to learn RegEx using ImmersiveLabs/LinkedInLearning and other web-based resources and things are going well.
There's a small question to which I'm not sure how to even Google for an answer.
Scenario: an Azure ATP query in which I wanted to match the private addressing scheme:
| where From_IP matches regex @'(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)'
It works well! Matches what I want it to. The question is - why?!
For example, shouldn't (^172\.2[0-9]\.) match only the first two octets of the string 172.20.1.9? Why, then, is the entire IP matched successfully?
Seems weird for me to question something that is working. Any tips are appreciated.
There is no $ in your regex, so it does not assert a position at the end of a line; it basically doesn't care what comes after 172.20. For more info, see: regex101.com/r/TgjdVz/1
In addition, to match all private IPv4 subnets, use the following regex:
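The difference between "the pattern was found" and "the pattern consumed the whole string" can be sketched in Python. This assumes the query operator behaves like a search rather than a full-string match, which is what the answer above describes.

```python
import re

pattern = re.compile(r"^172\.2[0-9]\.")
ip = "172.20.1.9"

# search() succeeds as soon as the pattern is found; the rest of the string is ignored.
m = pattern.search(ip)
matched_text = m.group(0) if m else None  # only the prefix the pattern consumed

# fullmatch() requires the pattern to consume the entire string.
full = re.fullmatch(r"172\.2[0-9]\.", ip)

print(matched_text, full)
```

The row passes the filter because a match exists, even though the matched text itself is only the two-octet prefix.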
^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$
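A small Python sketch to sanity-check that pattern against a few addresses. The sample addresses are made up for illustration, and the pattern is only split across lines for readability.

```python
import re

# Same pattern as above, split into three pieces for readability.
PRIVATE = re.compile(
    r"^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}"
    r"|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)"
    r"(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$"
)

samples = [
    "10.0.0.1",
    "172.16.5.4",
    "172.31.255.255",
    "192.168.1.1",
    "172.32.0.1",     # just outside the 172.16-172.31 range
    "192.168.1.300",  # last octet out of range
    "8.8.8.8",        # public address
]

is_private = {s: bool(PRIVATE.search(s)) for s in samples}

for s, ok in is_private.items():
    print(f"{s}: {'private' if ok else 'not matched'}")
```

Unlike the query in the question, this pattern validates every octet and is anchored at both ends, but it does not include the 127. prefix.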
The regex below only highlights a specific private IP addressing scheme:
(?!^0\.)(?!^10\.)(?!^100\.6[4-9]\.)(?!^100\.[7-9]\d\.)(?!^100\.1[0-1]\d\.)(?!^100\.12[0-7]\.)(?!^127\.)(?!^169\.254\.)(?!^172\.1[6-9]\.)(?!^172\.2[0-9]\.)(?!^172\.3[0-1]\.)(?!^192\.0\.0\.)(?!^192\.0\.2\.)(?!^192\.88\.99\.)(?!^192\.168\.)(?!^198\.1[8-9]\.)(?!^198\.51\.100\.)(?!^203\.0\.113\.)(?!^22[4-9]\.)(?!^23[0-9]\.)(?!^24[0-9]\.)(?!^25[0-5]\.)(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))
As in this example, https://regex101.com/r/tKKYx0/3, I need to update the code to match only the public IP addresses listed at the top.
A regex you can try is:
^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?=,)
Test here.
Note: I did not really understand how the first IPs are different from the rest of the IPs. My regex looks for IPs at the beginning of a line, immediately followed by a comma.
Note2: My regex does not really validate IPs. E.g. 568.914.348.759 will be successfully returned.
For the new sample, try:
^(|(\S+.*?))(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
Test here.
Update
I missed this in my original explanation. I set this up yesterday, and it ran overnight. No data populated in my profile overnight, so either my regex is wrong or Google cannot see internal traffic IPs.
It seems that everyone has their own variation on the syntax for regular expressions.
I'm trying to include only internal traffic on one of my profiles in Google Analytics
Can someone verify for me what they expect that regular expression to match? In CIDER notation?
I don't know what CIDER notation is, but that regex matches a string that
starts with 10.
followed by 90. or 60.
followed by 10 or 9
followed by zero or more dots.
You probably want ^10\.[96]0\.(10|9)\..*$
Since the last bit (.*) is a bit too vague (unless you know that there will only ever be valid IP addresses in the live data), you might want to change that to \d+ or (if you want to restrict to a valid range from 0 to 255) 25[0-5]|2[0-4]\d|1?\d?\d
Don't know about CIDR notation, but that will match any of 10.90.9.*, 10.90.10.*, 10.60.9.* or 10.60.10.*
Instead of the last asterisk, use \d+. The way you wrote it, the expression needs zero or more dots at the end to match.
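As a rough check of the suggested fix, here is a Python sketch using the \d+ variant. The sample addresses are hypothetical, and Google Analytics' own regex engine may differ in details from Python's `re`.

```python
import re

# Suggested pattern with \d+ for the last octet instead of the vague ".*".
PATTERN = re.compile(r"^10\.[96]0\.(10|9)\.\d+$")

samples = ["10.90.9.1", "10.60.10.254", "10.90.11.1", "10.70.9.1", "110.90.9.1"]

# Keep only the addresses the pattern accepts.
matches = [s for s in samples if PATTERN.search(s)]
print(matches)
```

Only addresses in 10.90.9.*, 10.90.10.*, 10.60.9.* and 10.60.10.* pass, in line with the description above.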
Say you have an IP address, 74.125.45.100, so it's A.B.C.D.
Is there a way to use RegEx to get A,B,C separately?
If it is just to extract the numbers from the IP and not to validate the IP address then you could just do:
[0-9]+
However, I think a simple String.Split(".") would be an easier option.
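For comparison, the split approach looks like this in Python. `str.split` is used here as the Python counterpart of the String.Split call mentioned above, which is an assumption about the asker's language.

```python
ip = "74.125.45.100"

# Split on the literal dot; no regex needed.
a, b, c, d = ip.split(".")

# Convert to integers if numeric values are needed.
octets = [int(part) for part in (a, b, c, d)]

print(a, b, c)
print(octets)
```

This does no validation: the unpacking raises ValueError if the string does not have exactly four dot-separated parts.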
Something very simple yet ugly would work, giving you four groups, one for each octet:
(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})
([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)
...should do it. It's not a validating regex, though; it allows numbers beyond 255 for each part.
Here's a crazy validating one:
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Credit for the last regex goes to the RegexBuddy makers.
/(\d+)\.(\d+)\.(\d+)\.(\d+)/
First port of call for regex... RegEx Library
While others have pointed out various good regexps, may I ask why you absolutely must use regular expressions for that? It will be slow and error-prone. Most platforms have integrated IP address functionality, or provide a way to call inet_aton.
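As one example of such built-in functionality, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `parse_ipv4` is made up for illustration, and other platforms will have their own equivalents.

```python
import ipaddress


def parse_ipv4(text):
    """Return the four octets as ints, or None if text is not a valid IPv4 address."""
    try:
        addr = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return None
    # packed is the 4-byte network-order form; iterating bytes yields ints.
    return list(addr.packed)


print(parse_ipv4("74.125.45.100"))
print(parse_ipv4("999.1.2.3"))
```

The library does both the splitting and the range checking, so no regex is involved at all.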
In case someone needs a validating RegEx for (all possible) IPv4 addresses:
([^\d.]|^)([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])([^\d]|$)
The IP's octets are captured in the 2nd through 5th groups. The 1st and last groups are not used, but they are necessary; otherwise a wrong IP like:
999.1.2.3
would be caught as "99.1.2.3". I am not sure whether you want to allow an IP ending with a dot, e.g.
1.2.3.4.
If not, change the last part to ([^\d.]|$). I do not allow any dots in front of it though.
I still think this regex is a messy monster :) and a better solution would be to validate by hand using a function.
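A hand-written validator along those lines might look like this in Python. It is only a sketch: the function name is hypothetical, and leading zeros such as 01.2.3.4 are accepted, which mirrors the regex above but may not be what you want.

```python
def is_valid_ipv4(text):
    """Check that text is four dot-separated decimal numbers, each 0-255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # isascii() guards against non-ASCII digits that isdigit() would accept;
        # isdigit() is also False for an empty part.
        if not (part.isascii() and part.isdigit()):
            return False
        if len(part) > 3 or int(part) > 255:
            return False
    return True


print(is_valid_ipv4("74.125.45.100"))
print(is_valid_ipv4("999.1.2.3"))
print(is_valid_ipv4("1.2.3.4."))
```

The trailing-dot case is rejected because the split produces a fifth, empty part.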