Help decoding a regular expression for use with Google Analytics - regex

Update
I missed this in my original explanation. I set this up yesterday, and it ran overnight. No data populated in my profile overnight. So either my regex is wrong, or Google cannot see internal traffic IPs.
It seems that everyone has their own variation on the syntax for regular expressions.
I'm trying to include only internal traffic on one of my profiles in Google Analytics.
Can someone verify for me what they expect that regular expression to match? In CIDR notation?

I don't know what CIDR notation is, but that regex matches a string that
starts with 10.
followed by 90. or 60.
followed by 10 or 9
followed by zero or more dots.
You probably want ^10\.[96]0\.(10|9)\..*$
Since the last bit (.*) is a bit too vague (unless you know that there will only ever be valid IP addresses in the live data), you might want to change that to \d+ or, if you want to restrict it to the valid range from 0 to 255, to (25[0-5]|2[0-4]\d|1?\d?\d). The parentheses keep the alternation from splitting the rest of the pattern.
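As a rough check of the pattern suggested above, here is a minimal Python sketch. The name INTERNAL, the helper is_internal and the sample addresses are made up for illustration, and Google Analytics uses its own regex engine, so treat this only as an approximation of how the filter would behave.

```python
import re

# Suggested pattern, with the last octet restricted to 0-255.
# The octet alternation sits inside a group so that ^ and $ apply to the whole pattern.
INTERNAL = re.compile(r"^10\.[96]0\.(10|9)\.(25[0-5]|2[0-4]\d|1?\d?\d)$")

def is_internal(ip: str) -> bool:
    """Return True if ip matches the internal-range pattern."""
    return INTERNAL.match(ip) is not None

# Hypothetical sample addresses.
for ip in ["10.90.9.1", "10.60.10.255", "10.90.11.1", "10.90.9.256", "110.90.9.1"]:
    print(ip, is_internal(ip))
```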

Don't know about CIDR notation, but that will match any of 10.90.9.*, 10.90.10.*, 10.60.9.* or 10.60.10.*
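For the CIDR part of the question, the four ranges listed above correspond to four /24 blocks. A small sketch using Python's standard ipaddress module shows the same membership test without any regex; the names NETWORKS and in_internal_range are hypothetical.

```python
import ipaddress

# The four /24 blocks from the answer above, written in CIDR notation.
NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.90.9.0/24", "10.90.10.0/24", "10.60.9.0/24", "10.60.10.0/24")
]

def in_internal_range(ip: str) -> bool:
    """Return True if ip falls inside any of the four blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)

print(in_internal_range("10.90.9.17"))
print(in_internal_range("10.90.11.1"))
```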

Instead of the last asterisk, place \d+. The way you wrote it, the string has to end with zero or more dots for the expression to match.

Related

RegEx - First Two Octet Match

I'm trying to learn RegEx using ImmersiveLabs/LinkedInLearning and other web-based resources and things are going well.
There's a small question to which I'm not sure how to even Google for an answer.
Scenario: an Azure ATP query in which I wanted to match the private addressing scheme:
| where From_IP matches regex @'(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)'
It works well! Matches what I want it to. The question is - why?!
For example, shouldn't (^172\.2[0-9]\.) only match the first two octets of the string 172.20.1.9? Why, then, is the entire IP matched successfully?
Seems weird for me to question something that is working. Any tips are appreciated.
There is no $ in your regex, so it does not assert position at the end of the line; it basically doesn't care what comes after 172.20. For more info, see: regex101.com/r/TgjdVz/1
In addition, to match all private IPv4 subnets, use the following regex.
^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$
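To make the difference concrete, here is a hedged Python sketch. It assumes that Python's re.search behaves like the query's matches regex for this purpose (true if any part of the string matches); the names PREFIX and FULL are hypothetical.

```python
import re

# Prefix-only pattern from the query: anchored at the start, open at the end.
PREFIX = re.compile(
    r"(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)"
)

# Full-address pattern from the answer: anchored at both ends.
FULL = re.compile(
    r"^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}"
    r"|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)"
    r"(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$"
)

# The prefix pattern only consumes the first two octets,
# but that is enough for the test to succeed.
matched_text = PREFIX.search("172.20.1.9").group(0)
print(repr(matched_text))

# Only the fully anchored pattern rejects a malformed tail.
print(PREFIX.search("172.20.1.999") is not None)
print(FULL.search("172.20.1.999") is not None)
```

The row is kept because some part of the value matches, not because the pattern covers the whole address.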

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++ without those instances being part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1 but it doesn't work.
Help appreciated.
A simple
^example\.com$
with the g, m, and i flags will work for you.
https://regex101.com/r/sJ5fE9/1
If the matching should be done somewhere in the middle of the string you can use negative look behind to check that there is no dot before:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
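A quick way to try the lookbehind outside Notepad++ is a short Python sketch. The sample lines and the helper has_bare_domain are invented for illustration, and Notepad++ uses a different regex engine, so results there may differ in edge cases.

```python
import re

# No dot allowed immediately before "example.com".
NO_DOT_BEFORE = re.compile(r"(?<!\.)example\.com")

def has_bare_domain(line: str) -> bool:
    """Return True if the line contains example.com not preceded by a dot."""
    return NO_DOT_BEFORE.search(line) is not None

# Hypothetical lines such as might appear in an SQL dump.
lines = [
    "INSERT INTO t VALUES ('example.com');",
    "INSERT INTO t VALUES ('subdomain.example.com');",
    "INSERT INTO t VALUES ('http://example.com/page');",
]
for line in lines:
    print(has_bare_domain(line), line)
```

Note that this pattern still matches inside a string such as myexample.com, because the character before the match is a letter rather than a dot.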
Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because the period is a word boundary. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround, the negative lookahead you used (as the name implies) specifies what the regex should not match at this point. So (?!subdomain\.)example trivially always matches, because example is not subdomain. and the negative lookahead can never fail there.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
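The difference between the two lookarounds can be seen side by side in a small Python sketch; the names LOOKAHEAD and LOOKBEHIND are hypothetical. Python requires a fixed-width lookbehind, which subdomain\. satisfies.

```python
import re

# Lookahead: checks the text starting AT the match position, which is "example...".
LOOKAHEAD = re.compile(r"(?!subdomain\.)example\.com")
# Lookbehind: checks the text just BEFORE the match position.
LOOKBEHIND = re.compile(r"(?<!subdomain\.)example\.com")

text = "subdomain.example.com"
print(LOOKAHEAD.search(text) is not None)   # the lookahead never blocks anything here
print(LOOKBEHIND.search(text) is not None)  # the lookbehind rejects the subdomain case
print(LOOKBEHIND.search("example.com") is not None)
```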
Here's a solution that takes an optional protocol and www. prefix into account:
/^(https?:\/\/)?(www\.)?example\.com$/

Google Analytics IP Filter Exclude

Could someone help me with some REGEX...
I have been blocking internal traffic using the filter pattern:
10.*..
This just bit me in the foot as this is blocking all referral traffic between our sites.
What I want to do now is block everything except 10.103..
Do I need to apply two separate ranges, or can I accomplish this with one filter?
If you want to block everything but 10.103.xxx.xxx, use an include filter instead of the usual exclude filter.
NOTE ABOUT REGEXES MATCHING IPs IN ANALYTICS
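I am not sure if the filter I suggested above uses regex or not (literal string match), but for a prefix like 10.103. it makes little practical difference. If the field is treated as a regex, writing it as ^10\.103\. removes any ambiguity, since an unanchored 10.103. could also match inside an address such as 210.103.x.x.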
Your original pattern, on the other hand, is bogus and is probably hurting you. That's because in a regex the dot . is not a literal dot, but represents any character. Your expression, in fact, excludes every single IP that merely starts with 10 (not just 10. that is ten-dot), including 100.xxx, 101.xxx etc.
The correct version of your original excluding regex would be ^10\..*, which anchors the match at the start of the address (^), contains an escaped dot (\.), then proceeds to any characters after that (.*).
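The effect of the unescaped dot is easy to see in a short Python sketch. It compares the pattern as quoted in the question with an escaped version and with an escaped, anchored version. The sketch assumes the filter does a partial (search style) match, which is an assumption about Google Analytics rather than something confirmed here; the names and sample addresses are made up.

```python
import re

ORIGINAL = re.compile(r"10.*..")   # as quoted in the question: every dot means "any character"
ESCAPED = re.compile(r"10\..*")    # literal dot after 10, but not anchored
ANCHORED = re.compile(r"^10\..*")  # literal dot, and must start the address

def matches(pattern: re.Pattern, ip: str) -> bool:
    """Partial-match test, assumed to mirror how the filter applies a regex."""
    return pattern.search(ip) is not None

# Hypothetical visitor addresses.
for ip in ["10.1.2.3", "100.20.30.40", "110.2.3.4"]:
    print(ip, matches(ORIGINAL, ip), matches(ESCAPED, ip), matches(ANCHORED, ip))
```

Only the anchored version limits the match to addresses that really begin with 10 followed by a dot.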
Regular expressions are explained very well in the Google Analytics Help.
For multiple IPs, there is this little helper, which generates the regular expression for you.
If you want to block internal traffic, just choose Add New Filter, then Custom, then Exclude, and put the IP as a regular expression in the field. That's it.

Regular expression to exclude local addresses

I'm trying to configure my Foxy Proxy program and one of the features is to provide a regular expression for an exclusion list.
I'm trying to blacklist the local sites (ending in .local), but it doesn't seem to work.
This is what I attempted:
^(?:https?://)?\d+\.(?!local)+/.*$
^(?:https?://)?\d+\.(?!local)(\d)+/.*$
I also researched on Google and Stack Exchange with no success.
Since you indicate in the comments that you actually need a whitelist solution, I went with that:
Try: ^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$
http://regex101.com/r/xV4gS0
Your regular expressions match host names which start with a series of digits followed by a period and then not followed by the string "local". If this is a "blacklist", then that hardly seems like what you want.
If you're trying to match all hostnames which end in .local, you'd want something like the following for the hostname portion:
[^/]*\.local(?:/|$)
with appropriate escapes inserted depending on regex context.
If your original question was incorrect and you really need a whitelist, then you'd want something like:
^(?:(?!\.local)[^\/])*(?:\/|$)
as illustrated in http://regex101.com/r/yB0uY4
Thank you, everyone, for the help. Indeed, it turns out that for this program, listing "not .local" as a blacklist is not the same as listing "all .local" as a whitelist.
I also had a rookie mistake on my pattern. I meant "\w" instead of "\d". Thank you Peter Alfvin for catching that.
So my final working solution is what Bart suggested:
^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$ as a whitelist.
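For anyone who wants to sanity-check the final whitelist pattern, here is a minimal Python sketch. The URLs and the helper is_whitelisted are invented, and FoxyProxy's own regex engine may differ from Python's in details.

```python
import re

# Final whitelist pattern from the thread: any host whose last label does not start with "local".
WHITELIST = re.compile(r"^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$")

def is_whitelisted(url: str) -> bool:
    """Return True if the URL matches the whitelist pattern."""
    return WHITELIST.match(url) is not None

# Hypothetical URLs.
for url in ["http://example.com/page", "https://intranet.local/", "wiki.local/home", "http://example.com"]:
    print(url, is_whitelisted(url))
```

The pattern requires a slash after the host name, so a bare http://example.com does not match.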

RegEx to get numbers between periods in IP address?

Say you have an IP address: 74.125.45.100 so its A.B.C.D
Is there a way to use RegEx to get A,B,C separately?
If it is just to extract the numbers from the IP and not to validate the IP address then you could just do:
[0-9]+
However, I think a simple String.Split(".") would be an easier option.
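Both suggestions are one-liners in most languages. The sketch below uses Python, so String.Split becomes str.split, and it uses [0-9]+ so that each run of digits comes back as one item (an assumption about the intent of the pattern above).

```python
import re

ip = "74.125.45.100"

# Regex route: every run of digits, in order.
octets_regex = re.findall(r"[0-9]+", ip)

# Split route: cut the string on the literal dot.
octets_split = ip.split(".")

# Convert to integers and unpack into A, B, C, D.
a, b, c, d = (int(part) for part in octets_split)
print(octets_regex, octets_split, a, b, c)
```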
Something very simple yet ugly would work, giving you four groups, one for each octet.
(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})
([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)
...should do it. It's not a validating regex though; it allows numbers beyond 255 for each part.
Here's a crazy validating one:
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Credit to last regex goes to RegexBuddy makers.
/(\d+)\.(\d+)\.(\d+)\.(\d+)/
First port of call for regex... RegEx Library
While others have pointed out various good regexps, may I ask why you absolutely must use regular expressions for that? It will be slow and error-prone. Most platforms have integrated IP address functionality, or provide a way to call inet_aton.
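As one example of such built-in functionality, Python's standard ipaddress module parses and validates an address in one step. This is only a sketch of that idea; the helper name octets is hypothetical.

```python
import ipaddress
from typing import List

def octets(ip: str) -> List[int]:
    """Parse a dotted-quad string and return its four octets.

    Raises ValueError if the string is not a valid IPv4 address.
    """
    # .packed is the 4-byte big-endian form; iterating bytes yields integers.
    return list(ipaddress.IPv4Address(ip).packed)

print(octets("74.125.45.100"))

try:
    octets("999.1.2.3")
except ValueError as exc:
    print("rejected:", exc)
```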
In case someone needs a validating RegEx for (all possible) IPv4 addresses:
([^\d.]|^)([01]?\d{1,2}|2[0-4]\d|25[0-5])[.]([01]?\d{1,2}|2[0-4]\d|25[0-5])[.]([01]?\d{1,2}|2[0-4]\d|25[0-5])[.]([01]?\d{1,2}|2[0-4]\d|25[0-5])([^\d]|$)
The IP is contained in the 2nd through 5th groups. The 1st and last groups are not used, but they are necessary, because otherwise a wrong IP like:
999.1.2.3
would be caught as "99.1.2.3". I am not sure if you want to allow an IP ending with a dot, e.g.
1.2.3.4.
If not, change the last part to ([^\d.]|$). I do not allow any dots in front of it though.
I still think this RegEx is a messy monster :) and a better solution would be to validate by hand using a function.
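A hand-written validator of the kind suggested here can be quite short. The following Python sketch is one possible version, not the poster's own code; the function name is_valid_ipv4 and the rule of at most three digits per part are assumptions.

```python
def is_valid_ipv4(text: str) -> bool:
    """Return True if text is four dot-separated decimal numbers, each 0-255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be 1 to 3 plain ASCII digits (this also rejects empty parts).
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return False
        if int(part) > 255:
            return False
    return True

for candidate in ["74.125.45.100", "999.1.2.3", "1.2.3.4.", "206.1.1.1"]:
    print(candidate, is_valid_ipv4(candidate))
```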