RegEx - First Two Octet Match

I'm trying to learn RegEx using ImmersiveLabs/LinkedInLearning and other web-based resources and things are going well.
I have a small question that I'm not even sure how to Google.
Scenario: an Azure ATP query in which I want to match private addressing schemes:
| where From_IP matches regex @'(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)'
It works well and matches what I want it to. The question is: why?
For example, shouldn't (^172\.2[0-9]\.) only match the first two octets of the string 172.20.1.9? Why, then, does the entire IP match successfully?
It seems odd to question something that is working, but any tips are appreciated.

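There is no $ in your regex, so it does not assert a position at the end of the line; it simply doesn't care what comes after 172.20. The matched text is only that prefix, but matches regex reports a hit because the pattern was found somewhere in From_IP, not because the whole IP was consumed. See for more info: regex101.com/r/TgjdVz/1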
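A quick way to see this behavior is a small Python sketch. Python's re module is assumed here as a stand-in for the KQL engine (the two flavors agree on this simple pattern): re.search reports a match found anywhere in the string, while adding $ or using re.fullmatch requires the whole string to match.

```python
import re

# The prefix-only alternative from the question: no $ at the end.
PREFIX = r"(^172\.2[0-9]\.)"
ip = "172.20.1.9"

# re.search succeeds if the pattern matches anywhere, so the rest ("1.9") is ignored.
m = re.search(PREFIX, ip)
print(m.group(0))  # → 172.20.

# Requiring the end of the string (or a full match) makes the prefix pattern fail.
print(re.search(PREFIX + r"$", ip))  # → None
print(re.fullmatch(PREFIX, ip))      # → None
```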
In addition, to match complete addresses in all the private IPv4 subnets (10/8, 172.16/12 and 192.168/16), use the following regex:
^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$
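As a sanity check, the sketch below runs that pattern through Python's re module (again an assumed stand-in for the target engine) on a few sample addresses. The pattern is split across adjacent string literals only for readability; Python joins them into the same regex. Note that it covers only the 10/8, 172.16/12 and 192.168/16 ranges, so 127.0.0.1, which the question's pattern also matched, is rejected.

```python
import re

# The answer's pattern, anchored with ^ and $ so the whole address must match.
PRIVATE_V4 = re.compile(
    r"^(10(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){3}"
    r"|((172\.(1[6-9]|2[0-9]|3[01]))|192\.168)"
    r"(\.(25[0-5]|2[0-4][0-9]|1[0-9]{1,2}|[0-9]{1,2})){2})$"
)

samples = ["10.0.0.1", "172.20.1.9", "192.168.1.254", "172.32.0.1", "10.256.0.1", "127.0.0.1"]
for ip in samples:
    # True only when the entire string is a private address in those ranges
    print(ip, bool(PRIVATE_V4.match(ip)))
```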

Related

Regex to split an address string

I'm a real rookie with regex. I usually manage by anchoring on a static word in the string and then using basic functions to find what I need, but this one has me stuck.
I have the address string "MITCHAM SA 5062", and to get it through this parser I need to split it into suburb, state and postcode.
I can get "MITCHAM" using /\w+/
And postcode "5062" using /\d+/
The state is what I'm struggling with. I think I'm close: I'm currently using (?!\w+) (\w+). The issue is that it still picks up the whitespace before the state, which won't be allowed in the database.
Halp pls!
Edit: A few people asked whether the state will ever be more than two letters. Correct, it could be; it won't always be SA.
Edit 2: Another person asked whether one whole regex can capture it all. No; the way our SaaS product works, I need to map each piece of data to the correct place separately (using a GUI).
If MITCHAM SA 5062 is the full string and you want to capture each group in one regex, then this will work:
^(\w+)\s*?(\w+)\s*(\d+)
If you are trying to capture only the middle section, you can try:
\s(\w+)\s
Or, if for some reason you cannot use capturing groups, this will match just the middle portion:
(?<=\s)(\w+)(?=\s+)
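The sketch below tries the question's pattern and the three answer patterns in Python's re module, which is assumed as a stand-in because the SaaS product's regex flavor isn't stated. The second address is a made-up sample with a three-letter state, since the edit says the state won't always be SA.

```python
import re

for address in ["MITCHAM SA 5062", "SOMEWHERE NSW 2000"]:  # second one is made up
    # One regex, three capture groups: suburb, state, postcode.
    suburb, state, postcode = re.match(r"^(\w+)\s*?(\w+)\s*(\d+)", address).groups()
    print(suburb, state, postcode)

    # The question's pattern: the overall match still includes the leading space.
    print(repr(re.search(r"(?!\w+) (\w+)", address).group(0)))

    # Capturing group: group(1) excludes the surrounding whitespace.
    print(re.search(r"\s(\w+)\s", address).group(1))

    # Lookarounds: the whole match is just the state, no groups needed.
    print(re.search(r"(?<=\s)(\w+)(?=\s+)", address).group(0))
```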

Matching multiple iterations of a word with regex

I am trying to craft some regex that can match all of the following strings (no more, no less):
ftp
ftps
sftp
ftpes
The closest I have gotten is ^ftps?$, but as you can tell, that only matches the first two. How can I match all of the ones listed? I am aware that I can use something like ^(ftp|ftps|sftp|ftpes)$, but I want to save as much space as possible (my actual application is much larger than this, so saving space matters more; I used the ftp example for visual appeal).
I am using this in a bash 3.2.57(1)-release if-statement.
I suggest this extended regex:
^(ftpe?s|s?ftp)$
ftpe?s: the e is optional
s?ftp: the leading s is optional
See: The Stack Overflow Regular Expressions FAQ
A variation on an extended regex would be:
^(s?ftp|^ftp(s|es)?)$
Here:
s?ftp: zero or one s followed by ftp, at the beginning
or
ftp(s|es)?: ftp followed by zero or one s or es.
(The second ^ is redundant, since the whole group is already anchored.)
Let me know if you have further questions.
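To confirm that both answers accept exactly the four names, here is a small Python check. Python's re module is used only for illustration and is an assumption; bash's =~ uses POSIX extended regexes, which handle these particular patterns the same way. The rejected strings are hypothetical near-misses.

```python
import re

# The two answers' patterns; both should accept exactly the four names.
PATTERNS = [r"^(ftpe?s|s?ftp)$", r"^(s?ftp|^ftp(s|es)?)$"]
WANTED = ["ftp", "ftps", "sftp", "ftpes"]
UNWANTED = ["ftpe", "sftps", "sftpes", "ftpss", "fttp"]  # hypothetical near-misses

for pattern in PATTERNS:
    accepts_all = all(re.search(pattern, w) for w in WANTED)
    rejects_all = not any(re.search(pattern, w) for w in UNWANTED)
    print(pattern, accepts_all, rejects_all)
```

In bash itself, storing the pattern in a variable (for example re='^(ftpe?s|s?ftp)$') and testing [[ $proto =~ $re ]], where proto is a hypothetical variable, avoids quoting surprises.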

Find last occurrence of period with regex

I'm trying to create a regex for validating URLs. I know there are many advanced ones out there, but I want to create my own for learning purposes.
So far I have a regex that works quite well; however, I want to improve the validation for the TLD part of the URI because I feel it's not quite there yet.
Here's my regex (or find it on regexr):
/^[(http(s)?):\/\/(www\.)?a-zA-Z0-9#:._\+~#=]{2,256}\.[a-zA-Z]{2,6}\b([/#?]{0,1}([A-Za-z0-9-._~:?#[\]#!$&''()*+,;=]|(%[A-Fa-f0-9]{2}))*)$/
It works well for links such as foo.com, http://foo.com or foo.co.uk.
The problem appears when you introduce subdomains or second-level domains such as co.uk, because the regex will also accept foo.co.u or foo.co. (with a trailing dot).
I did try using the following to select the substring after the last .:
/[(http(s)?):\/\/(www\.)?a-zA-Z0-9#:._\+~#=]{2,256}[^.]{2,}$/
but this prevents me from defining the path rules of the URI.
How can I ensure that the substring after the last . but before the first /, ? or # is at least 2 characters long?
From what I can see, you're almost there. I made some modifications and it seems to work.
^(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9#:._\+~#=]{2,256}\.[a-zA-Z]{2,6}([/#?;]([A-Za-z0-9-._~:?#[\]#!$&''()*+,;=]|(%[A-Fa-f0-9]{2}))*)?$
It can be shortened somewhat to:
^(http(s)?:\/\/)?(www\.)?[\w#:.\+~#=]{2,256}\.[a-zA-Z]{2,6}([/#?;]([-\w.~:?#[\]#!$&''()*+,;=]|(%[A-Fa-f0-9]{2}))*)?$
(Basically, I just tweaked your regex.)
The main difference is that the parameter part is optional, but if present it has to start with one of /, #, ? or ;. That part could probably be simplified as well.
Edit:
After some experimenting, I think this one is about as simple as it'll get:
^(http(?:s)?:\/\/)?([-.~\w]+\.[a-zA-Z]{2,6})(:\d+)?(\/[-.~\w]*)?([#/#?;].*)?$
It also captures the separate parts: scheme, host, port, path and query/params.
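The sketch below applies that final pattern with Python's re module, which accepts the same syntax here (including the escaped slashes), and prints the five capture groups. The candidates are the question's examples plus one hypothetical URL with a port and query string; the two malformed hosts from the question are now rejected.

```python
import re

# The answer's final pattern, unchanged.
URL = re.compile(r"^(http(?:s)?:\/\/)?([-.~\w]+\.[a-zA-Z]{2,6})(:\d+)?(\/[-.~\w]*)?([#/#?;].*)?$")

candidates = [
    "foo.com",
    "http://foo.com",
    "https://foo.co.uk:8080/path?x=1",  # hypothetical URL with port and query
    "foo.co.u",
    "foo.co.",
]

for candidate in candidates:
    m = URL.match(candidate)
    # groups: scheme, host, port, path, query/params (None when absent)
    print(candidate, m.groups() if m else None)
```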

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++, excluding those that are part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1, but it doesn't work.
Help appreciated.
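Simply use
^example\.com$
with the g, m and i flags (global, multiline, case-insensitive). With m, ^ and $ match at line boundaries, so this finds example.com only when it stands alone on its own line.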
https://regex101.com/r/sJ5fE9/1
If the match can occur in the middle of a line, you can use a negative lookbehind to check that there is no dot before it:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because the period is a word boundary. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround, note that the negative lookahead you used (as the name implies) specifies what the regex should not match at this point, looking ahead. So (?!subdomain\.)example trivially always matches, because example is not subdomain.; the negative lookahead can never fail.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
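To compare the suggestions side by side, here is a Python sketch over a made-up snippet standing in for the .SQL file. Python's re module is an assumption here (Notepad++ uses the Boost engine), but these fixed-width lookarounds should behave the same way in both. It also illustrates the point about the negative lookahead: it never rejects anything, so every occurrence matches.

```python
import re

# Made-up sample text standing in for the .SQL file.
text = "example.com\nsubdomain.example.com\nwww.example.com\nsee example.com here"

patterns = {
    "whole line only": r"^example\.com$",
    "no dot before": r"(?<!\.)example\.com",
    "not after subdomain.": r"(?<!subdomain\.)example\.com",
    "lookahead (always true)": r"(?!subdomain\.)example\.com",
}

for name, pattern in patterns.items():
    # re.M makes ^ and $ match at line boundaries, like the m flag; re.I ignores case.
    hits = [m.start() for m in re.finditer(pattern, text, re.M | re.I)]
    print(f"{name}: {len(hits)} match(es) at offsets {hits}")
```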
Here's a solution that takes the protocol and www prefixes into account:
/^(www\.)?(http:\/\/www\.)?(https:\/\/www\.)?example\.com$/

Regex to see if IP starts with 156.21.x.x

I'm writing a regex for Google Analytics and I need to block any IP in 156.21.x.x. I don't care about the last two octets, just the first two. I would like to keep the regex as short as possible, because Google only allows 255 characters and my regex is already pretty large.
I'm not sure what regex flavor or language you're using, but this will work on most regex engines:
156\.21\.\d{1,3}\.\d{1,3}
Of course, this will also match invalid IPs like 156.21.777.888, but if the list you're parsing doesn't contain invalid IP addresses, you should be OK. Or, more compactly:
156\.21(\.\d{1,3}){2}
If you are running short on space, this would work, though you would match non-IP addresses as well. If you can assume Google will give you valid IP addresses, this is your shortest option:
^156\.21\.
Matches things like: 156.21.1.1 156.21.1000.1000 156.21.ABC
But does not match http://156.21.1.1 ehlo 156.21.1000.1000
The following regex would match (almost) valid IPv4 addresses that start with 156.21:
(156\.21(?:\.[\d]{1,3}){2})
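The sketch below runs each suggested pattern over a few sample strings with Python's re.search, assuming (not confirmed here) that the Google Analytics filter behaves like a partial match. It makes the trade-offs visible: only the ^ anchor rejects http://156.21.1.1, while only the shortest pattern accepts 156.21.ABC.

```python
import re

# Candidate patterns from the answers, in the order they were suggested.
patterns = [
    r"156\.21\.\d{1,3}\.\d{1,3}",
    r"156\.21(\.\d{1,3}){2}",
    r"^156\.21\.",
    r"(156\.21(?:\.[\d]{1,3}){2})",
]

samples = ["156.21.1.1", "156.21.777.888", "156.21.ABC", "http://156.21.1.1", "156.22.1.1"]

for pattern in patterns:
    # re.search mirrors a "contains a match" filter; only ^ pins the start.
    print(pattern, [s for s in samples if re.search(pattern, s)])
```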