URL regex for sublinks in the main website

I want to create a regex for the URLs that I want to whitelist in a WAF.
Examples of the sub-URLs:
http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt
and
http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1
I think I managed to write a regex for the leading portion:
(https:\/\/www\.|http:\/\/www\.^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$)
to match up to http://www.example.com or https://www.example.com,
but I am unsure how to handle the sublinks:
/wp-admin/admin-ajax.php?action=aks_expt
and
/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1
I built one anyway, but it fails:
https:\/\/www\.|http:\/\/www\.^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$\/wp-admin\/admin-ajax\.php\?action\=aks\_expt

Try the following regex:
(?:https?://www\.example\.com)?/wp-admin/admin-ajax\.php\?action=aks_expt(?:&aks_ex1m_expt=1)?
Explanation:
(?:https?://www\.example\.com)? matches the optional leading http/https scheme followed by the domain
/wp-admin/admin-ajax\.php\?action=aks_expt matches the mandatory path and action parameter
(?:&aks_ex1m_expt=1)? matches the optional second query parameter
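A minimal Python sketch of how the pattern above could be checked before putting it into a WAF rule. The function name is_whitelisted is made up for illustration, and the use of fullmatch() is an assumption: it rejects URLs that carry extra parameters beyond the two listed. Your WAF may use a different regex engine and different anchoring.

```python
import re

# Pattern from the answer; fullmatch() anchors it at both ends (an assumption,
# chosen so that URLs with extra trailing parameters are rejected).
WHITELIST = re.compile(
    r"(?:https?://www\.example\.com)?"             # optional scheme and domain
    r"/wp-admin/admin-ajax\.php\?action=aks_expt"  # mandatory path and action
    r"(?:&aks_ex1m_expt=1)?"                       # optional second parameter
)


def is_whitelisted(url: str) -> bool:
    """Return True if the whole URL matches the whitelist pattern."""
    return WHITELIST.fullmatch(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt",
        "https://www.example.com/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1",
        "/wp-admin/admin-ajax.php?action=aks_expt",
        "http://www.example.com/wp-admin/admin-ajax.php?action=other",
    ]
    for sample in samples:
        print(sample, is_whitelisted(sample))
```

Because the dot and question mark are escaped, admin-ajaxXphp or a missing ? will not slip through.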

Related

Regex remove www from URL

I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
This correctly separates the domain; however, I need to add an additional check to remove www.
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://www.stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com

You can add the negative lookaheads (?!www\.) and (?!http:\/\/www\.) right after the first \b to exclude matching www. or http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
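For anyone who wants to try the alternative pattern outside Grok, here is a rough Python sketch. The helper name first_domain is hypothetical, and Python's re module stands in for the regex engine Logstash uses, so treat it as an approximation rather than a Grok test.

```python
import re

# The ALTERNATIVE pattern from above, split across lines for readability.
DOMAIN = re.compile(
    r"\b(?!(?:https?|ftps?)://)(?!www\.)"         # skip protocol and www.
    r"(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"          # first label
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*"   # further dot-separated labels
    r"(?:\.?|\b)"
)


def first_domain(text: str):
    """Return the first domain-like match in text, or None if there is none."""
    match = DOMAIN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in (
        "www.stackoverflow.com",
        "https://www.stackoverflow.com/questions/37070358/",
        "ftp://example.org",
    ):
        print(sample, "->", first_domain(sample))
```

The lookaheads do not consume anything: they only make the match fail at positions where a protocol or www. begins, so the search moves on to the next word boundary and starts matching at the real domain.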

This will match the part after www if the url starts with www.
(?!www\.)\b(?:(?!-)[0-9A-Za-z]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I simplified the rest of your regex too by using a negative look ahead for - in the subdomains.

Regex match domain name without www and keep website.com still intact

I have this regex
^(?:http(?:s)?://)?(?:www(?:[0-9]+)?\.)
to strip off the www and http(s):// part of any domain name and give just the domain name. It works with:
example.com
http://example.com
http://www.example.com
But when used with a domain name starting with the letter w, it strips the w off:
website.com => ebsite.com
Any ideas on how to make it better? Please test it with this data set http://regexr.com/3abl2
Thanks

I think you want something like this:
^(?:https?:\/\/)?(?:www\.)?(.*)$
UPDATE It looks like you also want to omit www0, www1, etc.? Then you'll want this:
^(?:https?:\/\/)?(?:www[0-9]*\.)?(.*)$
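A quick way to check the updated pattern against the cases from the question, sketched in Python. The function name bare_domain is an assumption for illustration; the capture group holds whatever is left after the optional protocol and www prefix.

```python
import re

# Optional protocol, optional www/www0/www1... prefix, then capture the rest.
STRIP_PREFIX = re.compile(r"^(?:https?://)?(?:www[0-9]*\.)?(.*)$")


def bare_domain(url: str) -> str:
    """Return url without a leading http(s):// and www<digits>. prefix."""
    match = STRIP_PREFIX.match(url)
    # The pattern only fails on multi-line input; return it unchanged then.
    return match.group(1) if match else url


if __name__ == "__main__":
    for sample in (
        "example.com",
        "http://example.com",
        "http://www.example.com",
        "www2.example.com",
        "website.com",
    ):
        print(sample, "->", bare_domain(sample))
```

The key difference from the original attempt is that the dot belongs to the optional www group, so a name like website.com is left alone because "www" followed by a dot never matches there.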

Drop the part (?:[0-9]+)?\. from the regex.
Add the optional quantifier ? to www, so that it matches zero or one www.
The regex can then be written as
^(?:http(?:s)?:\/\/)?(?:www)?

Regex to match any domain except two domains

In my .htaccess I'm trying to set the document root for all parked domains to a specific path, except for two main domains, so basically I need a regex that matches any domain except those two.
I found something like this
^(?!foo$|bar$).*
and this
(?>[\w-]+)(?<!tea|nuka-cola)
but I cannot get it to work in my situation, because the domain name contains a dot and a TLD, and I want to use a regex there too.
Here is my current regex:
^(.*?)\.(com|net)$
Instead of (.*?) I want to put the exception there.

Use a negative lookbehind:
^(.*?)(?<!(foo)|(bar))\.(com|net)$
Not sure exactly what you want, but this regex will not match names ending in foo.com, foo.net, bar.com or bar.net.
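As an alternative sketch, a negative lookahead at the start can exclude only the two exact domains, whereas the lookbehind version also rejects names that merely end in foo or bar (such as xfoo.com). The names foo and bar are placeholders taken from the snippets in the question, the helper is_parked is hypothetical, and Python is used only to illustrate; .htaccess rules run on Apache's PCRE engine, which should accept the same pattern.

```python
import re

# Reject exactly foo.com, foo.net, bar.com and bar.net; accept any other
# name ending in .com or .net. "foo" and "bar" are placeholder names.
PARKED = re.compile(r"^(?!(?:foo|bar)\.(?:com|net)$)(.*?)\.(com|net)$")


def is_parked(host: str) -> bool:
    """Return True if host is a .com/.net name other than the two main domains."""
    return PARKED.match(host) is not None


if __name__ == "__main__":
    for host in ("foo.com", "bar.net", "parked.com", "xfoo.com", "foo.org"):
        print(host, is_parked(host))
```

If subdomains such as www.foo.com should be excluded as well, the lookahead would need an optional prefix; that case is not covered here.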

Regex pattern for domain-part of URL

I am looking for a regex pattern that matches the domain part of a URL (http or https).
example 1:
https://www.blabla.com/path/pic.jpg
should match
https://www.blabla.com
example 2:
http://my.domain.tld/directory/?something
should match
http://my.domain.tld

Something along these lines:
#^(https?://[a-z0-9.-]+)(?=/|$).*#i
It depends of course which characters you'd like to allow in the domain name.
P.S. The # characters delimit the regex, and the i at the end makes it case-insensitive.
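The same idea in Python, as a small sketch: the # delimiters and the i flag become re.IGNORECASE, and the trailing .* is dropped because only the first capture group is needed. The function name domain_part is made up for this example, and the character class is the one from the answer, so ports and userinfo are deliberately not handled.

```python
import re

# Scheme plus host, followed by a slash or the end of the string.
ORIGIN = re.compile(r"^(https?://[a-z0-9.-]+)(?=/|$)", re.IGNORECASE)


def domain_part(url: str):
    """Return the scheme://host prefix of url, or None if it does not match."""
    match = ORIGIN.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(domain_part("https://www.blabla.com/path/pic.jpg"))
    print(domain_part("http://my.domain.tld/directory/?something"))
```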

regex for validate URL without http/https

All,
I am new to the regex world.
I know that there are a lot of regexes available for validating a common URL with http in it,
but I am looking for a regex to validate URLs in the following formats (without http/https):
www.example.com/user/login
www.example.com
www.example.co.xx
www.example.com/user?id=234&name=fname
If the URL contains only
www.example (without the domain suffix, .com or .co.xx)
example.com (without "www")
I should show an error to the user.
Any help would be highly appreciated.
Thanks
Raj

This regex will pass your first set, but not match the second set:
^www\.example\.(com|co\.xx)(/.*)?$
In English, this regex requires:
starts with www.example.
followed by either com or co.xx
optionally followed by / then anything
You could be more prescriptive about what can follow the optional slash by replacing (/.*) with, for example, (/(user|buy|sell)\?.*).
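Here is a small Python sketch that runs the pattern against the examples from the question, with the dot in co.xx escaped so that it matches only a literal dot. The function name is_valid is hypothetical; in a real form you would show the error message wherever it returns False.

```python
import re

# www.example. followed by com or co.xx, then an optional /anything suffix.
VALID = re.compile(r"^www\.example\.(com|co\.xx)(/.*)?$")


def is_valid(url: str) -> bool:
    """Return True if url has the www.example.<tld>[/path] shape."""
    return VALID.match(url) is not None


if __name__ == "__main__":
    for sample in (
        "www.example.com/user/login",
        "www.example.com",
        "www.example.co.xx",
        "www.example.com/user?id=234&name=fname",
        "www.example",
        "example.com",
    ):
        print(sample, is_valid(sample))
```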
