Regex pattern for domain-part of URL

I am looking for a regex pattern that matches the domain part of a URL (http or https).
example 1:
https://www.blabla.com/path/pic.jpg
should match
https://www.blabla.com
example 2:
http://my.domain.tld/directory/?something
should match
http://my.domain.tld

Something along these lines:
#^(https?://[a-z0-9.-]+)(?=/|$).*#i
It depends of course which characters you'd like to allow in the domain name.
P.S. The # characters delimit the regex, and the i at the end makes it case-insensitive. The part you want ends up in the first capturing group.
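As a rough illustration, here is how the pattern above might be used from Python. This is only a sketch: the function name scheme_and_host is made up for the example, and the # delimiters and trailing i are expressed with re.IGNORECASE instead.

```python
import re

# Same pattern as above; Python has no # delimiters, and the trailing
# "i" flag becomes re.IGNORECASE.
DOMAIN_RE = re.compile(r"^(https?://[a-z0-9.-]+)(?=/|$).*", re.IGNORECASE)

def scheme_and_host(url):
    """Return scheme plus host (capturing group 1), or None if there is no match.

    Hypothetical helper name, used only for this illustration.
    """
    m = DOMAIN_RE.match(url)
    return m.group(1) if m else None

print(scheme_and_host("https://www.blabla.com/path/pic.jpg"))        # https://www.blabla.com
print(scheme_and_host("http://my.domain.tld/directory/?something"))  # http://my.domain.tld
```

Because the character class does not include a colon, a URL with an explicit port (for example http://host:8080/x) does not match this pattern at all.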

Related

Regex for Affiliate URL

For Matomo outgoing link tracking I need a regex pattern that matches the following URLs:
https://www.example.com/product/?sku=12345&utm_source=123456789
and
https://www.example.com/product/?utm_source=123456789
"https://www.example.com/" and "utm_source=123456789" are always fixed in the URL; only "product/" or "category/product/" changes and must be matched by the regex pattern.
Thanks
Maybe this example can help you reach your goal:
(?<=https:\/\/www\.example\.com\/).+(?=utm_source=123456789)
It looks for any characters between these two groups:
https://www.example.com/
utm_source=123456789
Given the examples:
https://www.example.com/product/?sku=12345&utm_source=123456789
https://www.example.com/product/?utm_source=123456789
Your matches would be:
product/?sku=12345&
product/?
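A small Python sketch of the same lookbehind/lookahead pattern, for anyone who wants to try it quickly. The helper name variable_part is invented for this example; the \/ escapes are dropped because Python does not use / as a delimiter.

```python
import re

# Lookbehind for the fixed prefix, lookahead for the fixed utm_source parameter.
VARIABLE_PART = re.compile(
    r"(?<=https://www\.example\.com/).+(?=utm_source=123456789)"
)

def variable_part(url):
    """Return the text between the fixed prefix and the fixed parameter, or None.

    Hypothetical helper name, used only for this illustration.
    """
    m = VARIABLE_PART.search(url)
    return m.group(0) if m else None

print(variable_part("https://www.example.com/product/?sku=12345&utm_source=123456789"))
# product/?sku=12345&
print(variable_part("https://www.example.com/product/?utm_source=123456789"))
# product/?
```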

URL regex for sublinks in the main website

I want to create a regex for the URLs that I want to whitelist in WAF.
For example, for these sub-URLs:
http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt
and
http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1
I think I managed to write a regex for the starting portion:
(https:\/\/www\.|http:\/\/www\.^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$)
to match up to http://www.example.com or https://www.example.com
but I am unsure how to handle the sublinks:
/wp-admin/admin-ajax.php?action=aks_expt
and
/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1
I built one anyway, but it fails:
https:\/\/www\.|http:\/\/www\.^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$\/wp-admin\/admin-ajax\.php\?action\=aks\_expt
Try the following regex:
(?:https?://www\.example\.com)?/wp-admin/admin-ajax\.php\?action=aks_expt(?:&aks_ex1m_expt=1)?
Explanation:
(?:https?://www\.example\.com)? matches the optional scheme (http or https) followed by the domain
/wp-admin/admin-ajax\.php\?action=aks_expt matches the mandatory path and action parameter
(?:&aks_ex1m_expt=1)? matches the optional second query parameter
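To check the suggested regex against the sample URLs, something like the following Python sketch could be used. It assumes the whole URL must match (re.fullmatch); a given WAF may instead search for the pattern anywhere in the URL, so treat that as an assumption. The name is_whitelisted is hypothetical.

```python
import re

# The suggested regex, split over three lines for readability.
WAF_RE = re.compile(
    r"(?:https?://www\.example\.com)?"               # optional scheme and domain
    r"/wp-admin/admin-ajax\.php\?action=aks_expt"    # mandatory path and action
    r"(?:&aks_ex1m_expt=1)?"                         # optional second parameter
)

def is_whitelisted(url):
    """True if the entire URL matches the pattern (assumed full-match semantics)."""
    return WAF_RE.fullmatch(url) is not None

for url in [
    "http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt",
    "http://www.example.com/wp-admin/admin-ajax.php?action=aks_expt&aks_ex1m_expt=1",
    "/wp-admin/admin-ajax.php?action=aks_expt",
    "http://www.example.com/wp-admin/admin-ajax.php?action=other",
]:
    print(url, is_whitelisted(url))
```

The first three URLs are accepted and the last one is rejected.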

Regex to match all except URLs that contain specific directory?

I need a regular expression for IIS URL Rewrite that will process the rule only when the expression matches any bit of the URL EXCEPT a specific sub-root directory.
Example:
www.mysite.com/wordpress - process rule on any URL that starts with /wordpress after the domain name
www.mysite.com/inventory - do not process rule on any URL that starts with /inventory after the domain name
Tried .*(?<!^\/inventory\/.*) but it still matches the entire string.
You need a lookahead rather than lookbehind. Something like this I think:
^([^/]*/){1}(?!inventory\b)
Change the 1 to 2 when the exclusion is needed one directory level deeper, and so on.
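The negative lookahead can be tried out in Python before putting it into a rewrite rule. This is only a sketch against the example strings from the question; the function build_rule and its parameters levels and name are invented here to show how the number in {1} relates to the directory depth.

```python
import re

def build_rule(levels, name):
    """Build the lookahead pattern for a directory `name` at depth `levels`.

    Hypothetical helper: `levels` is the number that goes into {...},
    `name` is the directory to exclude.
    """
    return re.compile(r"^([^/]*/){%d}(?!%s\b)" % (levels, re.escape(name)))

rule = build_rule(1, "inventory")

print(bool(rule.search("www.mysite.com/wordpress")))       # True
print(bool(rule.search("www.mysite.com/inventory")))       # False
print(bool(rule.search("www.mysite.com/inventory/cars")))  # False

# One level deeper: exclude /shop/inventory but not /shop/wordpress.
deeper = build_rule(2, "inventory")
print(bool(deeper.search("www.mysite.com/shop/inventory")))  # False
print(bool(deeper.search("www.mysite.com/shop/wordpress")))  # True
```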

Regular expression to match only domain from URL

I'm struggling with forming a regex that would match:
Just domain in case of URL
Whole string in case of no URL
Acceptance test (the regex should match the domain of each URL, or the whole string when it is not a URL):
http://mozart.co.uk
https://avocado.si/hmm
http://www.qwe123qwe.com
Starbucks
Benchmark 123
So far I've come up with this:
([^\/\/]+)(?:,|$)
It works fine, but not for URLs with a trailing slash. How can I modify the expression to handle a full path (everything on the right side of http(s)://) as well? Thank you.
This regex matches everything after http:// or https:// up to the next slash. If the string doesn't start with http:// or https://, it matches the whole string. Close enough?
(?:^https?:\/\/([^\/]+)(?:[\/,]|$)|^(.*)$)
I should note that most languages have functions built in to properly parse URLs and these are preferable.
You should note that I've got 2 sets of capturing parentheses, so depending on your language that may be significant.
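To make the point about the two capturing groups concrete, here is a minimal Python sketch. The function names domain_or_whole and domain_via_parser are invented for this example. The second function shows the built-in alternative using urllib.parse.urlsplit from the standard library; note that its netloc also includes any port or user info.

```python
import re
from urllib.parse import urlsplit

# Pattern from the answer; the \/ escapes are dropped because Python
# does not use / as a regex delimiter.
DOMAIN_OR_ALL = re.compile(r"(?:^https?://([^/]+)(?:[/,]|$)|^(.*)$)")

def domain_or_whole(text):
    """Return group 1 (the domain) for URLs, else group 2 (the whole string)."""
    m = DOMAIN_OR_ALL.match(text)
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)

def domain_via_parser(text):
    """Same idea with the standard-library URL parser instead of a regex."""
    # netloc is empty when the text has no scheme://host part.
    return urlsplit(text).netloc or text

for s in ["http://mozart.co.uk", "https://avocado.si/hmm",
          "http://www.qwe123qwe.com", "Starbucks", "Benchmark 123"]:
    print(domain_or_whole(s), "|", domain_via_parser(s))
```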
Maybe this: ^(http[s]?:\/\/)?(.*)$. Play here: https://regex101.com/r/iZ2vL4/1
The following regex has several capturing groups; the domain you want will be in the 4th group.
/^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?([^\/\.]+\.[^:\/\s\.]{1,3}(\.[^:\/\s\.]{1,2})?(:\d+)?)($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$/mg
To check it against your own URLs, paste them into the "TEST STRING" textbox on the Regex101.com workbench.
Don't recall where I got this... so I don't know who to credit. But it's pretty slick!
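For reference, this is roughly how the 4th group could be read out in Python. It is a sketch only: the /mg flags are replaced by matching each string separately, and the helper name domain_group is made up for the example.

```python
import re

# The long pattern from above, split across lines for readability.
URL_PARTS = re.compile(
    r"^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?"
    r"([^\/\.]+\.[^:\/\s\.]{1,3}(\.[^:\/\s\.]{1,2})?(:\d+)?)"
    r"($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$"
)

def domain_group(text):
    """Return capturing group 4 (the domain), or None when nothing matches.

    Hypothetical helper name, used only for this illustration.
    """
    m = URL_PARTS.match(text)
    return m.group(4) if m else None

print(domain_group("http://mozart.co.uk"))       # mozart.co.uk
print(domain_group("https://avocado.si/hmm"))    # avocado.si
print(domain_group("http://www.qwe123qwe.com"))  # qwe123qwe.com
print(domain_group("Starbucks"))                 # None
```

Two things to be aware of: the pattern needs at least one dot, so plain strings such as Starbucks do not match at all, and the {1,3} quantifier limits the last label to three characters, so a longer TLD such as .info is not matched.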

Regex to match any domain except two domains

In my .htaccess I'm trying to set the document root for all parked domains to a specific path, except for two main domains. So basically I need a regex that matches any domain except two domains.
I found something like this:
^(?!foo$|bar$).*
and this
(?>[\w-]+)(?<!tea|nuka-cola)
but I cannot get either to work in my situation, because the domain name contains a dot and a TLD, and I want the regex to cover those too.
Here is my current regex:
^(.*?)\.(com|net)$
Instead of (.*?) I want to put the exception there.
Use a negative lookbehind:
^(.*?)(?<!(foo)|(bar))\.(com|net)$
Not sure exactly what you want, but this regex will not match URLs ending in foo.com, bar.net, etc.
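A quick Python sketch of how the lookbehind behaves. One thing it shows: the lookbehind only inspects the three characters before the dot, so a name that merely ends in foo (such as myfoo.com) is rejected too. The second pattern is an alternative that is not part of the answer above: it uses a negative lookahead anchored at the start to exclude only the exact names, on the assumption that this is what is wanted. The names LOOKBEHIND and LOOKAHEAD are made up for the example.

```python
import re

# Pattern from the answer: reject when the text right before the dot is foo or bar.
LOOKBEHIND = re.compile(r"^(.*?)(?<!(foo)|(bar))\.(com|net)$")

# Alternative (an assumption, not from the answer): exclude only the exact
# domains foo.com, foo.net, bar.com and bar.net.
LOOKAHEAD = re.compile(r"^(?!(?:foo|bar)\.(?:com|net)$)(.*?)\.(com|net)$")

for domain in ["example.com", "foo.com", "bar.net", "myfoo.com"]:
    print(domain,
          bool(LOOKBEHIND.match(domain)),
          bool(LOOKAHEAD.match(domain)))
# example.com True True
# foo.com False False
# bar.net False False
# myfoo.com False True
```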