Regex domain validation - regex

The following code
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix
validates all types of domains.
I would like to validate only one domain or subdomain (for example .cu.cc or .co.cc).

You can just add this to the end of your domain regex:
(?<=\.cu\.cc)$
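It's a positive look-behind: the match only succeeds if the string ends in .cu.cc, so a URL with a port or path after the domain is rejected.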

The final \.[a-z]{2,6} is what matches a top-level domain. Change it to whatever specific TLD you want to match.
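As a rough illustration of swapping out the TLD part, here is a minimal Python sketch. It assumes the two suffixes from the question (.cu.cc and .co.cc); the names ALLOWED_SUFFIX_URL and is_allowed_url are made up for this example.

```python
import re

# The generic TLD part \.[a-z]{2,6} is replaced by the two allowed suffixes.
# ALLOWED_SUFFIX_URL is a hypothetical name used only for this sketch.
ALLOWED_SUFFIX_URL = re.compile(
    r"^https?://"                    # scheme
    r"[a-z0-9]+(?:[-.][a-z0-9]+)*"   # one or more labels
    r"\.(?:cu|co)\.cc"               # only .cu.cc or .co.cc
    r"(?::[0-9]{1,5})?"              # optional port
    r"(?:/.*)?$",                    # optional path
    re.IGNORECASE,
)


def is_allowed_url(url):
    """Return True if url is an http(s) URL under .cu.cc or .co.cc."""
    return ALLOWED_SUFFIX_URL.match(url) is not None
```

Because the suffix sits in the middle of the pattern rather than in a look-behind at the end, URLs with a port or a path are still accepted.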

Related

Regex remove www from URL

I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
It correctly separates the domain. However, I need to add an additional check to remove the www. prefix.
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com
You can add the negative lookaheads (?!www\.) and (?!http:\/\/www\.) right after the first \b to exclude matching www. or http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
This will match the part after www. if the URL starts with www.:
(?!www\.)\b(?:(?!-)[0-9A-Za-z-]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I simplified the rest of your regex too by using a negative look ahead for - in the subdomains.
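To see the lookahead approach in action outside Logstash, here is a small Python sketch. It reuses the label pattern from the question and the protocol and www. lookaheads from the answers; the names LABEL, HOST_WITHOUT_WWW and strip_www are invented for the example, and the optional trailing dot from the original pattern is left out to keep the result clean.

```python
import re

# One DNS label: starts with an alphanumeric, up to 63 characters in total.
LABEL = r"[0-9A-Za-z][0-9A-Za-z-]{0,62}"

# The negative lookaheads stop a match from starting at the protocol or at "www.",
# so the search moves on and the first match begins at the label after them.
HOST_WITHOUT_WWW = re.compile(
    r"\b(?!(?:https?|ftps?)://)(?!www\.)"
    r"(?:" + LABEL + r")(?:\.(?:" + LABEL + r"))*"
)


def strip_www(text):
    """Return the first host name in text without a leading 'www.', or None."""
    m = HOST_WITHOUT_WWW.search(text)
    return m.group(0) if m else None
```

Only the literal www. prefix is skipped; any other subdomain, such as blog.example.org, is kept as part of the match.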

Regex needed to match a domain name for django view

I'm trying to match a url with a domain like:
Testing.com
testing.com
Testing.net
testing.net
Testing.org
testing.org
and other extensions as well.
I'm trying to formulate a regex to use in a django view like:
(r'^Account/Testing/d=([a-z]{1,50})$', TestApp),
I tried ^[A-za-z]{2,50}$ but that doesn't match a domain with a capital letter at the beginning.
Any help?
Thank you!
You can use this:
/^(?:http(?:s)?:\/\/)?(?:w{3})\.([a-z_0-9-]+\.\w{2,3}(?:\.\w{2})?)/i
It will match links like these:
http://www.site.com
https://www.site.com
http://www.site.co.uk
https://www.site.co.uk
http://www.site.com.br
https://www.site.com.br
http://www.site-site.com.br
https://www.site-site.com
http://www.site-site.co.uk
https://www.site-site.co.uk
www.site-site.com
www.site-site.co.uk
www.site-site.com.br
www.site.com
and a lot of other variations.
Even if the URL has a path, such as
www.site.com/news
it will only capture "site.com".
The /i modifier makes the whole pattern case-insensitive.
If you want only the domain name itself to be matched case-insensitively, use:
/^(?:http(?:s)?:\/\/)?(?:w{3})\.((?i:[a-z_0-9-])+\.\w{2,3}(?:\.\w{2})?)/
Here (?i:[a-z_0-9-]) applies case-insensitivity to the domain name only.
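For a quick check of what the capture group returns, here is a Python sketch of the first pattern above. The names WWW_DOMAIN and domain_after_www are hypothetical, and re.IGNORECASE stands in for the /i modifier.

```python
import re

# Same structure as the pattern above: optional scheme, mandatory "www.",
# then a capture group holding the name plus a two- or three-part extension.
WWW_DOMAIN = re.compile(
    r"^(?:https?://)?www\.([a-z_0-9-]+\.\w{2,3}(?:\.\w{2})?)",
    re.IGNORECASE,
)


def domain_after_www(url):
    """Return the captured domain, or None if the URL does not start with www."""
    m = WWW_DOMAIN.match(url)
    return m.group(1) if m else None
```

Note that the pattern requires the www. prefix, so a URL such as http://site.com yields no match at all.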
Fortunately, this wasn't that bad after all. This is one way to match a domain with varying extensions:
^[A-Za-z]{2,50}\.[a-z]{1,3}$
It matches .com, .org, .net, etc.
If you have a domain that contains digits, like me2.com, it's better to use this:
^([A-Za-z0-9]{2,50}\.[a-z]{1,3})$
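Here is a hedged sketch of how such a pattern might sit inside the URL from the question, tested with plain Python re rather than Django itself. A range written as A-z (as in the attempt in the question) also covers the punctuation characters between Z and a, so the sketch spells out A-Za-z and escapes the dot. DOMAIN_URL and domain_from_path are made-up names.

```python
import re

# Hypothetical URL pattern in the style of the question's URLconf entry.
DOMAIN_URL = re.compile(r"^Account/Testing/d=([A-Za-z0-9]{2,50}\.[a-z]{1,3})$")


def domain_from_path(path):
    """Return the captured domain from the request path, or None."""
    m = DOMAIN_URL.match(path)
    return m.group(1) if m else None


# Why A-z is risky: the range includes [ \ ] ^ _ and the backtick.
a_to_z_matches_punctuation = re.fullmatch(r"[A-z]+", "_^") is not None
```

The same pattern string could be used as the first element of the URLconf tuple shown in the question, assuming the view accepts the captured value as an argument.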

Regex to validate a url using a wildcard?

Let's say I have a list of valid domain roots,
example.com
test.com
And a variable
String url
How would I make use of a regex to validate that my variable url is on the list, including subdomains?
For example, perhaps my url is "subdomain.case.example.com"
That is, to say clearly:
How would I utilize a regex to verify that my url is *.example.com OR *.test.com OR example.com OR test.com?
Something like this?
^((\*|[\w\d]+(-[\w\d]+)*)\.)*(example|test)(\.com)$
To allow for such things as... subdomain.*.example.com, subdomain.example.com, example.com, *.example.com, etc.
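Use $ to mark the end of the string, and require a dot before the root whenever anything precedes it, so that a host such as notexample.com is rejected.
Your regex would be
^(.*[.])?(example|test)[.]com$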
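If the list of roots changes over time, the pattern can be built from it. The following Python sketch assumes the variable holds a bare host name (no scheme or path), which the question does not state explicitly; build_root_matcher, ROOT_MATCHER and is_on_list are hypothetical names.

```python
import re


def build_root_matcher(roots):
    """Compile a pattern that accepts each root and any of its subdomains."""
    # re.escape keeps the dots in each root literal.
    alternatives = "|".join(re.escape(root) for root in roots)
    return re.compile(
        r"^(?:[a-z0-9-]+\.)*(?:" + alternatives + r")$",
        re.IGNORECASE,
    )


ROOT_MATCHER = build_root_matcher(["example.com", "test.com"])


def is_on_list(host):
    """Return True if host is a listed root or a subdomain of one."""
    return ROOT_MATCHER.match(host) is not None
```

Anchoring both ends and requiring a dot after every optional label is what keeps look-alike hosts such as notexample.com or example.com.evil.org from passing.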

Regex to match any domain except two domains

In my .htaccess I'm trying to set the document root for all parked domains to a specific path, except for two main domains. So basically I need a regex that matches any domain except two domains.
I found something like this:
^(?!foo$|bar$).*
and this:
(?>[\w-]+)(?<!tea|nuka-cola)
but I cannot get either to work in my situation, because the domain name contains a dot and a TLD, and I want to use a regex for that part too.
Here is my current regex:
^(.*?)\.(com|net)$
I want to add the exception in place of (.*?).
Use a negative look behind:
^(.*?)(?<!(foo)|(bar))\.(com|net)$
Not sure exactly what you want, but this regex will not match names whose part before .com or .net ends in foo or bar, such as foo.com or bar.net.
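To compare behaviours, here is a Python sketch (Python's re, not Apache's PCRE, so treat it only as an illustration). LOOKBEHIND is the look-behind idea with a non-capturing alternation; LOOKAHEAD is a different technique, a negative lookahead at the start, and it assumes purely for the example that the two main domains are foo.com and bar.com.

```python
import re

# Look-behind variant: rejects any name whose part before the TLD ends in foo or bar.
LOOKBEHIND = re.compile(r"^(.*?)(?<!foo|bar)\.(com|net)$")

# Lookahead variant (swapped-in technique): rejects exactly the two assumed
# main domains, foo.com and bar.com, and nothing else.
LOOKAHEAD = re.compile(r"^(?!(?:foo\.com|bar\.com)$)(.+)\.(com|net)$")


def is_parked(host, pattern=LOOKAHEAD):
    """Return True if host matches the given pattern, i.e. is not excluded."""
    return pattern.match(host) is not None
```

The difference shows up with a name like myfoo.com: the look-behind variant rejects it because it ends in foo, while the lookahead variant accepts it.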

Regex pattern for domain-part of URL

I am looking for a regex-pattern that matches the domain path of an url (http or https)
example 1:
https://www.blabla.com/path/pic.jpg
should match
https://www.blabla.com
example 2:
http://my.domain.tld/directory/?something
should match
http://my.domain.tld
Something along these lines:
#^(https?://[a-z0-9.-]+)(?=/|$).*#i
It depends, of course, on which characters you'd like to allow in the domain name.
P.S. The # characters delimit the regex, and the i at the end makes it case-insensitive.
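The same pattern can be tried in Python, where no delimiters are needed and re.IGNORECASE replaces the trailing i. This is only a sketch; ORIGIN and scheme_and_host are names invented here.

```python
import re

# Group 1 holds the scheme and host; the lookahead requires that the host
# is followed by a slash or by the end of the string.
ORIGIN = re.compile(r"^(https?://[a-z0-9.-]+)(?=/|$)", re.IGNORECASE)


def scheme_and_host(url):
    """Return the 'http(s)://host' prefix of url, or None if it does not fit."""
    m = ORIGIN.match(url)
    return m.group(1) if m else None
```

Since a colon is not in the allowed character class and is not accepted by the lookahead, a URL with an explicit port such as http://a.com:8080/x produces no match with this pattern.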