Regex to validate a url using a wildcard? - regex

Let's say I have a list of valid domain roots,
example.com
test.com
And a variable
String url
How would I make use of a regex to validate that my variable url is on the list, including subdomains?
For example, perhaps my url is "subdomain.case.example.com"
To put it clearly:
How would I utilize a regex to verify that my url is *.example.com OR *.test.com OR example.com OR test.com?

Something like this?
^((\*|[\w\d]+(-[\w\d]+)*)\.)*(example|test)(\.com)$
To allow for such things as... subdomain.*.example.com, subdomain.example.com, example.com, *.example.com, etc.

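Use ^ and $ to anchor the match to the start and end of the string, and require a dot before the root so that a host such as notexample.com is rejected.
Your regex would be
^(.*[.])?(example|test)[.]com$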
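A minimal Python sketch of the same idea, assuming the variable holds a bare host name (no scheme or path). The names ALLOWED_ROOTS, build_host_pattern and is_allowed are hypothetical and only serve the illustration. The pattern is built from the list, so each root is escaped and every subdomain label must end with a dot.

```python
import re

# Hypothetical list of allowed domain roots (taken from the question).
ALLOWED_ROOTS = ["example.com", "test.com"]


def build_host_pattern(roots):
    """Build a pattern matching any root in `roots`, with optional subdomains."""
    # re.escape keeps the dots in each root literal.
    alternatives = "|".join(re.escape(root) for root in roots)
    # Zero or more "label." prefixes, then one of the allowed roots.
    return re.compile(
        r"(?:[A-Za-z0-9-]+\.)*(?:" + alternatives + r")",
        re.IGNORECASE,
    )


HOST_PATTERN = build_host_pattern(ALLOWED_ROOTS)


def is_allowed(host):
    """Return True if `host` is an allowed root or a subdomain of one."""
    # fullmatch anchors the pattern at both ends of the string.
    return HOST_PATTERN.fullmatch(host) is not None


if __name__ == "__main__":
    for host in ["subdomain.case.example.com", "test.com", "notexample.com"]:
        print(host, is_allowed(host))
```

Because every prefix label must be followed by a dot, a host like notexample.com is rejected even though it ends in example.com.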

Related

Regex for this URL, http://www.chip.de and this domain chip.de

I am trying to create a regex to look for similar URL and domain like this below
*chip.de
http://www.chip.de*
I tried to use the regex expression
http?:\/\/([\w\.-]+)([\/\w \.-]*)
It did not capture the URL.
I tested it at https://www.regextester.com/99497 and it failed.
What am I missing?
Please create two rules for domain and URL
Thank you
If you're simply looking for regex that will match URLs which include chip.de then please try this and let me know if it is sufficient:
https?\:\/\/www\.chip\.de.*
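Since the question asks for two rules, here is a hedged Python sketch with one pattern for the domain and one for the URL. It assumes the wildcards are meant to cover subdomains of chip.de and anything after the host in a URL; the names DOMAIN_RULE, URL_RULE, is_chip_domain and is_chip_url are hypothetical. Note also that in the attempted expression http? makes the p optional, whereas https? makes the s optional.

```python
import re

# Domain rule: chip.de itself or any subdomain of it (assumed reading of "*chip.de").
DOMAIN_RULE = re.compile(r"(?:[\w-]+\.)*chip\.de", re.IGNORECASE)

# URL rule: http or https, a chip.de host, then an optional path, query or fragment.
URL_RULE = re.compile(
    r"https?://(?:[\w-]+\.)*chip\.de(?:[/?#].*)?",
    re.IGNORECASE,
)


def is_chip_domain(host):
    """Return True if `host` is chip.de or one of its subdomains."""
    return DOMAIN_RULE.fullmatch(host) is not None


def is_chip_url(url):
    """Return True if `url` is an http(s) URL on chip.de or a subdomain."""
    return URL_RULE.fullmatch(url) is not None


if __name__ == "__main__":
    print(is_chip_domain("chip.de"))
    print(is_chip_url("http://www.chip.de/news"))
    print(is_chip_url("http://www.microchip.de"))
```

Requiring a dot after each subdomain label keeps unrelated hosts such as microchip.de from matching.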

Regex remove www from URL

I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
It correctly separates the domain; however, I need to add an additional check to remove www.
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com
You can add a (?!www\.) and (?!http:\/\/www\.) negative lookaheads right after the first \b to exclude matching www. or http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
This will match the part after www if the url starts with www.
(?!www\.)\b(?:(?!-)[0-9A-Za-z]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I simplified the rest of your regex too, by using a negative lookahead for - in the subdomains.
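To see how the lookaheads behave, the alternative pattern with the protocol and www. lookaheads can be tried outside Logstash. The following is only a sketch in Python, used here as a stand-in for the Grok regex engine; the names DOMAIN_WITHOUT_WWW and extract_domain are hypothetical.

```python
import re

# The alternative pattern from the answer above, split over lines for readability.
DOMAIN_WITHOUT_WWW = re.compile(
    r"\b(?!(?:https?|ftps?):\/\/)(?!www\.)"      # skip the protocol and "www."
    r"(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"         # first label
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*"  # further dot-separated labels
    r"(?:\.?|\b)"
)


def extract_domain(text):
    """Return the first domain-like match in `text`, or None if there is none."""
    match = DOMAIN_WITHOUT_WWW.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_domain("https://www.stackoverflow.com/questions/37070358/"))
    print(extract_domain("www.example.org"))
```

The lookaheads do not consume anything; they only make the match fail at positions where the protocol or www. begins, so the search moves on until it reaches the first label after them.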

Regex match domain name without www and keep website.com still intact

I have this regex
^(?:http(?:s)?://)?(?:www(?:[0-9]+)?\.)
to strip off the www and http(s):// part of any domain name and give just the domain name. It works with:
example.com
http://example.com
http://www.example.com
But when used with a domain name starting with letter w it strips the w off
website.com => ebsite.com
Any ideas on how to make it better? Please test it with this data set http://regexr.com/3abl2
Thanks
I think you want something like this:
^(?:https?:\/\/)?(?:www\.)?(.*)$
UPDATE It looks like you also want to omit www0, www1, etc.? Then you'll want this:
^(?:https?:\/\/)?(?:www[0-9]*\.)?(.*)$
Drop the (?:[0-9]+)?\. part from the regex.
Add the optional quantifier ? to the www group, so that it matches zero or one www.
The regex can be written as
^(?:http(?:s)?:\/\/)?(?:www)?
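For reference, a small Python sketch of the capture-group approach, using the pattern ^(?:https?:\/\/)?(?:www[0-9]*\.)?(.*)$ from the first answer. The names PREFIX and bare_domain are hypothetical. Because the www group includes the trailing dot and is optional as a whole, a name such as website.com is left intact.

```python
import re

# Optional scheme, optional "www" (with optional digits) plus dot, then the rest.
PREFIX = re.compile(r"^(?:https?:\/\/)?(?:www[0-9]*\.)?(.*)$")


def bare_domain(url):
    """Return `url` without a leading http(s):// and www./wwwN. prefix."""
    match = PREFIX.match(url)
    # Fall back to the input unchanged if the pattern does not match
    # (for example, when the string contains a newline).
    return match.group(1) if match else url


if __name__ == "__main__":
    for url in ["example.com", "http://www.example.com", "website.com"]:
        print(url, "=>", bare_domain(url))
```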

Validation code for IDN (Domain) with regex (regular expression)

I want to check domains with regex. My old code was:
/^([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?.){0,}([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?){1,63}(.[a-z0-9]{2,7})+$/i
It works, but it doesn't validate IDNs (internationalized domain names) such as öü.com or öü.öü
My domain format is:
example.com
Besides, I don't want:
www.example.com
http://example.com
http://www.example.com
Important note: users can add domains with two-part extensions such as example.co.uk
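You can add support for IDNs by replacing a-z with \pL (the Unicode letter class); in PHP the pattern then also needs the u modifier so that it is treated as UTF-8.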
IDNs such as भारत.icom.museum are converted to Punycode, as defined in RFC 3492, before submission for DNS resolution.
It seems that you're using PHP; if so, you can use the idn_to_ascii() function to convert the IDNs, e.g.:
echo idn_to_ascii("भारत.icom.museum");
//xn--h2brj9c.icom.museum
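The convert-then-validate idea can be sketched as follows. This is an illustration in Python rather than PHP, using Python's built-in idna codec in place of idn_to_ascii(); the names LABEL, ASCII_DOMAIN, to_ascii and is_valid_domain are hypothetical, and the rejection of a leading www. reflects the asker's stated requirement rather than any DNS rule.

```python
import re

# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# At least two dot-separated labels, e.g. example.com or example.co.uk.
ASCII_DOMAIN = re.compile(r"(?:" + LABEL + r"\.)+" + LABEL)


def to_ascii(name):
    """Convert a (possibly internationalized) domain to its ASCII form."""
    return name.encode("idna").decode("ascii").lower()


def is_valid_domain(name):
    """Return True for names like example.com, example.co.uk or öü.com."""
    try:
        ascii_name = to_ascii(name)
    except UnicodeError:
        # Raised for empty or over-long labels and for unencodable input.
        return False
    if ascii_name.startswith("www."):
        return False
    return ASCII_DOMAIN.fullmatch(ascii_name) is not None


if __name__ == "__main__":
    print(to_ascii("भारत.icom.museum"))
    for name in ["example.co.uk", "öü.öü", "http://example.com"]:
        print(name, is_valid_domain(name))
```

Validating the converted form means the regex only has to deal with ASCII letters, digits and hyphens, so the original a-z0-9 character classes can stay as they are.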

Regex domain validation

The following code
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix
validates all types of domains.
I would like to validate only one domain or subdomain (for example .cu.cc or .co.cc).
You can just add this to the end of your domain regex:
(?<=\.cu\.cc)$
It's a positive lookbehind: it only lets the match succeed if the text immediately before the end of the string is .cu.cc.
The final \.[a-z]{2,6} is what matches a top-level domain. Change it to whatever specific TLD you want to match.
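A minimal sketch of that last suggestion in Python, with the generic top-level domain part replaced by a literal \.cu\.cc. The names CU_CC_URL and is_cu_cc_url are hypothetical, and the rest of the pattern follows the regex from the question.

```python
import re

# Same structure as the original regex, but the suffix is fixed to ".cu.cc".
CU_CC_URL = re.compile(
    r"^https?:\/\/[a-z0-9]+(?:[-.][a-z0-9]+)*"  # scheme and host labels
    r"\.cu\.cc"                                 # required domain suffix
    r"(?::[0-9]{1,5})?(?:\/.*)?$",              # optional port and path
    re.IGNORECASE,
)


def is_cu_cc_url(url):
    """Return True if `url` is an http(s) URL whose host ends in .cu.cc."""
    return CU_CC_URL.match(url) is not None


if __name__ == "__main__":
    for url in ["http://foo.cu.cc", "https://a.b.cu.cc:8080/path", "http://foo.co.cc"]:
        print(url, is_cu_cc_url(url))
```

Fixing the suffix inside the host part, rather than checking the end of the whole string, keeps the check correct when the URL carries a port or a path.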