I need a regex to be able to validate a domain name without http:// or https://
What I mean:
Valid Should Be:
domain.com
domain.fr
domain.it
domain.whateverelse
subdomain.domain.com
subdomain.domain.fr
subdomain.domain.whateverelse
Invalid Should Be:
domain
http://domain.com
https://domain.com
https://domain.whateverelse
http://subdomain.domain.com
http://subdomain.domain.fr
http://subdomain.domain.whateverelse
This is what I have come up with so far:
(http(s)?://)?([\w-]+\.)+[\w-]+[.com]+(/[/?%&=]*)?
For example, the regex above treats domain.whatever as invalid, so it basically works only with .com
This one looks really good:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
It comes from: http://regexlib.com/REDetails.aspx?regexp_id=96
where you can find a lot more ;)
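One note on the first attempt: [.com]+ is a character class, so it matches any run of the characters ".", "c", "o" and "m" rather than the literal text ".com". A minimal sketch of a scheme-free check is below, in Python. The names DOMAIN_RE and is_bare_domain are made up for illustration, and limiting the last label to letters is an assumption that may be too strict for some TLDs.

```python
import re

# Hypothetical pattern: one or more dot-separated labels followed by an
# alphabetic final label. No scheme, port, or path is accepted.
DOMAIN_RE = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)


def is_bare_domain(value: str) -> bool:
    """Return True if value looks like a domain name with no scheme."""
    # fullmatch anchors the pattern at both ends of the string.
    return DOMAIN_RE.fullmatch(value) is not None
```

With this sketch, domain.whateverelse and subdomain.domain.fr are accepted, while domain and http://domain.com are rejected.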
Related
I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
It correctly separates the domain; however, I need to add an additional check to remove www.
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com
You can add the negative lookaheads (?!www\.) and (?!http:\/\/www\.) right after the first \b to exclude matches that start at www. or http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
See this regex demo
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
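As a rough illustration, here is the alternative pattern used from Python. This is only a sketch: HOST_RE and host_without_www are hypothetical names, the slashes are left unescaped because Python does not need \/, and Logstash/Grok may treat the pattern differently.

```python
import re

# The alternative pattern from above, split over several lines for readability.
HOST_RE = re.compile(
    r"\b(?!(?:https?|ftps?)://)(?!www\.)"        # do not start at a scheme or at "www."
    r"(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"         # first label
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*"  # further dot-separated labels
    r"(?:\.?|\b)"
)


def host_without_www(text):
    """Return the first host-like match in text, skipping scheme and www."""
    match = HOST_RE.search(text)
    return match.group(0) if match else None
```

For example, host_without_www("www.stackoverflow.com") gives stackoverflow.com, because the lookaheads make the engine skip forward until a position where neither the scheme nor www. follows.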
This will match the part after www. if the URL starts with www.
(?!www\.)\b(?:(?!-)[0-9A-Za-z-]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I also simplified the rest of your regex by using a negative lookahead for - at the start of each label.
I have this regex
^(?:http(?:s)?://)?(?:www(?:[0-9]+)?\.)
to strip off the www and http(s):// part of any domain name and give just the domain name. It works with:
example.com
http://example.com
http://www.example.com
But when used with a domain name starting with the letter w, it strips the w off:
website.com => ebsite.com
Any ideas on how to make it better? Please test it with this data set http://regexr.com/3abl2
Thanks
I think you want something like this:
^(?:https?:\/\/)?(?:www\.)?(.*)$
Please see this Regex Demo for examples and explanation.
UPDATE It looks like you also want to omit www0, www1, etc.? Then you'll want this:
^(?:https?:\/\/)?(?:www[0-9]*\.)?(.*)$
Please see updated demo here.
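A quick way to try this out is sketched below in Python. PREFIX_RE and strip_prefix are hypothetical names, and the slashes are unescaped because Python does not require \/.

```python
import re

# Optional scheme, optional www / www0 / www1 ... prefix, then capture the rest.
PREFIX_RE = re.compile(r"^(?:https?://)?(?:www[0-9]*\.)?(.*)$")


def strip_prefix(url):
    """Return url without a leading http(s):// and www-style prefix."""
    match = PREFIX_RE.match(url)
    # The pattern only fails on multi-line input; fall back to the input then.
    return match.group(1) if match else url
```

Because the www group requires the literal dot, website.com is left untouched while http://www.example.com becomes example.com.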
Drop the (?:[0-9]+)? part from the regex.
Add the optional quantifier ? to the www\. group, so that it matches zero or one www. prefix.
The regex can be written as
^(?:http(?:s)?:\/\/)?(?:www\.)?
Regex Demo
I want to check domains with regex. My old code was:
/^([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?.){0,}([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?){1,63}(.[a-z0-9]{2,7})+$/i
It is okay, but this code doesn't validate IDNs (internationalized domain names) such as öü.com or öü.öü
My domain format is:
example.com
Besides, I don't want:
www.example.com
http://example.com
http://www.example.com
Important note: users can also add domains with a two-part extension, like example.co.uk
You can add support for IDNs by replacing a-z with \pL (in PHP, also add the u modifier so the pattern is matched as UTF-8).
IDNs such as भारत.icom.museum are converted with Punycode encoding, as defined in RFC 3492, before submission for DNS resolution.
It seems that you're using PHP; if so, you should use the idn_to_ascii() function to convert the IDNs, for example:
echo idn_to_ascii("भारत.icom.museum");
//xn--h2brj9c.icom.museum
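The same convert-then-validate idea can be sketched in Python, whose built-in "idna" codec implements the older IDNA 2003 rules. The names to_ascii, is_valid_domain and ASCII_DOMAIN_RE are made up for this example, and rejecting a leading www. is an assumption taken from the question.

```python
import re

# Hypothetical ASCII-only pattern, applied after Punycode conversion.
ASCII_DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}"
)


def to_ascii(domain):
    """Convert a possibly internationalized domain to its ASCII form."""
    return domain.encode("idna").decode("ascii")


def is_valid_domain(domain):
    """Validate a domain after converting it to ASCII (Punycode) form."""
    try:
        ascii_form = to_ascii(domain.lower())
    except UnicodeError:
        # Raised for empty or over-long labels and for prohibited characters.
        return False
    if ascii_form.startswith("www."):
        return False
    return ASCII_DOMAIN_RE.fullmatch(ascii_form) is not None
```

Validating the converted form keeps the regex ASCII-only, so names such as öü.com and example.co.uk pass while http://example.com does not.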
Let's say I have a list of valid domain roots:
example.com
test.com
And a variable
String url
How would I make use of a regex to validate that my variable url is on the list, including subdomains?
For example, perhaps my url is "subdomain.case.example.com"
That is, to say clearly:
How would I utilize a regex to verify that my url is *.example.com OR *.test.com OR example.com OR test.com?
Something like this?
^((\*|[\w\d]+(-[\w\d]+)*)\.)*(example|test)(\.com)$
Edit live on Debuggex
To allow for such things as... subdomain.*.example.com, subdomain.example.com, example.com, *.example.com, etc.
Use $ to mark the end of the string, and require a dot before the root so that names like badexample.com are not accepted.
Your regex would be
^(.*[.])?(example|test)[.]com$
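If the list of roots changes over time, the pattern can be built from it. Below is a minimal sketch in Python rather than Java; ALLOWED_ROOTS, build_root_pattern and is_allowed are hypothetical names, and the label character set is an assumption.

```python
import re

# Example list of allowed roots, taken from the question.
ALLOWED_ROOTS = ["example.com", "test.com"]


def build_root_pattern(roots):
    """Build a pattern matching any root, optionally preceded by subdomains."""
    # re.escape makes the dots in each root literal.
    alternatives = "|".join(re.escape(root) for root in roots)
    return re.compile(r"(?:[A-Za-z0-9-]+\.)*(?:" + alternatives + r")")


ROOT_RE = build_root_pattern(ALLOWED_ROOTS)


def is_allowed(host):
    """Return True if host is an allowed root or a subdomain of one."""
    return ROOT_RE.fullmatch(host) is not None
```

Since every subdomain label must end in a dot before the root, subdomain.case.example.com is accepted but badexample.com and example.com.evil.org are not.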
All,
I am new to the regex world...
I know that there are a lot of regexes available for validating a common URL with http in it.
But I am looking for a regex to validate URLs in the following formats (without HTTP/HTTPS):
www.example.com/user/login
www.example.com
www.example.co.xx
www.example.com/user?id=234&name=fname
If the URL contains only one of the following:
www.example (without the domain suffix, .com or .co.xx)
example.com (without "www")
I should show an error to the user.
Any help would be highly appreciated...
Thanks
Raj
This regex will pass your first set, but not match the second set:
^www\.example\.(com|co\.xx)(/.*)?$
In English, this regex requires:
starts with www.example.
followed by either com or co.xx
optionally followed by / then anything
You could be more prescriptive about what can follow the optional slash by replacing (/.*) with (/(user|buy|sell)\?.*) etc
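A small Python sketch of this check follows, with the dot in co.xx escaped so that it matches only a literal dot. URL_RE and is_accepted are hypothetical names.

```python
import re

# www.example. followed by com or co.xx, then an optional path or query.
URL_RE = re.compile(r"www\.example\.(?:com|co\.xx)(?:/.*)?")


def is_accepted(url):
    """Return True for the www-prefixed forms listed in the question."""
    return URL_RE.fullmatch(url) is not None
```

Here www.example.com/user?id=234&name=fname is accepted, while www.example and example.com are rejected, so the caller can show the error in those cases.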