I am using the regex below to validate website URLs.
^(http(s?):\/\/)?(www\.)+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$
It works fine and matches the following URLs:
www.google.com
http://www.google.com/
https://www.google.com/
It also correctly does not match the following:
google.com
google.co
www.g#oogle.com
But it fails on the following URLs, which it matches even though they are invalid:
www...google.com
http://www...google.com/
https://www...google.com/
Please suggest a fix.
I have already gone through the Stack Overflow questions below, but the answers did not help me.
Regular expression for checking website url
What is a good regular expression to match a URL?
To reject the ... you can use a negative lookahead.
For example:
^(?!.*\.\.)(https?:\/\/)?www\.[\w.\-]+(\.[a-zA-Z]{2,3})+(\/[\w.?%#&=\/\-]*)?$
The (?!.*\.\.) at the start of that regex rejects any string that contains two consecutive dots.
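A quick way to check the lookahead version against the URLs from the question is a small Python script. This is only a sketch: the names URL_RE, is_valid_url, should_match and should_fail are made up for illustration, and the pattern is the one from this answer, unchanged.

```python
import re

# Pattern from the answer above; the leading (?!.*\.\.) rejects any input
# that contains two consecutive dots.
URL_RE = re.compile(
    r"^(?!.*\.\.)(https?:\/\/)?www\.[\w.\-]+(\.[a-zA-Z]{2,3})+(\/[\w.?%#&=\/\-]*)?$"
)


def is_valid_url(url: str) -> bool:
    """Return True when the whole string matches URL_RE."""
    return URL_RE.match(url) is not None


# URLs taken from the question.
should_match = ["www.google.com", "http://www.google.com/", "https://www.google.com/"]
should_fail = [
    "google.com",
    "google.co",
    "www.g#oogle.com",
    "www...google.com",
    "http://www...google.com/",
    "https://www...google.com/",
]

for url in should_match + should_fail:
    print(url, is_valid_url(url))
```

Every URL in should_match should print True and every URL in should_fail should print False.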
I have a very small website with only a few pages. I want to write a few regexes that match the URLs below and redirect each match to the corresponding page. I have already installed URL Rewrite in IIS 8.
I want to go from URLs like these:
website.com/page1.cfm
or http://www.website.com/page1.cfm or http://website.com/page1.cfm
to this:
http://website.com/page1
that is, removing the .cfm extension.
The following regex will match all three of your URI formats:
^((http:\/\/|)(www\.|)website\.com\/.+\.cfm)$
To see it in action, and see explanations, go here: https://regexr.com/3ja85
Note: Replace 'website' with your domain.
This regex will match any subpage of your domain (website.com/[anything].cfm).
EDIT:
This regex returns the URI of the three formats without the extension, while still requiring .cfm to follow the matched part (the lookahead (?=\.cfm) checks for it without consuming it):
^((http:\/\/|)(www\.|)website\.com\/.+)(?=\.cfm)
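To see the rewrite idea outside IIS, here is a rough Python sketch. It is not a URL Rewrite rule, and it uses a variation of the pattern above, not the exact regex from this answer: the page name is captured in a group and the pattern is anchored with $ so that .cfm has to be at the very end. The names CFM_RE and strip_cfm are made up, and website.com is the placeholder domain from the question.

```python
import re
from typing import Optional

# Variation of the answer's pattern (an assumption, not the original regex):
# optional "http://", optional "www.", then the page name captured before ".cfm".
CFM_RE = re.compile(r"^(?:http://)?(?:www\.)?website\.com/(.+)\.cfm$")


def strip_cfm(url: str) -> Optional[str]:
    """Return the extension-less target URL, or None if the URL does not match."""
    m = CFM_RE.match(url)
    if m is None:
        return None
    # Redirect target in the form asked for in the question.
    return f"http://website.com/{m.group(1)}"


for url in (
    "website.com/page1.cfm",
    "http://www.website.com/page1.cfm",
    "http://website.com/page1.cfm",
):
    print(url, "->", strip_cfm(url))
```

In an actual IIS rule the captured group would be used as a back-reference in the redirect target; the sketch only shows what the pattern captures.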
I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
It correctly separates the domain; however, I need to add an additional check to remove www. from it.
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com
You can add the negative lookaheads (?!www\.) and (?!http:\/\/www\.) right after the first \b so that a match cannot start at www. or at http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
See this regex demo
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
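If you want to sanity-check the alternative pattern outside Logstash, something like the following Python sketch works. Grok uses a different regex engine (Oniguruma), so treat this as a check of the pattern's logic only; the names DOMAIN_RE and extract_domain are hypothetical.

```python
import re
from typing import Optional

# The ALTERNATIVE pattern from above, unchanged.
DOMAIN_RE = re.compile(
    r"\b(?!(?:https?|ftps?):\/\/)(?!www\.)"
    r"(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)"
)


def extract_domain(text: str) -> Optional[str]:
    """Return the first domain-like match with scheme and leading www. skipped."""
    m = DOMAIN_RE.search(text)
    return m.group(0) if m else None


for sample in (
    "www.stackoverflow.com",
    "http://www.stackoverflow.com/questions/37070358/",
    "https://stackoverflow.com/questions/37070358/",
):
    print(sample, "->", extract_domain(sample))
```

The lookaheads only stop a match from starting at the scheme or at www., so the search moves on to the next word boundary, which is the start of the domain itself.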
This will match the part after www. if the URL starts with www.:
(?!www\.)\b(?:(?!-)[0-9A-Za-z-]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I also simplified the rest of your regex by using a negative lookahead for - at the start of each label.
I'm trying to get a regular expression to work where the following URLs are accepted:
www.somesite.com
somesite.com
www.somesite.ca
somesite.ca
somesite.cu.sk.ca
www.somesite.cu.sk.ca
somesite.sk.ca
www.somesite.sk.ca
I have the following so far, but it also allows www.somesite:
^(www\.)?[a-zA-Z0-9_\-]+\.([a-zA-Z]{2,4}|[a-zA-Z]{2}.[a-zA-Z]{2})(.[a-zA-z]{2})?$
Query strings, http, https, ftp are not in play here. Thanks!
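You forgot to escape the . in the last group (.[a-zA-z]{2}), and the same goes for the dot in [a-zA-Z]{2}.[a-zA-Z]{2} (an unescaped dot matches any character). The fix below also changes [a-zA-z] to [a-zA-Z], since A-z also covers a few punctuation characters:
^(www\.)?[\w-]+\.([a-zA-Z]{2,4}|[a-zA-Z]{2}\.[a-zA-Z]{2})(\.[a-zA-Z]{2})?$
                                            ↑              ↑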
See DEMO
Also, I replaced your [a-zA-Z0-9_\-] with its equivalent [\w-]
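Here is a small Python check of the idea, as a sketch only. It assumes a version of the pattern in which every literal dot is escaped and the last class is [a-zA-Z]; the names DOMAIN_RE, is_accepted, accepted and rejected are made up, and www.abcde is an extra test input that is not from the question. re.ASCII is used so that \w means [a-zA-Z0-9_] as in the original character class.

```python
import re

# Assumed pattern: all literal dots escaped, last class written as [a-zA-Z].
DOMAIN_RE = re.compile(
    r"^(www\.)?[\w-]+\.([a-zA-Z]{2,4}|[a-zA-Z]{2}\.[a-zA-Z]{2})(\.[a-zA-Z]{2})?$",
    re.ASCII,  # keep \w limited to ASCII letters, digits and underscore
)


def is_accepted(host: str) -> bool:
    """Return True when the whole host name matches DOMAIN_RE."""
    return DOMAIN_RE.match(host) is not None


# Inputs from the question that should be accepted.
accepted = [
    "www.somesite.com", "somesite.com", "www.somesite.ca", "somesite.ca",
    "somesite.cu.sk.ca", "www.somesite.cu.sk.ca", "somesite.sk.ca",
    "www.somesite.sk.ca",
]
# "www.somesite" is from the question; "www.abcde" is a made-up extra case
# that would slip through if the middle dot were left unescaped.
rejected = ["www.somesite", "www.abcde"]

for host in accepted + rejected:
    print(host, is_accepted(host))
```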
All,
I am new to the regex world.
I know that there are a lot of regexes available for validating common URLs with http in them.
But I am looking for a regex to validate URLs in the following formats (without HTTP/HTTPS):
www.example.com/user/login
www.example.com
www.example.co.xx
www.example.com/user?id=234&name=fname
If the URL contains only
www.example (without the domain suffix, .com or .co.xx)
example.com (without "www")
I should show an error to the user.
Any help would be highly appreciated.
Thanks
Raj
This regex will match your first set, but not the second set:
^www\.example\.(com|co\.xx)(/.*)?$
In English, this regex requires:
starts with www.example.
followed by either com or co.xx
optionally followed by / then anything
You could be more prescriptive about what can follow the optional slash by replacing (/.*) with, for example, (/(user|buy|sell)\?.*).
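As a rough check in Python (a sketch, with the dot in co.xx escaped so it only matches a literal dot), the pattern can be run against both sets from the question. The names EXAMPLE_RE and is_valid are hypothetical.

```python
import re

# Requires "www.example." followed by "com" or "co.xx", then an optional path.
EXAMPLE_RE = re.compile(r"^www\.example\.(com|co\.xx)(/.*)?$")


def is_valid(url: str) -> bool:
    """Return True when the whole string matches EXAMPLE_RE."""
    return EXAMPLE_RE.match(url) is not None


for url in (
    "www.example.com/user/login",
    "www.example.com",
    "www.example.co.xx",
    "www.example.com/user?id=234&name=fname",
    "www.example",   # no domain suffix: should be an error
    "example.com",   # no www: should be an error
):
    print(url, "OK" if is_valid(url) else "error")
```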
I have the following regex that attempts to match URLs:
/((http|https):(([A-Za-z0-9$_.+!*(),;/?:#&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:#&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))/g
How can I modify this regex to only match URLs of a single domain?
For example, I only want to match URLs that begin with http://www.google.com.
This should simplify my regex, but I'm too much of a regex noob to get it working (after all these years...)
Did you write that RegEx? I don't know what it's trying to do, but it certainly doesn't match URLs correctly. Here's something it matches:
http:###9#?~
which I'm pretty sure isn't a valid URL.
You shouldn't be using RegEx to match URLs like this. You haven't said what language you're working in, but use whatever its equivalent of Python's urlparse is.
Here's a relevant question: How do you validate a URL with a regular expression in Python?
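Since the language was never stated, here is what the parser approach could look like in Python, as a sketch only. It uses urlsplit from the standard urllib.parse module; the function name is_on_host and the choice to accept only http and https are assumptions made for the example.

```python
from urllib.parse import urlsplit


def is_on_host(url: str, host: str = "www.google.com") -> bool:
    """Return True when url is an http(s) URL whose host is exactly `host`."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit can raise on malformed input such as a broken IPv6 literal.
        return False
    # .hostname is lower-cased and is None when the URL has no network location.
    return parts.scheme in ("http", "https") and parts.hostname == host


for url in (
    "http://www.google.com/search?q=regex",
    "http://www.google.com.evil.example/",
    "http:###9#?~",
):
    print(url, is_on_host(url))
```

Comparing the parsed host name avoids the prefix trap where a URL merely begins with the text http://www.google.com but actually points at another domain.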