URL regular expression [duplicate]

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 9 years ago.
I am writing a regular expression to check a website URL. It should handle the following scenarios:
pass:
- www.example.com
- example.com
- www.example.com/something
- example.com/something
and reject every other URL.
It works perfectly for everything except one case (www.example). How can I handle this case?
"www.example" must not pass.
My regular expression:
^[a-zA-Z0-9][a-zA-Z0-9]+([.][a-zA-Z0-9]+)+(/.*)?$
Can anyone help, please?
Thanks.

Here's the best I could get:
(www.){1}[a-zA-Z0-9]+[.]{1}[\w]+[/\w]*
Result:
www.example.com - true
www.example.com/ - true
www.example.com/xyx - true
www.example.com/xy/s/ - true
www.example. - false
www.example - false
Please note that this won't accept 'example.com'.
Tested at http://gskinner.com/RegExr/
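The result table above can be reproduced with a small Python check. This is only a sketch: it assumes whole-string matching via re.fullmatch, which the answer does not state, and the helper name accepts is made up for illustration.

```python
import re

# Pattern copied from the answer above; note the unescaped dot in "(www.)".
PATTERN = re.compile(r"(www.){1}[a-zA-Z0-9]+[.]{1}[\w]+[/\w]*")


def accepts(url: str) -> bool:
    """Return True if the whole string matches the pattern (assumed semantics)."""
    return PATTERN.fullmatch(url) is not None


CASES = [
    "www.example.com",
    "www.example.com/",
    "www.example.com/xyx",
    "www.example.com/xy/s/",
    "www.example.",
    "www.example",
    "example.com",
]

for url in CASES:
    print(f"{url:25} {accepts(url)}")
```

Because the dot in (www.) is unescaped, a string such as wwwXexample.com is accepted too; writing www\. would avoid that.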

Try this one:
^(www\.)?(?!www)[a-zA-Z0-9]+\.[a-zA-Z]{2,6}/?[a-zA-Z0-9]+$
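To see how this pattern behaves on the cases from the question, a quick Python check can help. It is a sketch: the name is_valid and the extra inputs (trailing slash, two-letter TLD, nested path) are illustrative additions, not part of the answer.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"^(www\.)?(?!www)[a-zA-Z0-9]+\.[a-zA-Z]{2,6}/?[a-zA-Z0-9]+$")


def is_valid(url: str) -> bool:
    """Return True if the pattern matches the string."""
    return PATTERN.match(url) is not None


# The four URLs the question wants to accept.
for url in ["www.example.com", "example.com",
            "www.example.com/something", "example.com/something"]:
    print(url, is_valid(url))

# The case the question wants to reject, plus a few extra probes.
for url in ["www.example", "www.example.com/", "example.io", "example.com/a/b"]:
    print(url, is_valid(url))
```

The final [a-zA-Z0-9]+ is mandatory, so inputs with a trailing slash, a bare two-letter TLD, or more than one path segment are rejected along with www.example.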

This is the actual URL validating regex used in Django 1.5.1:
import re
regex = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
This handles both IPv4 and IPv6 addresses as well as GET parameters.
Found in the code here, Line 44.
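A short usage sketch for the pattern quoted above, using only the standard re module. The name URL_RE and the sample URLs are illustrative assumptions; this does not go through Django's validator classes.

```python
import re

# The pattern quoted above, compiled under a hypothetical name.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'                    # scheme is mandatory
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'   # IPv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'           # IPv6
    r'(?::\d+)?'                             # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_url(text: str) -> bool:
    """Return True if the text matches the pattern."""
    return URL_RE.match(text) is not None


for candidate in ["http://example.com",
                  "https://www.example.com/path?x=1",
                  "http://localhost:8000/",
                  "ftp://192.168.0.1",
                  "www.example.com",        # no scheme
                  "http://exa mple.com"]:   # contains a space
    print(candidate, is_url(candidate))
```

Since the scheme is required, none of the scheme-less inputs from the question (www.example.com, example.com) match this pattern.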

Try this:
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
I can't claim credit though; I yanked it from here:
http://mathiasbynens.be/demo/url-regex
They've got a reasonable chart with lots of expressions with pass/fail for each case against each expression.
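The expression above is written in PHP/PCRE notation: the underscores are delimiters, iuS are flags, and \x{00a1} is a PCRE code point escape. A possible Python port is sketched below. It assumes the userinfo delimiter is @, replaces \x{...} with \u.... escapes, and maps the flags to re.IGNORECASE only (Python 3 strings are already Unicode, and the S flag has no counterpart). The names URL_RE and is_public_url are hypothetical.

```python
import re

URL_RE = re.compile(
    r"^(?:(?:https?|ftp)://)"                              # scheme
    r"(?:\S+(?::\S*)?@)?"                                  # optional user:password@
    r"(?:"
    r"(?!10(?:\.\d{1,3}){3})"                              # exclude 10.x.x.x
    r"(?!127(?:\.\d{1,3}){3})"                             # exclude loopback
    r"(?!169\.254(?:\.\d{1,3}){2})"                        # exclude link-local
    r"(?!192\.168(?:\.\d{1,3}){2})"                        # exclude 192.168.x.x
    r"(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})"      # exclude 172.16-31.x.x
    r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
    r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"
    r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"
    r"|"
    r"(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)"        # host label
    r"(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*"     # more labels
    r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"                               # TLD
    r")"
    r"(?::\d{2,5})?"                                       # optional port
    r"(?:/[^\s]*)?$",                                      # optional path
    re.IGNORECASE,
)


def is_public_url(text: str) -> bool:
    """Return True if the text matches the ported pattern."""
    return URL_RE.match(text) is not None


for candidate in ["http://example.com", "http://8.8.8.8/",
                  "http://userid:password@example.com:8080/path",
                  "http://10.0.0.1", "http://192.168.1.1/", "example.com"]:
    print(candidate, is_public_url(candidate))
```

The nested quantifiers in the host part can backtrack heavily on long non-matching input, so it may be worth capping input length before applying it.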

Not the best regex, but it works in many cases:
^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}(/.*)*$
Edit:
^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+(com|org|info|biz|us)/?([^/]*)$
To allow trailing slash:
^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+(com|org|info|biz|us)/?([^/]*)/?$

Related

Regex for website

I am using the regex below to validate website URLs.
^(http(s?):\/\/)?(www\.)+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$
It works fine and matches the following website URLs:
www.google.com
http://www.google.com/
https://www.google.com/
It also correctly does not match the URLs below:
google.com
google.co
www.g#oogle.com
But it fails on the URLs below, which it matches although it should not:
www...google.com
http://www...google.com/
https://www...google.com/
Please suggest how to fix this.
I have already gone through the Stack Overflow questions below, but the answers were not useful for me.
Regular expression for checking website url
What is a good regular expression to match a URL?

To avoid the ... you can use a negative lookahead.
For example:
^(?!.*\.\.)(https?:\/\/)?www\.[\w.\-]+(\.[a-zA-Z]{2,3})+(\/[\w.?%#&=\/\-]*)?$
The (?!.*\.\.) in that regex rejects any string that contains two consecutive dots.
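The effect of the lookahead can be checked against the URLs listed in the question. The snippet below is a minimal sketch in Python; the helper name is_website is an assumption made for the example.

```python
import re

# Pattern from the answer above; the leading (?!.*\.\.) rejects any ".." in the input.
PATTERN = re.compile(
    r"^(?!.*\.\.)(https?:\/\/)?www\.[\w.\-]+(\.[a-zA-Z]{2,3})+(\/[\w.?%#&=\/\-]*)?$"
)


def is_website(url: str) -> bool:
    """Return True if the URL matches the pattern."""
    return PATTERN.match(url) is not None


should_match = ["www.google.com", "http://www.google.com/", "https://www.google.com/"]
should_fail = ["google.com", "google.co", "www.g#oogle.com",
               "www...google.com", "http://www...google.com/",
               "https://www...google.com/"]

for url in should_match + should_fail:
    print(url, is_website(url))
```

The lookahead scans the whole string, so it also rejects a ".." that appears in the path, for example www.google.com/a/../b.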

Using a wildcard in Regex at the end of a URL in GA

I'm a newbie at Regex. I'm trying to get a report in GA that returns all pages after a certain point in the URL.
For example:
http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/
I want to see all dates so: http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/*
Here's what I've got so far in my regex:
^https:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(?=(?:\/.*)?$)

You can try this:
https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox[\w/_-]*

The RE2 regex engine used by GA does not allow lookarounds (not even lookaheads) in the pattern. You have defined one: (?=(?:\/.*)?$).
If you need all links having www.essentialibiza.com/ibiza-club-tickets/carl-cox/, you can use a simple regex:
www\.essentialibiza\.com/ibiza-club-tickets/carl-cox/
If you want to specify the protocol:
https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)
The ? will make s optional (1 or 0 occurrences) and (/|$) will allow matching the URL ending with cox (remove this group if you want to match URLs that only have / after cox).
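Python's re module is not RE2, but the lookaround-free pattern above only uses syntax common to both engines, so it can be sanity-checked locally before pasting it into GA. This is a sketch; the sample URLs other than the one from the question are made up.

```python
import re

# Lookaround-free pattern from the answer above.
PATTERN = re.compile(
    r"https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)"
)


def in_report(url: str) -> bool:
    """Return True if the pattern is found anywhere in the URL."""
    return PATTERN.search(url) is not None


samples = [
    "http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/",
    "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox",
    "http://www.essentialibiza.com/ibiza-club-tickets/carl-coxx/",  # hypothetical
    "http://www.essentialibiza.com/ibiza-club-tickets/",            # hypothetical
]

for url in samples:
    print(url, in_report(url))
```

The (/|$) group is what keeps a longer slug such as carl-coxx from being counted.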

Regex remove www from URL

I hope someone can help, this is driving me crazy!
I am attempting to modify Logstash Grok filters to parse a domain name.
Currently the regex is:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
It correctly separates the domain; however, I need to add an additional check to remove "www.".
This is what I have come up with so far:
\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(^(?<!www$).*$?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
I can only seem to keep the www. part of the domain, and not the domain itself.
Example of what I need to achieve:
www.stackoverflow.com should be stackoverflow.com.
I need to remove specifically www. and not the entire subdomain.
Thank you in advance!
UPDATE
Example inputs to expected outputs (using this post as an example):
In its current state:
https://stackoverflow.com/questions/37070358/ returns www.stackoverflow.com
What I need is for it to return stackoverflow.com

You can add the (?!www\.) and (?!http:\/\/www\.) negative lookaheads right after the first \b to exclude matching www. or http://www.:
\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
You may add more negative lookaheads to exclude https:// or ftp/ftps links.
ALTERNATIVE:
\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
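How the two lookaheads push the match start past the protocol and the www. prefix can be tried outside Logstash. The Python sketch below uses the ALTERNATIVE pattern with re.search; the function name extract_domain and the example.org input are assumptions, and Grok itself uses a different regex engine (Oniguruma), so behaviour there should be confirmed separately.

```python
import re
from typing import Optional

# ALTERNATIVE pattern from the answer above, split across lines for readability.
DOMAIN_RE = re.compile(
    r"\b(?!(?:https?|ftps?):\/\/)(?!www\.)"          # skip protocol and www.
    r"(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"             # first label
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*"      # remaining labels
    r"(?:\.?|\b)"
)


def extract_domain(text: str) -> Optional[str]:
    """Return the first match of the pattern, or None if there is none."""
    match = DOMAIN_RE.search(text)
    return match.group(0) if match else None


print(extract_domain("www.stackoverflow.com"))
print(extract_domain("https://www.stackoverflow.com/questions/37070358/"))
print(extract_domain("http://example.org/x"))
```

The lookaheads do not delete anything; they only make the earlier start positions fail, so the first successful match begins at the label after www.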

This will match the part after www if the url starts with www.
(?!www\.)\b(?:(?!-)[0-9A-Za-z]{1,63})(?:\.(?:(?!-)[0-9A-Za-z-]{1,63}))*(\.?|\b)
I simplified the rest of your regex too by using a negative look ahead for - in the subdomains.

Need a simple reg ex for url checking [duplicate]

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 9 years ago.
I have been looking for this for about 2 hours, but cannot find what I need.
What I need is very simple:
- allow: google.com, http://google.com, https://google.com
- disallow spaces: "goo gle.com"
- require a valid domain: it should have a dot "." followed by any domain (.com, .net, etc.)
- allow anything after that, without spaces: "googl.com/dsfsdf/sdfs/blablahblah/"
Thanks.
Edit:
Thanks all, I had to write it myself.
if (!/^((ftp|http|https):\/\/)?([a-z0-9_\.-]+)\.{1}([a-z0-9_\/\?\=\-\%-]+)$/.test(uri)
    || /([\._\/\?\=\-\%-])\1/.test(uri)) {
    // the URI is treated as invalid here
}
PS: I am a noob at regexes.
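The two-step check above (an overall shape test plus a "no doubled punctuation" test) can be ported to Python to try it on the inputs from the question. This is a sketch under the assumption that the character classes translate one to one; the names SHAPE_RE, DOUBLED_RE and is_invalid are made up.

```python
import re

# First test: optional scheme, host part, a dot, then the rest without spaces.
SHAPE_RE = re.compile(r"^((ftp|http|https)://)?([a-z0-9_.-]+)\.([a-z0-9_/?=%-]+)$")

# Second test: any of these punctuation characters appearing twice in a row.
DOUBLED_RE = re.compile(r"([._/?=%-])\1")


def is_invalid(uri: str) -> bool:
    """Mirror the JavaScript condition: bad shape OR doubled punctuation."""
    return SHAPE_RE.search(uri) is None or DOUBLED_RE.search(uri) is not None


for uri in ["google.com", "googl.com/dsfsdf/sdfs/blablahblah/",
            "goo gle.com", "http://google.com"]:
    print(uri, "invalid" if is_invalid(uri) else "valid")
```

One thing this port makes visible: the doubled-character test also fires on the "//" of a scheme, so http://google.com ends up flagged as invalid even though the shape test accepts it. Applying the second test only to the part after the scheme would be one way around that.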

www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
The regex below matches all of the above cases:
((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)

Something that's working for me on a production product (haven't received any complaints yet):
((www\.|(http|https|ftp|news|file)+\:\/\/)?[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])

regex for validate URL without http/https

All,
I am new to the regex world.
I know that there are a lot of regexes available for validating common URLs with http in them.
But I am looking for a regex to validate URLs in the following formats (without HTTP/HTTPS):
www.example.com/user/login
www.example.com
www.example.co.xx
www.example.com/user?id=234&name=fname
If the URL contains only one of the following:
www.example (without the domain suffix, .com or .co.xx)
example.com (without "www")
I should show an error to the user.
Any help would be highly appreciated.
Thanks,
Raj

This regex will pass your first set, but not match the second set:
^www\.example\.(com|co\.xx)(/.*)?$
In English, this regex requires:
- starts with www.example.
- followed by either com or co.xx
- optionally followed by / then anything
You could be more prescriptive about what can follow the optional slash by replacing (/.*) with (/(user|buy|sell)\?.*), etc.
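The two sets from the question can be run through this pattern with a few lines of Python. The sketch writes the second alternative as co\.xx so that the dot is literal; the helper name is_allowed and the www.example.coaxx probe are illustrative assumptions.

```python
import re

# Pattern from the answer, with the dot in "co.xx" escaped.
PATTERN = re.compile(r"^www\.example\.(com|co\.xx)(/.*)?$")


def is_allowed(url: str) -> bool:
    """Return True if the URL matches the pattern."""
    return PATTERN.match(url) is not None


first_set = [
    "www.example.com/user/login",
    "www.example.com",
    "www.example.co.xx",
    "www.example.com/user?id=234&name=fname",
]
second_set = ["www.example", "example.com"]

for url in first_set + second_set + ["www.example.coaxx"]:
    print(url, is_allowed(url))
```

Note that the host name is hard-coded, so this only validates URLs on www.example.com and www.example.co.xx rather than arbitrary sites.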
