I currently use this regular expression to validate URLs:
^([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?([a-z0-9-.]*)\.([a-z]{2,4})(\:0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9]))?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?$
It matches a fairly long list of URLs:
google.com
google.com#a1
google.com?abc=123
google.com:80
google.com:80#a1
google.com:80?abc=123
google.com:80/test
google.com:80/test#a1
google.com:80/test?abc=123
google.com:80/test?abc=123#a1
www.google.com
www.google.com#a1
www.google.com?abc=123
www.google.com:80
www.google.com:80#a1
www.google.com:80?abc=123
www.google.com:80/test
www.google.com:80/test#a1
www.google.com:80/test?abc=123
www.google.com:80/test?abc=123#a1
www.www.google.com
www.www.google.com#a1
www.www.google.com?abc=123
www.www.google.com:80
www.www.google.com:80#a1
www.www.google.com:80?abc=123
www.www.google.com:80/test
www.www.google.com:80/test#a1
www.www.google.com:80/test?abc=123
www.www.google.com:80/test?abc=123#a1
john:smith#google.com
john:smith#google.com#a1
john:smith#google.com?abc=123
john:smith#google.com:80
john:smith#google.com:80#a1
john:smith#google.com:80?abc=123
john:smith#google.com:80/test
john:smith#google.com:80/test#a1
john:smith#google.com:80/test?abc=123
john:smith#google.com:80/test?abc=123#a1
john:smith#www.google.com
john:smith#www.google.com#a1
john:smith#www.google.com?abc=123
john:smith#www.google.com:80
john:smith#www.google.com:80#a1
john:smith#www.google.com:80?abc=123
john:smith#www.google.com:80/test
john:smith#www.google.com:80/test#a1
john:smith#www.google.com:80/test?abc=123
john:smith#www.google.com:80/test?abc=123#a1
john:smith#www.www.google.com
john:smith#www.www.google.com#a1
john:smith#www.www.google.com?abc=123
john:smith#www.www.google.com:80
john:smith#www.www.google.com:80#a1
john:smith#www.www.google.com:80?abc=123
john:smith#www.www.google.com:80/test
john:smith#www.www.google.com:80/test#a1
john:smith#www.www.google.com:80/test?abc=123
john:smith#www.www.google.com:80/test?abc=123#a1
However, it does not match these URLs which, to my knowledge, are also valid:
8.8.8.8
8.8.8.8#a1
8.8.8.8?abc=123
8.8.8.8:80
8.8.8.8:80#a1
8.8.8.8:80?abc=123
8.8.8.8:80/test
8.8.8.8:80/test#a1
8.8.8.8:80/test?abc=123
8.8.8.8:80/test?abc=123#a1
john:smith#8.8.8.8
john:smith#8.8.8.8#a1
john:smith#8.8.8.8?abc=123
john:smith#8.8.8.8:80
john:smith#8.8.8.8:80#a1
john:smith#8.8.8.8:80?abc=123
john:smith#8.8.8.8:80/test
john:smith#8.8.8.8:80/test#a1
john:smith#8.8.8.8:80/test?abc=123
john:smith#8.8.8.8:80/test?abc=123#a1
For reference, I found this one for IP addresses which seems to work well:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
How can I tie them together? Or, is there a better regex to match all the URLs here?
Demo:
http://rubular.com/r/ufuNkHqX5G
You can combine two regular expressions with (?:<regex1>|<regex2>), which matches anything that either regex1 or regex2 matches. (The ?: makes the added parentheses non-capturing.)
You can find a variety of regexes for URL validation online; for example, "In search of the perfect URL validation regex" lists quite a few.
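As a rough illustration of the (?:<regex1>|<regex2>) idea, here is a minimal Python sketch. The `HOSTNAME` pattern is a simplified, hypothetical stand-in for the host part of the long URL regex (not the original), the octet pattern is copied from the IP regex in the question, and `is_host` is a name made up for this example.

```python
import re

# Simplified, hypothetical stand-in for the host part of the question's regex.
HOSTNAME = r"[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}"

# Octet pattern taken from the IP-address regex quoted in the question.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = r"(?:" + OCTET + r"\.){3}" + OCTET

# (?:A|B) accepts either alternative without adding a capture group.
# The optional port is a loose 1-5 digit check, not a range check.
HOST = re.compile(
    r"^(?:" + HOSTNAME + r"|" + IPV4 + r")(?::[0-9]{1,5})?$",
    re.IGNORECASE,
)


def is_host(candidate: str) -> bool:
    """Return True if candidate is a host name or IPv4 address, with optional port."""
    return HOST.match(candidate) is not None


for sample in ["google.com", "www.google.com:80", "8.8.8.8", "8.8.8.8:80", "256.1.1.1"]:
    print(sample, is_host(sample))
```

The same wrapping should work for the full pattern: replace the host portion of the long regex with the alternation and leave the rest untouched.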
Validating an email address is a bit more complicated than validating a webpage URL. In fact, determining a proper regex for validating an email address seems to be a question without one definitive right answer; see "Using a regular expression to validate an email address".
If you use PHP, you aren't limited to using a regex to validate email addresses and URLs, as the following code illustrates:
<?php
$url = "http://8.8.8.8";
$mess = (!filter_var($url, FILTER_VALIDATE_URL))? "invalid" : "valid";
echo $mess, ": $url\n";
$email = "me#he re.com";
$mess = (!filter_var($email, FILTER_VALIDATE_EMAIL))? "invalid" :"valid";
echo $mess, ": $email\n";
I want to match the username before the # in an email address,
and I created this regex:
[A-Za-z+ /w+0-9._%+-]+#
the result of my example is:
example: blabla,blabla,Test#Testing.com,blabla,blabla,blabla
result : Test#
How can I get only Test, without the #?
The simplest way is:
([A-Za-z /0-9._%+-]+)#
and then use the captured group ($1 in Perl, the match variable in Tcl, etc.).
By the way, I didn't know email addresses could contain spaces. Are you sure you're not matching too much?
Edit:
Here's a little tutorial on lookaheads (supporting Wiktor's comment):
http://www.regular-expressions.info/lookaround.html
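Both approaches (a capture group, or a lookahead that checks for the separator without consuming it) can be sketched in Python as follows. The character class is a trimmed version of the one above without the space and slash, which is an assumption about what the asker actually wants, and `#` is the separator exactly as the sample string writes it.

```python
import re

text = "blabla,blabla,Test#Testing.com,blabla,blabla,blabla"

# Trimmed character class (assumption: no spaces or slashes in the user name).
NAME = r"[A-Za-z0-9._%+-]+"

# Option 1: capture the name and read group 1; the separator is matched but not captured.
captured = re.search("(" + NAME + ")#", text).group(1)

# Option 2: a lookahead asserts the separator follows, so it never enters the match.
looked_ahead = re.search(NAME + "(?=#)", text).group(0)

print(captured)      # Test
print(looked_ahead)  # Test
```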
I use the following regex pattern to validate email addresses, and it works fine.
/^[^\#]+#.*\.[a-z]{2,6}$/i
But the problem is that I want to raise an error when an address like abc.abc#yahoo.com is entered. In fact, every address that has the same characters before and after the **.** should be invalid: xyz.xyz#gmail.com is invalid, qwe.qwe#hotmail.com is invalid.
You can use a back-reference in a regexp to test if two parts are the same. In PHP you'd write:
if (preg_match('/^(\w+)\.\1#.*/', $email)) {
echo "That's a spammy name";
}
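The same back-reference test can be sketched in Python. The function name `has_repeated_name` is made up for this example, and `#` is the separator as written in the sample addresses.

```python
import re

# (\w+) captures the first part; \1 requires the identical text again after the dot,
# immediately followed by the separator used in the samples.
REPEATED = re.compile(r"^(\w+)\.\1#")


def has_repeated_name(email: str) -> bool:
    """Return True for addresses such as abc.abc#yahoo.com."""
    return REPEATED.match(email) is not None


print(has_repeated_name("abc.abc#yahoo.com"))   # True
print(has_repeated_name("abc.abd#yahoo.com"))   # False
print(has_repeated_name("abc.abcd#yahoo.com"))  # False
```

Note that \w also matches digits and underscores, so a1.a1#example.com would be flagged as well.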
I need to validate comma-separated email addresses and reject invalid patterns like
email..email#domain.com,
.email#domain.com,
email#domain.web,
email.#domain.com,
email#-domain.com,
email#domain.web,
email#111.222.333.44444
Currently I am using the following regular expression:
regex = /^((\w+([-+.']\w+)*#\w+([-.]\w+)*\.([a-zA-Z])+([-.]\w+)*)*([,])*)*$/
(custom regex rule as in validation-engine)
With it I cannot use email#domain99.com, which can be a valid email address in my case.
Please suggest a suitable solution.
EDIT: regex = /^((\w+([-+.']\w+)*#\w+([-.]\w+)*\.([a-zA-Z])+([-.]\w+)*)* ([,])*)*$/ This expression failed badly when I used ,, instead of , to separate the values. Please suggest a way to handle this.
I'm trying to modify the url-matching regex at http://daringfireball.net/2010/07/improved_regex_for_matching_urls to not match anything that's already part of a valid URL tag or used as the link text.
For example, in the following string, I want to match http://www.foo.com, but NOT http://www.bar.com or http://www.baz.com
www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>
I was trying to add a negative lookahead to exclude matches followed by " or <, but for some reason, it's only applying to the "m" in .com. So, this regex still returns http://www.bar.co and http://www.baz.co as matches.
I can't see what I'm doing wrong... any ideas?
\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))(?!["<])
Here is a simpler example too:
((((ht|f)tps?:\/\/)|(www.))[a-zA-Z0-9_\-.:#/~}?]+)(?!["<])
I looked into this issue last year and developed a solution that you may want to look at. See: URL Linkification (HTTP/FTP). This link is a test page for the JavaScript solution, with many examples of difficult-to-linkify URLs.
My regex solution, written for both PHP and JavaScript, is not simple (but neither is the problem, as it turns out). For more information I would also recommend reading:
The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber
The comments following Jeff's blog post are a must read if you want to do this right...
Note also that John Gruber's regex has a component that can go into the realm of catastrophic backtracking (the part which matches one level of matching parentheses).
Yes, it's actually trivial to make it work if you just want to exclude trailing characters: make your expression 'independent' (atomic), and no backtracking will occur in that segment.
(?>\b ...)(?!["<])
A perl test:
use strict;
use warnings;
my $str = 'www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>http://www.some.com';
while ($str =~ m~
(?>
\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
)
(?!["<])
~xg)
{
print "$1\n";
}
Output:
www.foo.com
http://www.some.com
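To see why the plain negative lookahead only trims the final "m", here is a small Python sketch built on the simpler pattern from the question, with its groups made non-capturing and the dot after www escaped. The sample string assumes anchor markup around the two URLs. Because atomic groups (?>...) are only available in newer Python versions, the sketch emulates one with the portable trick of capturing inside a lookahead and then consuming the capture with a back-reference.

```python
import re

# Simplified URL pattern from the question (groups made non-capturing).
URL = r"(?:(?:ht|f)tps?://|www\.)[a-zA-Z0-9_\-.:#/~}?]+"

# Assumed sample: one bare URL, one inside href="...", one used as link text.
text = 'www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>'

# Plain lookahead: when it fails, the engine gives back one character and retries,
# so the match simply ends one character early.
backtracking = re.compile(URL + r'(?!["<])')

# Atomic emulation: a lookahead is never re-entered once it has matched, so the
# captured URL cannot be shortened to rescue the match.
atomic = re.compile(r"(?=(" + URL + r"))\1" + r'(?!["<])')

plain_matches = backtracking.findall(text)
atomic_matches = [m.group(0) for m in atomic.finditer(text)]

print(plain_matches)   # ['www.foo.com', 'http://www.bar.co', 'http://www.baz.co']
print(atomic_matches)  # ['www.foo.com']
```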
I was using a regular expression for email formats which I thought was ok but the customer is complaining that the expression is too strict. So they have come back with the following requirement:
The email must contain an "#" symbol and end with either .xx or .xxx (i.e. .nl or .com). They are happy for this to pass validation. I have started the expression by checking whether the string contains an "#" symbol, as below:
^(?=.*[#])
This seems to work, but how do I add the last requirement (must end with .xx or .xxx)?
A regex simply enforcing your two requirements is:
^.+#.+\.[a-zA-Z]{2,3}$
However, there are email validation libraries for most languages that will generally work better than a regex.
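A quick way to try that two-requirement regex is a Python sketch like the one below. The helper name `passes_loose_check` and the sample addresses are made up for illustration, and `#` is the separator as written in this thread.

```python
import re

# Something, the separator, something, a dot, then 2 or 3 letters at the end.
LOOSE = re.compile(r"^.+#.+\.[a-zA-Z]{2,3}$")


def passes_loose_check(email: str) -> bool:
    """Return True if the address meets the customer's two requirements."""
    return LOOSE.match(email) is not None


for sample in ["jan#example.nl", "jan#example.com", "jan#example.info", "jan.example.com"]:
    print(sample, passes_loose_check(sample))
```

Only the first two samples pass: the third ends in four letters and the fourth has no separator.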
I always use this for emails:
^([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
Try http://www.ultrapico.com/Expresso.htm as well!
It is not possible to validate every email address with a regex, but for your requirements this simple regex works. It is neither complete nor does it check for errors in any way, but it meets the specs exactly:
[^#]+#.+\.\w{2,3}$
Explanation:
[^#]+: Match one or more characters that are not #
#: Match the #
.+: Match one or more of any character
\.: Match a .
\w{2,3}: Match 2 or 3 word characters (a-z, A-Z, 0-9 and _)
$: End of string
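The breakdown above can be checked with a short Python sketch. The sample addresses are invented for this example, and `re.search` is used because the pattern has no ^ anchor.

```python
import re

PATTERN = re.compile(r"[^#]+#.+\.\w{2,3}$")

samples = [
    "user#example.com",       # ordinary address
    "user#example.c_1",       # \w also accepts digits and underscores
    "two#seps#example.com",   # .+ after the separator allows a second separator
    "user#example.c",         # only one character after the final dot
]

results = {s: PATTERN.search(s) is not None for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```

Only the last sample is rejected, which shows how permissive the pattern is.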
Try this:
([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\be(\w*)s\b
A good tool for testing regular expressions:
http://gskinner.com/RegExr/
You could use
[#].+\.[a-z0-9]{2,3}$
This should work:
^[^#\r\n\s]+[^.#]#[^.#][^#\r\n\s]+\.(\w){2,}$
I tested it against these invalid emails:
#exampleexample#domaincom.com
example#domaincom
exampledomain.com
exampledomain#.com
exampledomain.#com
example.domain#.#com
e.x+a.1m.5e#em.a.i.l.c.o
some-user#internal-email.company.c
some-user#internal-ema#il.company.co
some-user##internal-email.company.co
#test.com
test#asdaf
test#.com
test.#com.co
And these valid emails:
example#domain.com
e.x+a.1m.5e#em.a.i.l.c.om
some-user#internal-email.company.co
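A check like this can be automated. The sketch below runs the regex against a subset of the two lists above; the list names `INVALID` and `VALID` and the helper `is_valid` are made up for this example.

```python
import re

EMAIL = re.compile(r"^[^#\r\n\s]+[^.#]#[^.#][^#\r\n\s]+\.(\w){2,}$")

INVALID = [
    "example#domaincom",
    "exampledomain.com",
    "test#.com",
    "test.#com.co",
    "some-user##internal-email.company.co",
    "some-user#internal-ema#il.company.co",
    "some-user#internal-email.company.c",
]

VALID = [
    "example#domain.com",
    "e.x+a.1m.5e#em.a.i.l.c.om",
    "some-user#internal-email.company.co",
]


def is_valid(email: str) -> bool:
    return EMAIL.match(email) is not None


# Collect anything the regex classifies differently from the lists.
unexpected = [e for e in INVALID if is_valid(e)] + [e for e in VALID if not is_valid(e)]
print(unexpected)  # []
```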
Edit:
This one appears to validate all of the addresses from that Wikipedia page, though it probably allows some invalid emails as well. The parentheses split the match into everything before and everything after the #:
^([^\r\n]+)#([^\r\n]+\.?\w{2,})$
niceandsimple#example.com
very.common#example.com
a.little.lengthy.but.fine#dept.example.com
disposable.style.email.with+symbol#example.com
other.email-with-dash#example.com
user#[IPv6:2001:db8:1ff::a0b:dbd0]
"much.more unusual"#example.com
"very.unusual.#.unusual.com"#example.com
"very.(),:;<>[]\".VERY.\"very#\\ \"very\".unusual"#strange.example.com
postbox#com
admin#mailserver1
!#$%&'*+-/=?^_`{}|~#example.org
"()<>[]:,;#\\\"!#$%&'*+-/=?^_`{}| ~.a"#example.org
" "#example.org
üñîçøðé#example.com
üñîçøðé#üñîçøðé.com
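The way the two groups split an address can be seen in a short Python sketch. It uses three of the addresses listed above, and the helper name `split_address` is made up for this example. Since the first group is greedy, the split happens at the last # in the string.

```python
import re

SPLIT = re.compile(r"^([^\r\n]+)#([^\r\n]+\.?\w{2,})$")


def split_address(email: str):
    """Return (before, after) around the last separator, or None if there is no match."""
    m = SPLIT.match(email)
    return m.groups() if m else None


print(split_address("very.common#example.com"))
# ('very.common', 'example.com')
print(split_address('"very.unusual.#.unusual.com"#example.com'))
# ('"very.unusual.#.unusual.com"', 'example.com')
print(split_address("postbox#com"))
# ('postbox', 'com')
```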