Regular expression for email [duplicate] - regex

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
What is the best regular expression for validating email addresses?
I am using this particular regular expression for checking emails.
"^[A-Za-z0-9](([a-zA-Z0-9,=\.!\-#|\$%\^&\*\+/\?_`\{\}~]+)*)@(?:[0-9a-zA-Z-]+\.)+[a-zA-Z]{2,9}$"
The issue I have is that this allows periods before the "@" symbol.
Is there any way this expression can be modified so that this does not happen, while all other conditions are maintained?
a.@b.com should be invalid.
Thanks in advance
-Rollin

The best answer I've seen so far. Honestly, if you gave some indication of which language or toolset you were using, I would point you to the library that does it for you rather than telling you how to hand-roll a regular expression for this.
Edit: Given the additional information that this is on .NET, I would use the MailAddress class and abandon regular expressions altogether, like so:
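using System;
using System.Net.Mail;
public bool IsAddressValid(string text)
{
    // MailAddress throws ArgumentNullException/ArgumentException for null or empty input, so reject those first
    if (string.IsNullOrWhiteSpace(text))
    {
        return false;
    }
    try
    {
        // The constructor parses the address and throws FormatException if it is malformed
        MailAddress address = new MailAddress(text);
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
}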
If there are additional requirements over and above validating the address itself (like making sure it is from a particular set of domains or some such) then you can do that with much simpler tests after you have verified that the address is valid as I suggested in another post.
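As a rough illustration of such a follow-up test (in JavaScript rather than C#, with a hypothetical helper name), a domain allowlist check can compare the text after the last @ with a list of permitted domains. It assumes the address has already been validated. The last @ is used because a quoted local part may itself contain an @.

```javascript
// Hypothetical helper: only call this on an address that has already been validated.
// It takes the text after the last "@" as the domain and compares it case-insensitively.
function isFromAllowedDomain(address, allowedDomains) {
  const at = address.lastIndexOf("@");
  if (at === -1) {
    return false; // no "@" at all, so there is no domain to check
  }
  const domain = address.slice(at + 1).toLowerCase();
  return allowedDomains.some((allowed) => allowed.toLowerCase() === domain);
}

console.log(isFromAllowedDomain("bob@Example.com", ["example.com"])); // true
console.log(isFromAllowedDomain("bob@example.org", ["example.com"])); // false
```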

A strange game. The only winning move is not to play.
Seriously, the only winning way to validate email addresses with a regular expression is to not validate email addresses with a regular expression. The grammar that accepts email addresses is not regular. It's true that modern regular expressions can support some non-regular grammars, but even so, maintaining a regular expression for such a complex grammar is (as you can see) nearly impossible.
The only reasonable use of regular expressions with email addresses that I can think of is to do some minimal checking on the client side (does it contain an @ symbol?).
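A minimal sketch of that kind of client-side check in JavaScript (the function name is hypothetical). It deliberately checks only that something appears on both sides of the last @ and leaves real validation to the server:

```javascript
// Minimal client-side sanity check, not real validation:
// require at least one character before and after the last "@".
function looksLikeEmail(text) {
  const at = text.lastIndexOf("@");
  return at > 0 && at < text.length - 1;
}

console.log(looksLikeEmail("rollin@example.com")); // true
console.log(looksLikeEmail("rollin.example.com")); // false
```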
What you should do is:
1. Send an email to the email address with a link for the user to click. If the user clicks the link, the email address is valid. Furthermore, it exists, and the user is probably the one who entered the email address into your form. Not only does this 100% validate the email address, it gives you even more guarantees.
2. If you can't do 1, use a prepackaged email validator. Better, use both 1 and 2.
3. If you can't do 1 or 2, write a real parser to validate email addresses.
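Option 1 can be sketched roughly like this in Node.js. This assumes a hypothetical in-memory store and helper names, and leaves out actually sending the email: generate an unguessable token, put it in the link you send, and treat the address as confirmed only when that token comes back.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory store mapping token -> address awaiting confirmation.
// A real application would persist this and give each token an expiry time.
const pendingConfirmations = new Map();

// Generate an unguessable token to put in the link you email to the address.
function createVerificationToken(email) {
  const token = crypto.randomBytes(32).toString("hex"); // 64 hex characters
  pendingConfirmations.set(token, email);
  return token;
}

// Called when the link is clicked: returns the confirmed address, or null if the
// token is unknown or has already been used (each token works only once).
function confirmToken(token) {
  const email = pendingConfirmations.get(token);
  if (email === undefined) {
    return null;
  }
  pendingConfirmations.delete(token);
  return email;
}

// Usage: email a link such as https://example.com/confirm?token=<token> (hypothetical URL).
const demoToken = createVerificationToken("rollin@example.com");
console.log(confirmToken(demoToken)); // rollin@example.com
console.log(confirmToken(demoToken)); // null
```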

You could put [^\.] before the @ so that it allows any character except the dot.
Of course, this is probably not what you want, so you could instead put a [] with any legal characters in it.
Just in case someone has an email name (I mean the part before the @) that is just 1 character, you might need to get creative with the |.
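A sketch of this idea in JavaScript. The set of legal characters is an assumption (letters, digits, ., _, + and -), the last character before the @ comes from the same set minus the dot, and the | covers the one-character case. The domain part is copied from the question's regex.

```javascript
// Assumed set of legal local-part characters: letters, digits, ".", "_", "+" and "-".
// Either a single character, or a run whose last character is not a dot.
const noDotBeforeAt =
  /^(?:[A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9._+-]*[A-Za-z0-9_+-])@(?:[0-9a-zA-Z-]+\.)+[a-zA-Z]{2,9}$/;

console.log(noDotBeforeAt.test("a@b.com"));   // true
console.log(noDotBeforeAt.test("a.b@b.com")); // true
console.log(noDotBeforeAt.test("a.@b.com"));  // false
```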

If your regex engine has lookbehind assertions, then you can just add "(?<!\.)" before the "@".
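Applied to the question's regex in JavaScript (lookbehind requires an engine with ES2018 support, such as current Node.js), this could look like the sketch below. One liberty is taken: the question's (([...]+)*) is written as the equivalent [...]*, since nesting one unbounded quantifier inside another can backtrack catastrophically on inputs that almost match.

```javascript
// The question's pattern with (?<!\.) inserted before the "@".
// The original (([...]+)*) is written as the equivalent [...]* to avoid nested quantifiers.
const noTrailingDot =
  /^[A-Za-z0-9][a-zA-Z0-9,=.!\-#|$%^&*+\/?_`{}~]*(?<!\.)@(?:[0-9a-zA-Z-]+\.)+[a-zA-Z]{2,9}$/;

console.log(noTrailingDot.test("a.b@b.com")); // true
console.log(noTrailingDot.test("a.@b.com"));  // false
```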

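If you're doing this in Perl, the following script is an example:
use strict;
use warnings;
my $string = 'name@domain.com';
# \@ is escaped so Perl does not try to interpolate an array into the pattern;
# ^ and $ make the pattern match the whole string rather than a substring of it
if ($string =~ /^(\w+\@[a-zA-Z_]+?\.[a-zA-Z]{2,9})$/)
{
    print "gotcha!\n";
}
else
{
    print "nope :(\n";
}
This works because \w matches only letters, digits and underscores, never a period. If you change $string to "name.@domain.com" it will fail. Note that, as written, the pattern also rejects a period anywhere in the part before the @, such as "first.last@domain.com".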

Related

validate email addresses using a regex. [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 7 years ago.
I am trying to validate email addresses using a regex. This is what I have now: ^([\w-.]+)@([\w-]+).(aero|asia|be|biz|com.ar|co.in|co.jp|co.kr|co.sg|com|com.ar|com.mx|com.sg|com.ph|co.uk|coop|de|edu|fr|gov|in|info|jobs|mil|mobi|museum|name|net|net.mx|org|ru)*$ I found many solutions using non-capturing groups but did not know why. Also, can you tell me whether this is the correct regex, and whether there are any valid emails that are not validated correctly, and vice versa?
Don’t bother, there are many ways to validate an email address. Ever since internationalized domain names were introduced, there’s no point in listing TLDs. On the other hand, if you want to limit your acceptance to only a selection of domains, you’re on the right track. Regarding your regex:
You have to escape dots so they become literals: . matches almost anything, \. matches “.”
In the domain part, you use [\w-] (without a dot), which won’t work for “@mail.example.com”.
You probably should take a look at the duplicate answer.
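The first two points are easy to check in JavaScript; the strings below are made-up examples:

```javascript
// An unescaped "." matches any character, so this also accepts "examplexcom".
const unescaped = /^example.com$/;
// "\." matches only a literal dot.
const escaped = /^example\.com$/;

console.log(unescaped.test("examplexcom")); // true
console.log(escaped.test("examplexcom"));   // false
console.log(escaped.test("example.com"));   // true

// [\w-]+ has no dot in it, so a domain part with a subdomain cannot match it.
console.log(/^[\w-]+$/.test("mail.example")); // false
```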
This article shows you a monstrous, yet RFC 5322 compliant regular expression, but also tells you not to use it.
I like this one: /^.+@.+\...+$/ It tests for anything, an at sign, any number of anything, a dot, anything, and any number of anything. This will suffice to check the general format of an entered email address. In all likelihood, users will make typing errors that are impossible to prevent, like typing john@hotmil.com. He won’t get your mail, but you successfully validated his address format.
In response to your comment: if you use a non-capturing group by writing (?:…) instead of (…), the match won’t be captured. For instance, all email addresses have an at sign, so you don’t need to capture it. Hence, (john)(?:@)(example\.com) will provide the name and the server, not the at sign. Non-capturing groups are a general regex feature; they have nothing to do with email validation specifically.
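Here is that pattern exercised in JavaScript. Note that \...+ means a literal dot followed by at least two more characters, so a one-letter ending such as x@y.z is rejected:

```javascript
// The "anything@anything.anything" check, as a JavaScript regex literal.
const roughFormat = /^.+@.+\...+$/;

console.log(roughFormat.test("john@hotmil.com"));  // true: the format is fine, the typo is not caught
console.log(roughFormat.test("john@localhost"));   // false: no dot after the "@"
console.log(roughFormat.test("john.example.com")); // false: no "@"
```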
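The same example in JavaScript shows what ends up in the match array when the @ is wrapped in a non-capturing group:

```javascript
// (?:@) groups the "@" without capturing it, so only the name and server are returned.
const match = "john@example.com".match(/(john)(?:@)(example\.com)/);

console.log(match[1]);     // john
console.log(match[2]);     // example.com
console.log(match.length); // 3: the full match plus the two capturing groups
```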

Regex expression to validate a list of email with ; delimiter at the end of each email address [duplicate]

This question already has answers here:
How can I validate an email address in JavaScript?
(79 answers)
Closed 8 years ago.
I found this regular expression:
"^(([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+([;.](([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+)*$"
which validates a list of emails like: address1@gmail.com;adresse2@gmail.com
But I need to tweak it to validate this structure:
address1@gmail.com;adresse2@gmail.com;
and also a single email address with this structure:
address1@gmail.com;
I also want to be able to validate email addresses containing a + sign,
for example validating:
address1@gmail;adress2@gmail.com;addres+3@gmail.com;
as a valid list of emails.
Thank you for your help.
Do not abuse regular expressions.
It's not worth spending a lot of time and effort writing something inefficient and hard to analyze.
If you know the list is semicolon-separated, I would suggest the following pseudocode:
A <- split email-list on ';'
valid <- True
foreach email in A
    if email != "" and email doesn't match [\w\-\.+]+@([\w+\-]+\.)+[a-zA-Z]{2,5}
        valid <- False
    end
end
return valid
The regular expression [\w\-\.+]+@([\w+\-]+\.)+[a-zA-Z]{2,5} validates one email address. It uses Perl-compatible syntax.
It allows letters, digits, _, -, . and + in the local part (before the @), allows domain labels made of letters, digits, _, + and -, and requires the address to end with 2 to 5 letters.
The regex you provided matches the domain name with ([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5}){1,25})+, which is odd: ([a-zA-Z]{2,5}){1,25} lets the final part repeat up to 25 times. I don't think you should do it this way.
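The pseudocode translates fairly directly to JavaScript. This is a sketch: the helper name is hypothetical, and anchoring the answer's regex with ^ and $ is an added assumption so that each entry must match in full rather than merely contain an address. Like the pseudocode, it treats an empty list as valid.

```javascript
// Per-address regex from the answer, anchored with ^ and $ so it must match the whole entry.
const SINGLE_EMAIL = /^[\w\-\.+]+@(?:[\w+\-]+\.)+[a-zA-Z]{2,5}$/;

// Hypothetical helper implementing the pseudocode: split on ";", skip empty entries
// (such as the one after a trailing ";"), and require every remaining entry to match.
function isValidEmailList(list) {
  return list
    .split(";")
    .filter((entry) => entry !== "")
    .every((entry) => SINGLE_EMAIL.test(entry));
}

console.log(isValidEmailList("address1@gmail.com;adresse2@gmail.com;")); // true
console.log(isValidEmailList("address1@gmail.com;"));                    // true
console.log(isValidEmailList("adress2@gmail.com;addres+3@gmail.com;"));  // true
console.log(isValidEmailList("address1@gmail;adress2@gmail.com;"));      // false
```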
As for why I say you are abusing regex: even though the problem you want to solve can be solved with a regex,
the time it takes to design and debug a regex grows faster than linearly as it gets longer;
long regexes can take more than linear time to execute;
it's hard for other people to understand what you are trying to do, which discourages them from modifying your code if it doesn't do what they want.
So please never try to solve a problem like this using pure regex; it's not a programming language.
This regex will match email addresses based on your criteria. It uses the (?1) subroutine call, so it needs an engine that supports recursion, such as PCRE or Perl.
(?![\W_])((?:([\w+-]{2,})\.?){1,})(?<![\W_])@(?![\W_])(?=[\w.-]{5,})(?=.+\..+)(?1)(?<![\W_])
As for the semicolon-separated addresses, it is best to split the string on semicolons and match each address individually to avoid confusion.
You can take a look at the matches here.
Just split the whole string on the ; character and match each element against the following regex. It also handles the plus sign.
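using System;
using System.Text.RegularExpressions;
// Verbatim string (@"...") so that \. reaches the regex engine instead of being a C# escape;
// ^ and $ require each entry to be a complete address
string pattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";
foreach (string email in emailString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
{
    // IgnoreCase because the character classes list only uppercase letters
    if (Regex.IsMatch(email.Trim(), pattern, RegexOptions.IgnoreCase))
    {
        // do stuff
    }
}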
As others have said, first split on ;, then validate the addresses individually. If you are going to use a regex to validate email, at least use one that has been tested, however loosely, against both good and bad examples, such as those on fightingforalostcause.net, listed in this project, or discussed in this definitive question, of which this question is effectively a duplicate.

Regular expression - for email spam filtering, match email address variants other than the original

I am an email spam quarantine administrator and I can write regular expression rules to block email messages. A common type of spam hitting our domain spoofs the username of one of our email addresses in front of some other domain.
For example, suppose my email address is jwclark@domain.com. In that case, spammers are writing to me from all kinds of addresses that start with my username, such as:
jwclark1234@whatever.com
jwclark@wrongdomain.com
jwclark@a.domain.com
How can I write a regular expression rule that matches everything containing jwclark plus any wildcards, but not the original jwclark@domain.com? I would like a regex that matches everything above except my actual address, jwclark@domain.com.
I've made this regexp:
^jwclark.*[@](?!domain\.com).*$
It's written in JavaScript syntax, but it should be easy to adapt to PHP or another language.
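A quick JavaScript check of that regexp against the addresses from the question. The second pattern is a variant that is not part of the answer: it uses a negative lookahead at the start to exclude exactly jwclark@domain.com, so look-alikes such as jwclark1234@domain.com are still caught.

```javascript
// The answer's pattern: starts with "jwclark", and the text right after the "@" is not "domain.com".
const spoofed = /^jwclark.*[@](?!domain\.com).*$/;

const samples = [
  "jwclark1234@whatever.com",
  "jwclark@wrongdomain.com",
  "jwclark@a.domain.com",
  "jwclark@domain.com",
];
for (const s of samples) {
  console.log(s, spoofed.test(s)); // true, true, true, false
}

// Variant (not from the answer): exclude exactly the original address and nothing else.
const exceptOriginal = /^(?!jwclark@domain\.com$)jwclark.*@.*$/;
console.log(exceptOriginal.test("jwclark1234@domain.com")); // true
console.log(exceptOriginal.test("jwclark@domain.com"));     // false
```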
Given the nature of your problem, you might be better off making a regex builder function that makes the proper regexp for you, given the parameters.
Or, actually use a different approach. I recently found out how to parse ranges of floating point numbers with regexp, but that doesn't make it the proper solution to finding numbers within ranges. :P
edit - fixed silly redundancy thanks to zx81
edit - change to comply with strange limitations:
^jwclark.{0,25}[@][^d][^o][^m][^a][^i][^n].{0,25}\.com.{0,25}$
demo for the strange one

Validate Regex Input, preferably using Regex

I'm looking to have the (admin) user enter some pattern matching string, to give different users of my website access to different database rows, depending on whether the text in a particular field of the row matches the pattern matching string against that user.
I decided on Regex because it is trivial to integrate into the MySQL statements directly.
I don't really know where to start with validating that a string is a valid regular expression, with a regular expression.
I did some searching for similar questions, couldn't see one. Google produced the comical answer, sadly not so helpful.
Do people do this in the wild, or avoid it?
Is it able to be done with a simple regex, or will the set of all valid regex need to be limited to a usable subset?
Validating a regex is an incredibly complex task. A regex would not be able to do it.
A simple approach would be to catch any errors that occur when you try to run the SQL statement, then report an appropriate error back to the user.
I am assuming that the 'admin' is a trusted user here. It is quite dangerous to give a non-trusted user the ability to enter regexes, because it is easy to attack your system with a regex that is constructed to take a really long time to execute. And that is before you even start to worry about Bobby Tables (SQL injection) problems.
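In JavaScript:
// Returns true if the JavaScript engine can compile the string as a regular expression.
// JavaScript's regex syntax is not identical to MySQL's, so this only catches obvious errors.
function isValidRegex(input) {
    try {
        new RegExp(input);
        return true; // safe to submit the regex
    } catch (err) {
        return false; // RegExp throws a SyntaxError for an invalid pattern
    }
}
isValidRegex("hello**"); // false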
You cannot validate that a string contains a valid regular expression with a regular expression. But you might be able to compromise.
If you only need to know that only characters which are valid in regular expressions were used in the string, you can use the regex:
^[\d\w \-\}\{\)\(\+\*\?\|\.\$\^\[\]\\]*$
This might be enough depending on the application.
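For illustration, here is that whitelist applied in JavaScript. As the answer says, it is a compromise: a string can pass it and still not be a valid regex, so it is best paired with an actual compile check.

```javascript
// The character whitelist from the answer: word characters, space and regex metacharacters.
const onlyRegexChars = /^[\d\w \-\}\{\)\(\+\*\?\|\.\$\^\[\]\\]*$/;

console.log(onlyRegexChars.test("^admin_\\d+$"));              // true
console.log(onlyRegexChars.test("hello**"));                   // true, even though it is not a valid regex
console.log(onlyRegexChars.test("a'; DROP TABLE users; --"));  // false
```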

regex that matches email addresses with address tags [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is there a php library for email address validation?
How to validate an email in php5?
I am aware that there have been plenty of questions regarding email address regular expressions. My question has a specific requirement that I have been unable to find the answer to.
I need a regex that matches email addresses and allows for address tags, such as "testing+tags@gmail.com". Most regexes I have found fail on email addresses that contain a tag.
NOTE please do not point me to this link. I am looking for something practical, not perfect
EDIT I am aware of the existence of built-in validation in most web app frameworks. RoR, PHP, Django, etc. all have it built in. Sometimes, though, for whatever reason, there is a special need. Maybe the user can't use the built-in validation. Maybe they are writing their app in some obscure language that doesn't have built-in validation functions, or has them but they are out of date. In that case, a regular expression is still useful.
You should use filter_var to validate email instead:
var_dump(filter_var('bob@example.com', FILTER_VALIDATE_EMAIL));
Example for your case:
echo filter_var('bob+long@example.com', FILTER_VALIDATE_EMAIL) !== false ? 'Valid' : 'Invalid';
My personal favorite has always been this:
/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
Another popular one (it lists only uppercase letters, so use it with case-insensitive matching) is:
/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/
If you are doing this in PHP and if feasible for your problem, I would suggest using filter_var as otherwise suggested. This is merely a suggestion should you need a regular expression that is practical and understood to be imperfect.
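For comparison, both patterns in JavaScript. JavaScript has no \A or \z, so the first is rewritten with ^ and $ (without the m flag these anchor the whole string), and the i flag is added to the second because it lists only uppercase letters. Both accept the tagged address; only the anchored one rejects surrounding text.

```javascript
// \A and \z replaced by ^ and $, since JavaScript does not support \A or \z.
const favorite = /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i;
// The second pattern lists only uppercase letters, so the "i" flag is added here.
const popular = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i;

console.log(favorite.test("testing+tags@gmail.com"));     // true
console.log(popular.test("testing+tags@gmail.com"));      // true
console.log(favorite.test("see testing+tags@gmail.com")); // false: anchored
console.log(popular.test("see testing+tags@gmail.com"));  // true: \b only finds a substring
```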