How to write a conditional in regex

I have the following regex (JavaScript):
/^[a-z0-9_.\-]+@(yahoo|gmail|excite)\.com$/
However, I am unsure how to make it include subdomains (if one is present).
So this expression should match email addresses at both uk.yahoo.com and yahoo.com. How can this be done?

Well, if you want just the subdomain uk.yahoo.com:
/^[a-z0-9_.\-]+@((?:uk\.)?yahoo|gmail|excite)\.com$/
The addition of (?:uk\.)? specifies an optional non-capturing group that matches the pattern "uk." zero or one time.
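A minimal sketch for trying the answer's pattern in JavaScript. The second pattern, `anySubdomain`, is a hypothetical generalization (not part of the answer) that accepts one optional subdomain label of any name in front of the provider.

```javascript
// Pattern from the answer: "uk." is an optional, non-capturing prefix,
// but it only applies to the "yahoo" alternative.
const ukYahoo = /^[a-z0-9_.\-]+@((?:uk\.)?yahoo|gmail|excite)\.com$/;

// Hypothetical variant (an assumption, not from the answer): one optional
// subdomain label of any name, placed before the provider group.
const anySubdomain = /^[a-z0-9_.\-]+@(?:[a-z0-9-]+\.)?(yahoo|gmail|excite)\.com$/;

const samples = [
  "john@yahoo.com",
  "john@uk.yahoo.com",
  "john@fr.yahoo.com",
  "john@uk.gmail.com",
];

for (const address of samples) {
  // Prints the address, then the result of each pattern.
  console.log(address, ukYahoo.test(address), anySubdomain.test(address));
}
```

Because `(?:uk\.)?` sits inside the first alternative, the answer's pattern rejects `john@uk.gmail.com` and `john@fr.yahoo.com`; moving an optional group in front of the alternation, as in the variant, applies it to every provider.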
However, using regexes to validate email addresses is an awful idea. RFC 2822 is a very complex standard. It's much better to blindly send an email to whatever minimally validated address the user enters, fail early, and give the user a chance to correct the mistake.

Related

Improve Exim regex to catch everything but specified addresses

I'm using this regex to catch any incoming e-mails, excluding mails from specific people.
^(.(?!(zulgrib@exemple.com|zulgrib@example.org)).)*$/i
This regex correctly lets these scenarios through:
Zulgrib at example.com <Zulgrib@example.com>
<Zulgrib@example.com>
<Zulgrib@example.com> In behalf of Robot
The regex correctly catches headers like these:
Associate@example.org
Your Associate Associate@example.com
If an excluded e-mail address appears alone, the regex catches it; I would like to prevent that. Example:
zulgrib@exemple.org
What should be modified to make this work, and why is my current method not correct?
If I understand the documentation correctly, . matches any character and an empty string is not a character, but using * is not working.
First, some issues in your current regex:
exemple is spelled differently from example.
Literal dots need to be escaped, so \.com instead of .com.
There are two dots (.) in the outermost group, which means you only match text with an even number of characters, and you don't exclude the case where an email address starts at the beginning of the string. The first dot should not be there.
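A small sketch of the two-dot problem. It runs in JavaScript rather than Exim, on the assumption that lookaheads and groups behave the same way here as in Exim's PCRE; the spelling and dot escaping are already fixed so that only the extra dot differs between the two patterns.

```javascript
// The question's structure, spelling fixed and dots escaped:
// each iteration consumes TWO characters and checks the lookahead in between.
const twoDots = /^(.(?!(zulgrib@example\.com|zulgrib@example\.org)).)*$/i;

// First dot removed: the lookahead runs before every character,
// including the very first one.
const oneDot = /^(?:(?!zulgrib@example\.com|zulgrib@example\.org).)*$/i;

for (const input of ["abcd", "abc", "zulgrib@example.com!", "!zulgrib@example.com"]) {
  console.log(JSON.stringify(input), twoDots.test(input), oneDot.test(input));
}
```

With two dots, `"abc"` fails only because its length is odd, and `"zulgrib@example.com!"` passes because the lookahead is never checked at position 0. With one dot, both cases behave as intended.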
To make an exception for when the email address is the only thing in the input, I fear you'll have to specify that as a separate alternative in which (unfortunately) you'll have to repeat those email addresses:
^(?:zulgrib@example\.com|zulgrib@example\.org)$|^(?!(?:.*(?:zulgrib@example\.com|zulgrib@example\.org))).*$
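A sketch that runs the answer's pattern over the headers from the question. It assumes the question's case-insensitive `/i` flag is kept and that JavaScript's handling of these constructs matches Exim's closely enough for a quick check.

```javascript
// First alternative: the excluded address is the whole input.
// Second alternative: the input contains no excluded address anywhere.
const rule = /^(?:zulgrib@example\.com|zulgrib@example\.org)$|^(?!(?:.*(?:zulgrib@example\.com|zulgrib@example\.org))).*$/i;

const headers = [
  "Zulgrib at example.com <Zulgrib@example.com>",
  "<Zulgrib@example.com>",
  "<Zulgrib@example.com> In behalf of Robot",
  "Associate@example.org",
  "Your Associate Associate@example.com",
  "zulgrib@example.org",
];

for (const header of headers) {
  // A match means the header is caught; no match means it is let through.
  console.log(rule.test(header) ? "caught     " : "let through", header);
}
```

The `/i` flag matters: without it, the capitalized `Zulgrib` headers would no longer trigger the lookahead and would be caught.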

Validate email addresses using a regex [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 7 years ago.
I am trying to validate email addresses using a regex. This is what I have now: ^([\w-.]+)@([\w-]+).(aero|asia|be|biz|com.ar|co.in|co.jp|co.kr|co.sg|com|com.ar|com.mx|com.sg|com.ph|co.uk|coop|de|edu|fr|gov|in|info|jobs|mil|mobi|museum|name|net|net.mx|org|ru)*$ I found many solutions using non-capturing groups but don't understand why. Also, can you tell me whether this is the correct regex, and whether there are any valid emails it rejects or invalid ones it accepts?
Don’t bother; there are many ways to validate an email address. Now that internationalized domain names exist, there’s no point in listing TLDs. On the other hand, if you want to limit your acceptance to a selection of domains, you’re on the right track. Regarding your regex:
You have to escape dots so they become literals: . matches almost anything, \. matches “.”
In the domain part, you use [\w-] (without a dot), which won’t work for “@mail.example.com”.
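A quick sketch of the two points above. The TLD list is cut down to `com|org`, and `dottedDomain` is a hypothetical fix (a dot added to the character class) rather than a recommended validator.

```javascript
// An unescaped dot matches any character; an escaped dot only matches ".".
const unescaped = /^example.com$/;
const escaped = /^example\.com$/;
console.log(unescaped.test("exampleXcom"), escaped.test("exampleXcom"));

// [\w-]+ contains no dot, so a dotted host name cannot fit before the TLD.
const noDotDomain = /^[\w.-]+@[\w-]+\.(?:com|org)$/;
// Hypothetical fix for illustration: allow dots in the domain part.
const dottedDomain = /^[\w.-]+@[\w.-]+\.(?:com|org)$/;
console.log(noDotDomain.test("john@mail.example.com"), dottedDomain.test("john@mail.example.com"));
```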
You probably should take a look at the duplicate answer.
This article shows you a monstrous, yet RFC 5322 compliant regular expression, but also tells you not to use it.
I like this one: /^.+@.+\...+$/ It tests for one or more characters, an at sign, one or more characters, a dot, and at least two more characters. This will suffice to check the general format of an entered email address. In all likelihood, users will make typing errors that are impossible to prevent, like typing john@hotmil.com. He won’t get your mail, but you successfully validated his address format.
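A short sketch of how that loose check behaves on a few inputs. Note that the misspelled `john@hotmil.com` passes, exactly as the answer warns.

```javascript
// One or more characters, "@", one or more characters, a literal dot,
// then at least two more characters.
const looseEmail = /^.+@.+\...+$/;

for (const input of ["john@hotmil.com", "a@b.co", "john@example", "john@example.c", "@example.com"]) {
  console.log(input, looseEmail.test(input));
}
```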
In response to your comment: if you use a non-capturing group by writing (?:…) instead of (…), the match won’t be captured. For instance, all email addresses have an at sign, so you don’t need to capture it. Hence, (john)(?:@)(example\.com) will provide the name and the server, not the at sign. Non-capturing groups are a general regex feature; they have nothing to do with email validation specifically.
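A minimal illustration of the difference, using `String.prototype.match` on the answer's example. Index 0 of the result is the whole match, so `slice(1)` keeps only the captured groups.

```javascript
const withCapture = "john@example.com".match(/(john)(@)(example\.com)/);
const withoutCapture = "john@example.com".match(/(john)(?:@)(example\.com)/);

console.log(withCapture.slice(1));    // [ 'john', '@', 'example.com' ]
console.log(withoutCapture.slice(1)); // [ 'john', 'example.com' ]
```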

Regular expression - for email spam filtering, match email address variants other than the original

I am an email spam quarantine administrator, and I can write regular expression rules to block email messages. There is a common class of email spam hitting our domain in which the username of one of our email addresses is spoofed in front of some other domain.
For example, suppose my email address is jwclark@domain.com. In that case, spammers are writing to me from all kinds of other domains that start with my username, such as:
jwclark1234@whatever.com
jwclark@wrongdomain.com
jwclark@a.domain.com
How can I write a regular expression rule that matches everything including jwclark and any wildcards, but not the original jwclark@domain.com? I would like a regex that matches everything above except my actual email address, jwclark@domain.com.
I've made this regexp:
^jwclark.*[@](?!domain\.com).*$
It's in JavaScript syntax, but it should be easy to adapt to PHP or another flavor.
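A sketch that runs the first pattern over the question's examples. The extra sender `jwclark@domain.community` is a hypothetical input added to show that the lookahead is not anchored: it only checks that the text after the at sign does not *start with* `domain.com`.

```javascript
// "jwclark", anything, "@", then a negative lookahead that rejects
// the match when the text after "@" begins with "domain.com".
const spoofed = /^jwclark.*[@](?!domain\.com).*$/;

const senders = [
  "jwclark1234@whatever.com",
  "jwclark@wrongdomain.com",
  "jwclark@a.domain.com",
  "jwclark@domain.com",
  "jwclark@domain.community", // hypothetical: also allowed, since the lookahead has no $
];

for (const sender of senders) {
  console.log(sender, spoofed.test(sender) ? "blocked" : "allowed");
}
```

If that last case matters, `(?!domain\.com$)` would be a stricter variant; it is untested against any particular quarantine product.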
Given the nature of your problem, you might be better off making a regex builder function that makes the proper regexp for you, given the parameters.
Or, actually use a different approach. I recently found out how to parse ranges of floating point numbers with regexp, but that doesn't make it the proper solution to finding numbers within ranges. :P
Edit: changed to comply with the strange limitations:
^jwclark.{0,25}[@][^d][^o][^m][^a][^i][^n].{0,25}\.com.{0,25}$
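A sketch of how the lookahead-free version behaves. The senders `jwclark@dxyzzy.com` and `jwclark@whatever.net` are hypothetical inputs chosen to show a trade-off: each negated class rejects one letter at one position, so any domain with `d` first, `o` second, and so on is not matched, and only `.com` senders are matched at all.

```javascript
// No lookahead: bounded repetition plus one negated class per letter of "domain".
const limited = /^jwclark.{0,25}[@][^d][^o][^m][^a][^i][^n].{0,25}\.com.{0,25}$/;

const senders = [
  "jwclark1234@whatever.com",
  "jwclark@wrongdomain.com",
  "jwclark@domain.com",
  "jwclark@dxyzzy.com",   // hypothetical: starts with "d", so it is not matched
  "jwclark@whatever.net", // hypothetical: no ".com", so it is not matched
];

for (const sender of senders) {
  console.log(sender, limited.test(sender) ? "blocked" : "allowed");
}
```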

How to only match before the first dot?

I have the following regex.
^((?!example).)*$#Subdomain is reserved (example).
I would like to validate <subdomain>.example.org. However, since the domain name contains example, a match is occurring.
The validation should not match when the address is www.example.org
The validation should match when the address is example.example.org
It looks like you're missing an escaped period:
^(example)\..*$
should work
It seems that a simple
^example\.
is enough. Or use string methods, depending on your language:
url.indexOf('example.') === 0
If input such as example.org is also possible, you can use
^example\..+\.
to force the appearance of two dots. But this would still wrongly accept example.co.uk. It depends on your input.
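A sketch comparing the checks suggested so far on a few host names. `startsWith` is used here as the modern equivalent of `indexOf('example.') === 0`; the host list is illustrative.

```javascript
const hosts = ["example.example.org", "www.example.org", "example.org", "example.co.uk"];

const prefixOnly = /^example\./;     // "example." at the start
const twoDotHost = /^example\..+\./; // additionally forces a second dot

for (const host of hosts) {
  console.log(host, prefixOnly.test(host), twoDotHost.test(host), host.startsWith("example."));
}
```

`example.org` is accepted by the prefix check but rejected once a second dot is required, while `example.co.uk` is still accepted by both.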
A simple way might be to break it up into two regexes:
1) ^.+\.example\.org$
2) ^(www)?\.example\.org$
If 1) matches and 2) does not, it's a subdomain of example.org; otherwise, it's not. (Technically www is also a subdomain, but you get the idea.)
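A sketch of the two-regex approach. The helper name `isSubdomainOfExample` is hypothetical and only wraps the rule "1) matches and 2) does not".

```javascript
// 1) at least one character, then ".example.org"
const underExample = /^.+\.example\.org$/;
// 2) an optional "www", then ".example.org"
const wwwOrBare = /^(www)?\.example\.org$/;

// Hypothetical helper: true when 1) matches and 2) does not.
function isSubdomainOfExample(host) {
  return underExample.test(host) && !wwwOrBare.test(host);
}

for (const host of ["example.example.org", "www.example.org", "a.b.example.org", "example.org"]) {
  console.log(host, isSubdomainOfExample(host));
}
```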

CAtlRegExp for a regular expression that matches 4 characters max

Short version:
How can I get a regex that matches a@a.aaaa but not a@a.aaaaa using CAtlRegExp?
Long version:
I'm using CAtlRegExp http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx to try to match email addresses. I want to use the regex
^[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$
extracted from here.
But the syntax that CAtlRegExp accepts is different from the one used there. This regex returns the error REPARSE_ERROR_BRACKET_EXPECTED; you can check for yourself using this app: http://www.codeproject.com/KB/string/mfcregex.aspx
Using said app, I created this regex:
^[a-zA-Z0-9\._%\+\-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]$
But the problem is that this matches a@a.aaaaa as valid; I need it to match at most 4 characters for the top-level domain.
So, how can I get a regex that matches a@a.aaaa but not a@a.aaaaa?
Try: ^[a-zA-Z0-9\._%\+\-]+@([a-zA-Z0-9-]+\.)+\c\c\c?\c?$
This expression replaces the [A-Z]{2,4} sequence, which CAtlRegExp doesn't support, with \c\c\c?\c?
\c serves as an abbreviation of [a-zA-Z]. The question marks after the 3rd and 4th \c make each of them optional (zero or one character). As a result, this portion of the expression matches 2, 3 or 4 letters, never fewer and never more.
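CAtlRegExp itself can't be run here, so this is a JavaScript sketch of the equivalence the answer relies on, assuming `\c` means `[a-zA-Z]` as stated above. In JavaScript, `\c` means something else (a control-character escape such as `\cJ`), so the class is written out.

```javascript
// Quantifier form, as in the regex the question started from.
const withQuantifier = /^[a-zA-Z0-9._%+\-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,4}$/;
// Spelled-out form: two required letters, then two optional ones (like \c\c\c?\c?).
const spelledOut = /^[a-zA-Z0-9._%+\-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z][a-zA-Z][a-zA-Z]?[a-zA-Z]?$/;

for (const address of ["a@a.a", "a@a.aa", "a@a.aaaa", "a@a.aaaaa"]) {
  console.log(address, withQuantifier.test(address), spelledOut.test(address));
}
```

Both forms accept a two- to four-letter top-level domain and reject `a@a.aaaaa`.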
You are trying to match email addresses, a very widely used and critical element of internet communication.
I would say that this job is best done with the most widely used, most correct regex.
Since email address format rules are described by RFC 822, it seems useful to search the internet for something like "RFC822 email regex".
For Perl the answer seems to be easy: use Mail::RFC822::Address: regexp-based address validation
RFC 822 Email Address Parser in PHP
Thus, to handle email addresses as correctly as possible, either locate the most precise existing regex for your particular toolkit (ATL in your case) or, if there is no suitable regex yet, adapt a very precise regex from another toolkit (the Perl one above seems to be a very complete, albeit difficult, candidate).
If you're trying to match a specific sub part of email addresses (as seems to be the case given your question), then it probably still makes sense to start with the most up-to-date/correct/universal regex and specifically limit it to the parts that you require.
Perhaps I stated the obvious, but I hope it helped.