Pattern matching email address using regular expressions [duplicate] - regex

Filter email address with regular expressions: I am new to regular expressions and was hoping someone might be able to help out.
I am trying to pattern match an email address string with the following format:
FirstName.LastName#gmail.com
I want to be sure that there is a period somewhere before the '#' character and that the characters after the '#' character match gmail.com

You want some symbols before and after the dot, so I would suggest .+\..+#gmail\.com.
.+ means any symbols (.) can appear 1 or more times (+)
\. means a literal dot; it is escaped with a backslash to suppress the special meaning of .
#gmail and com should be matched exactly.
See also Regular Expression Basic Syntax Reference
EDIT: Gmail's rules for account names only allow Latin letters, digits, and dots, so a better regex is
[a-zA-Z0-9]+\.[a-zA-Z0-9]+#gmail\.com
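A quick way to try the stricter pattern is a small Python check. This is only a sketch: the helper name is_gmail_name is made up for illustration, it keeps the '#' separator used in the question, and it relies on re.fullmatch so that the whole string has to match rather than just a substring.

```python
import re

# Letters/digits, a literal dot, letters/digits, then the literal "#gmail.com".
GMAIL_PATTERN = re.compile(r"[a-zA-Z0-9]+\.[a-zA-Z0-9]+#gmail\.com")


def is_gmail_name(address: str) -> bool:
    """Hypothetical helper: True if the entire string matches the pattern."""
    return GMAIL_PATTERN.fullmatch(address) is not None


print(is_gmail_name("FirstName.LastName#gmail.com"))  # True
print(is_gmail_name("FirstNameLastName#gmail.com"))   # False: no dot in the name
print(is_gmail_name("First.Last#gmailxcom"))          # False: the domain dot is literal
```

Without fullmatch (or explicit ^ and $ anchors), a longer string that merely contains a matching substring would also be accepted.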

To check for a valid email address:
^(?:(?!.*?[.]{2})[a-zA-Z0-9](?:[a-zA-Z0-9.+!%-]{1,64}|)|\"[a-zA-Z0-9.+!% -]{1,64}\")#[a-zA-Z0-9][a-zA-Z0-9.-]+(\.[a-z]{2,}|\.[0-9]{1,})$
Enforced rules:
must start with an alphanumeric character
can only contain alphanumeric characters and #._-%
cannot have 2 consecutive . characters, except in a quoted string
characters before # can only be alphanumeric and ._-%, except in a quoted string
must have # in the middle
needs at least one . in the domain part
cannot have a double - in the domain part
can only contain alphanumeric characters and .- in the domain part
needs to end with a valid extension of 2 or more letters
supports IP addresses (test#1.1.1.1)
supports quoted user names
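The rules above can be spot-checked with a short Python script. This is a sketch under two assumptions: it keeps '#' as the separator, and it escapes the two dots before the final label so that the domain really has to contain a literal dot. The helper name looks_valid is hypothetical.

```python
import re

# The pattern from this answer, split over three lines for readability.
# Assumption: the dots before the last label are escaped (\.) so they only
# match a literal '.'.
EMAIL_PATTERN = re.compile(
    r'^(?:(?!.*?[.]{2})[a-zA-Z0-9](?:[a-zA-Z0-9.+!%-]{1,64}|)'
    r'|"[a-zA-Z0-9.+!% -]{1,64}")'
    r'#[a-zA-Z0-9][a-zA-Z0-9.-]+(\.[a-z]{2,}|\.[0-9]{1,})$'
)


def looks_valid(address: str) -> bool:
    """Hypothetical helper: True if the address matches the pattern."""
    return EMAIL_PATTERN.search(address) is not None


print(looks_valid("john.doe#example.com"))    # True
print(looks_valid("john..doe#example.com"))   # False: two consecutive dots
print(looks_valid("test#1.1.1.1"))            # True: IP-style domain
print(looks_valid('"john doe"#example.com'))  # True: quoted user name
print(looks_valid("john#examplecom"))         # False: no dot in the domain
print(looks_valid("a#b--c.com"))              # True
```

The last case suggests that the pattern, as written, does not actually reject a double hyphen in the domain part, so that rule may need a separate check.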

You don't even need a regex since your requirements are pretty specific. Not sure what language you're using, but most support splitting on # and checking for a '.'. In Python:
email = 'FirstName.LastName#gmail.com'
name, _, domain = email.partition('#')
if '.' in name and domain == 'gmail.com':
    print('valid')

You haven't told us which regex flavor you need; however, this example will fit most of them:
.*\..*#gmail\.com

Assuming Unix style where . is any character: .*\..*#gmail\.com
Edit: escaped the . in the domain

I used the following regular expression to validate the email address, and I have added a small C# code snippet that uses it.
Regex - "^[a-zA-Z0-9]{3,20}#gmail\.com$"
Code :-
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Enter Your Text ");
        string input = Console.ReadLine();
        Console.WriteLine("The Text that you have entered is :" + input);
        Console.ReadLine();
        // Verbatim string so the backslash reaches the regex engine;
        // the dot is escaped so it only matches a literal '.'.
        string pattern = @"^[a-zA-Z0-9]{3,20}#gmail\.com$";
        Regex regex = new Regex(pattern);
        bool results = regex.IsMatch(input);
        Console.WriteLine(results.ToString());
        Console.ReadLine();
    }
}
Addresses such as ∂øøµ$∂å¥!#gmail.com are also checked and reported as false here. ;-)

Related

Regex to Match a single or multiple email addresses separated by comma

I need to have a regex pattern that matches the following kind of string
#keyword1 a#b.com or #keyword2 a#b.com;b#c.com;d#e.com
The following regex pattern doesn't do exactly what I want:
/(#)(?:keyword1|keyword2)\s([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)/g
The above regex expression only matches #keyword1 a#b.com correctly.
But for the second it matches everything before the first semicolon. I need it to match the entire thing. How can I do that please?
I would suggest parsing the string in two steps. First distinguish the keyword from the array of email addresses and then split the array.
First retrieve both the keyword and the array, assuming that is all that the string consists of. I'm using the JavaScript RegExp notation, but you should be able to understand what is happening.
Assume the string is "#keyword2 a#b.com;b#c.com;d#e.com".
/^#(keyword1|keyword2) (.*)$/g
Group 1 will be "keyword2" and group 2 will be "a#b.com;b#c.com;d#e.com". Now apply the following pattern to group 2 and loop through the matches to retrieve each email address.
/([^;]*)(?:;|$)/g
This pattern makes no assumptions about whether or not the email addresses are properly formatted, just that they are separated by a semicolon. This also works if there's only a single email address.
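The two steps can be sketched in Python as follows. The function name parse_keyword_line is hypothetical, and the second pattern uses [^;]+ instead of [^;]* so that re.findall does not return an empty match at the end of the list; otherwise the patterns are the ones given above.

```python
import re


def parse_keyword_line(line: str):
    """Hypothetical helper: split '#keyword addr;addr;...' into its parts."""
    # Step 1: separate the keyword from the semicolon-separated list.
    match = re.match(r"^#(keyword1|keyword2) (.*)$", line)
    if match is None:
        return None
    keyword, address_list = match.groups()
    # Step 2: pull out each semicolon-delimited entry (no format validation).
    addresses = re.findall(r"([^;]+)(?:;|$)", address_list)
    return keyword, addresses


print(parse_keyword_line("#keyword2 a#b.com;b#c.com;d#e.com"))
# ('keyword2', ['a#b.com', 'b#c.com', 'd#e.com'])
print(parse_keyword_line("#keyword1 a#b.com"))
# ('keyword1', ['a#b.com'])
```

Each extracted entry could then be passed to whatever address validation is appropriate.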

RegEx to find or match an upper-domain that is followed by a specific word (anchor)

I have a text block, or string:
(Ex.1) domain of doorstops-scripts.asjewelries.com designates 88.198.68.211 as permitted sender
(Ex.2) domain of aiceo.net designates 193.105.73.148 as permitted sender
I would like to match/find the upper domain (asjewelries.com or aiceo.net).
That is, .com, .net, .info, .tv, etc. (2-5 characters preceded by a dot), together with the preceding characters, which follow either a dot (asjewelries in Ex.1) or, if there are no lower domains, a space (aiceo in Ex.2).
Both domains are followed by a specific word (anchor): designates. They could be followed immediately by this anchor or there may be other words between.
this is the goal:
asjewelries.com (Ex.1)
aiceo.net (Ex.2)
I would like to match/find the upper domain (asjewelries.com or aiceo.net) only
How about matching everything after "domain of" and up to the first space? It would narrow down false positives.
domain of (?:\S+\.)?([^ .]+\.[^ .]{2,5})
Notice how I'm using a ( group ) to create a backreference for the domain. Everything else in that pattern matches any character except spaces and uses the "." to fetch the domain.
Alternatively, if the text "domain of" varies, you can simply remove it:
\s(?:\S+\.)?([^ .]+\.[^ .]{2,5})\s
The important thing is that both domains are followed by
a specific word (anchor) that in this case is the word "designates".
They could be followed immediately by this anchor
or there may be other words in between.
word "designate" does not immediately follow the domain
Easy, just check for the occurrence of the word "designate" or "designates" by matching it literally in the pattern.
Regex:
domain of (?:\S+\.)?([^ .]+\.[^ .]{2,5}) .*?\bdesignates?\b
Important: Get the text returned by the first backreference (1st group). In AHK it should be returned by match[1].
Test it online
Output:
"asjewelries.com"
"aiceo.net"
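For readers who are not using AHK, the same pattern can be tried in Python. This is a minimal sketch; the helper name upper_domain is made up, and the two sample lines are the ones from the question.

```python
import re

# Same pattern as above: optional lower domains, then the captured
# "name.tld" pair, followed later on the line by "designate(s)".
DOMAIN_PATTERN = re.compile(
    r"domain of (?:\S+\.)?([^ .]+\.[^ .]{2,5}) .*?\bdesignates?\b"
)


def upper_domain(line: str):
    """Hypothetical helper: return the captured upper domain, or None."""
    match = DOMAIN_PATTERN.search(line)
    return match.group(1) if match else None


lines = [
    "domain of doorstops-scripts.asjewelries.com designates 88.198.68.211 as permitted sender",
    "domain of aiceo.net designates 193.105.73.148 as permitted sender",
]
for line in lines:
    print(upper_domain(line))
# asjewelries.com
# aiceo.net
```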
I'm positive you could create this kind of expression yourself if you read about regex syntax for a couple of minutes, so allow me to recommend:
Regular Expressions Tutorial (regular-expressions.info). A quite comprehensive tutorial to learn regex.
regex101.com. Allows you to test different expressions and understand the way a pattern matches the subject string.
You can use the following regex:
\w*[a-zA-Z]\w*\.\w{2,5}(?=\s)
See RegEx DEMO
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = " Ex. 1) domain of doorstops-scripts.asjewelries.com designates 88.198.68.211 as permitted sender Ex. 2) domain of aiceo.net";
        // Groups 2 and 4 capture the example labels; groups 3 and 5 capture the domains.
        Pattern pattern = Pattern.compile("((Ex\\.\\s1\\)).*\\.(asjewelries\\.\\w{1,4}).*(Ex\\.\\s2\\)).*(aiceo\\.\\w{1,4}))");
        Matcher matcher = pattern.matcher(s);
        if (matcher.find()) {
            System.out.println(matcher.group(3) + " (" + matcher.group(2));
            System.out.println(matcher.group(5) + " (" + matcher.group(4));
        }
    }
}
should output
asjewelries.com (Ex. 1)
aiceo.net (Ex. 2)

VB.Net REGEX to strip email

I need to strip email addresses out of paragraphs of plain text. I have googled and searched this site and found many suggestions, none of which I can get to work. I'm using code like this:
Imports System.Text.RegularExpressions
Dim strEmailPattern As String = "^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
Dim senText As String = "blah blah blah blah blah someone#somewhere.com"
Dim newText As String = String.Empty
newText = Regex.Replace(senText, strEmailPattern, String.Empty)
After the call to Regex.Replace the newText string still contains the complete senText string including the email. I thought it was the regex pattern I was using, but I have tried many, so maybe I am missing something in the code?
This POSIX regex should match all the emails, provided that:
they may not be valid
every email contains at least one #
the sequences of characters around the # symbols consist of letters, digits, hyphens and dots, and do not start with a non-alphabetic character
all emails are separated by at least one space character
Regex
([[:alpha:]][[:alnum:].-]+#)+[[:alpha:]][[:alnum:].-]+
This might also work
([a-zA-Z][a-zA-Z0-9.-]+#)+[a-zA-Z][a-zA-Z0-9.-]+
A shorter version (as in comment) would be
(\w[\w.-]+#)+\w[\w.-]+
But this will match some more invalid emails.
The pattern I am suggesting will match most email addresses. If you really want to match all RFC-822 compliant emails, consider using the pattern here. It's a 6425-character-long regex that matches all standard email addresses. But beware, it will execute slowly!
There are various corner cases where your regex would fail.
You should use something as simple as this:
(?<=^|\s)[^#]+?\#[^#]+?(?=$|\s)
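One likely reason the original Regex.Replace call changes nothing is that the question's pattern is anchored with ^ and $, so it can only match when the entire input is a single address. The Python sketch below illustrates this and then strips addresses with an unanchored pattern. Two things here are assumptions rather than part of any answer: Python's re module rejects the variable-width lookbehind (?<=^|\s), so (?<!\S) and (?!\S) are used as stand-ins, and whitespace is excluded from both sides of the '#' so that the match cannot swallow the surrounding words. The name strip_emails is hypothetical.

```python
import re

# The pattern from the question, anchored to the whole string.
ANCHORED = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)

# Unanchored variant: one whitespace-delimited token containing a single '#'.
EMAIL_TOKEN = re.compile(r"(?<!\S)[^\s#]+#[^\s#]+(?!\S)")


def strip_emails(text: str) -> str:
    """Hypothetical helper: remove every address-like token from the text."""
    return EMAIL_TOKEN.sub("", text)


text = "blah blah blah blah blah someone#somewhere.com"
print(ANCHORED.sub("", text) == text)  # True: the anchored pattern never matches
print(repr(strip_emails(text)))        # 'blah blah blah blah blah '
```

The whitespace that surrounded each address is left in place, so a follow-up step may be needed to collapse double spaces.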

I want to modify this regex to include apostrophe

This regex is used for validating email addresses; however, it doesn't handle the apostrophe ('), which is a valid character in the first part of an email address.
I have tried myself and to use some examples I found, but they don't work.
^([\w-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
How do I modify it slightly to support the ' character (apostrophe)?
Per the documentation for an email address, the apostrophe can appear anywhere before the # symbol, which, in your current regex is:
^([\w-\.]+)#
You should be able to add the apostrophe into the brackets of valid characters:
^([\w-\.']+)#
This would make the entire regex:
^([\w-\.']+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
EDIT (regex contained in single-quotes)
If you're using this regex inside a string with single-quotes, such as in PHP with $regex = '^([\w ..., you will need to escape the single-quote in the regex with \':
^([\w-\.\']+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
You need to update the first part as follows:
^([\'\w-\.]+)
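The modified pattern can be checked with a short Python script. One adjustment is an assumption on my part: Python's re module does not accept a character range that starts with \w, so the hyphen is moved to the end of the class ([\w.'-]), which should accept the same characters. The helper name is_valid is hypothetical, and '#' is kept as the separator.

```python
import re

# Same structure as the regex above, with the apostrophe added to the
# local-part class and the hyphen moved last so it is read literally.
EMAIL_WITH_APOSTROPHE = re.compile(
    r"^([\w.'-]+)#"
    r"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))"
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
)


def is_valid(address: str) -> bool:
    """Hypothetical helper: True if the address matches the pattern."""
    return EMAIL_WITH_APOSTROPHE.match(address) is not None


print(is_valid("o'brien#example.com"))     # True
print(is_valid("john.doe#example.co.uk"))  # True
print(is_valid("o brien#example.com"))     # False: space is not allowed
```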

Regular Expression for validating username

I'm looking for a regular expression to validate a username.
The username may contain:
Letters (western, greek, russian etc.)
Numbers
Spaces, but only 1 at a time
Special characters (for example: "!##$%^&*.:;<>?/\|{}[]_+=-") , but only 1 at a time
EDIT:
Sorry for the confusion
I need it for cocoa-touch, but I'll have to translate it to PHP for the server side anyway.
And by "1 at a time" I mean spaces or special characters should be separated by letters or numbers.
Instead of writing one big regular expression, it would be clearer to write separate regular expressions to test each of your desired conditions.
Test whether the username contains only letters, numbers, ASCII symbols ! through #, and space: ^(\p{L}|\p{N}|[!-#]| )+$. This must match for the username to be valid. Note the use of the \p{L} class for Unicode letters and the \p{N} class for Unicode numbers.
Test whether the username contains consecutive spaces: \s\s+. If this matches, the username is invalid.
Test whether symbols occur consecutively: [!-#][!-#]+. If this matches, the username is invalid.
This satisfies your criteria exactly as written.
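The three separate tests could be combined in Python roughly like this. It is a sketch with several assumptions: Python's built-in re module has no \p{L} or \p{N}, so [^\W_] (any Unicode alphanumeric character) is used as a stand-in; the symbol set is copied from the example list in the question rather than taken from the [!-#] range; and the names SYMBOLS and is_valid_username are made up for illustration.

```python
import re

# Hypothetical symbol set, taken from the example list in the question.
SYMBOLS = "!#$%^&*.:;<>?/\\|{}[]_+=-"
SYMBOL_CLASS = "[" + re.escape(SYMBOLS) + "]"

# Test 1: only alphanumerics ([^\W_] is \w without "_"), listed symbols, spaces.
ALLOWED = re.compile(r"(?:[^\W_]|" + SYMBOL_CLASS + r"| )+")
# Test 2: two or more whitespace characters in a row.
DOUBLE_SPACE = re.compile(r"\s\s+")
# Test 3: two listed symbols in a row.
DOUBLE_SYMBOL = re.compile(SYMBOL_CLASS + "{2}")


def is_valid_username(name: str) -> bool:
    """Valid if test 1 matches the whole name and tests 2 and 3 find nothing."""
    return (
        ALLOWED.fullmatch(name) is not None
        and DOUBLE_SPACE.search(name) is None
        and DOUBLE_SYMBOL.search(name) is None
    )


print(is_valid_username("Bill Bob"))          # True
print(is_valid_username("Bill  Bob"))         # False: consecutive spaces
print(is_valid_username("Bill!!Bob"))         # False: consecutive symbols
print(is_valid_username("\u00c9ponine_42"))   # True: precomposed É is a letter
```

Keeping the three checks separate also makes it easy to report which rule a rejected username violated.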
However, depending on how the usernames have been written, perfectly valid names like "Éponine" may still be rejected by this approach. This is because the "É" could be written either as U+00C9 LATIN CAPITAL LETTER E WITH ACUTE (which is matched by \p{L}) or as E followed by a combining mark such as U+0301 COMBINING ACUTE ACCENT (which is not matched by \p{L}).
Regular-Expressions.info says it better:
Again, "character" really means "Unicode code point". \p{L} matches a
single code point in the category "letter". If your input string is à
encoded as U+0061 U+0300, it matches a without the accent. If the
input is à encoded as U+00E0, it matches à with the accent. The reason
is that both the code points U+0061 (a) and U+00E0 (à) are in the
category "letter", while U+0300 is in the category "mark".
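The difference between the two encodings can be seen with Python's standard unicodedata module. The snippet below only illustrates the quoted explanation; normalizing input to NFC before matching is one possible mitigation, not something the quoted text prescribes.

```python
import unicodedata

decomposed = "a\u0300"   # 'a' followed by COMBINING GRAVE ACCENT
precomposed = "\u00e0"   # LATIN SMALL LETTER A WITH GRAVE

print(unicodedata.category("a"))        # Ll: a letter
print(unicodedata.category("\u0300"))   # Mn: a mark, not a letter
print(unicodedata.category("\u00e0"))   # Ll: a letter

# NFC normalization composes the two code points into the single one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```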
Unicode is hairy, and restricting the characters in usernames is not necessarily a good idea anyway. Are you sure you want to do this?
The expression
^(\w| (?! )|["!##$%^&*.:;<>?/\|{}\[\]_+=\-")](?!["!##$%^&*.:;<>?/\|{}\[\]_+=\-")]))*$
will mostly do what you want, if your dialect supports look-ahead assertions.
See it in action at RegExr.
Please ask yourself why you want to limit usernames in this way. Most of the time, usernames starting with "!!" should not be an issue, and you annoy users if you reject their desired username.
Edit: \w does not match non-Latin characters in every regex implementation. To handle them, replace \w with \p{L}, which may or may not work depending on your regex implementation. RegExr unfortunately does not support it.
Try this:
^[!##$%^&*.:;<>?\/\|{}\[\]_+= -]?([\p{L}\d]+[!##$%^&*.:;<>?/\|{}\[\]_+= -]?)+$
See on rubular
You want something like
string strUserName = "BillYBob Stev#nS0&";
Regex regex = new Regex(@"(?i)\b(\w+\p{P}*\p{S}*\p{Z}*\p{C}*\s?)+\b");
Match match = regex.Match(strUserName);
If you want this explained, let me know.
I hope this helps.
Note: This is case insensitive.
Since I don't know which language you need this solution in, I am providing an answer in Java. It can be translated to any other platform:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "à123 àà#bcà#";
String regex = "^([\\p{L}\\d]+[!##$%\\^&\\*.:;<>\\?/\\|{}\\[\\]_\\+=\\s-]?)+$";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(str);
if (matcher.find())
    System.out.println("Matched: " + matcher.group());
One assumption I made is that the username will start with either a Unicode letter or a digit.