Altering Regex to allow apostrophe in email address - regex

At work, our current ValidationExpression looks horrible and is very confusing to me. We are using the WebForms <asp:RegularExpressionValidator> control, which looks like this:
<asp:RegularExpressionValidator ID="regEmail" runat="server"
ValidationGroup="EditEmails"
Text="*" ErrorMessage="Invalid email address."
ControlToValidate="txtAdd"
Display="Dynamic"
ValidationExpression="^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+#((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$"/>
I need to alter this to allow apostrophes ( ' ) inside an email address, because at the moment addresses containing them fail this expression.
Example of an email that needs to pass this validation: Test.O'neill#example.co.uk
I am unsure what the expression does, but I'm sure it could be made shorter (maybe not simpler, but that does not matter as long as it works).
Does anyone know of a better regular expression I could use that works against valid emails and takes this into account? Thank you!
EDIT: My question is different because the proposed duplicate question does not work with the VB.NET RegularExpressionValidator control.

^([a-z\d]+[_+.-])*[a-z\d']+#(\w+[.-])*\w{1,63}\.[a-z]+$
^ Assert position at the start of the line
([a-z\d]+[_+.-])* Capture the following any number of times
  [a-z\d]+ Match any ASCII letter or digit one or more times (also matches uppercase letters when the i flag is enabled)
  [_+.-] Match any one character in the set
[a-z\d']+ Match any ASCII letter, digit, or apostrophe one or more times
# Match this character literally
(\w+[.-])* Capture the following any number of times
  \w+ Match any word character one or more times
  [.-] Match any one character in the set
\w{1,63} Match any word character between one and 63 times
\. Match a literal dot .
[a-z]+ Match any ASCII letter one or more times
$ Assert position at the end of the line
To apply the above pattern case-insensitively, add the RegexOptions.IgnoreCase flag.
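To check the change outside ASP.NET, here is a minimal Python sketch. It assumes Python's re module as a stand-in for the .NET engine and re.IGNORECASE as the counterpart of RegexOptions.IgnoreCase. The helper name check is hypothetical. The sample addresses are written exactly as in the question, with # between the local part and the domain. It runs the question's original pattern and the answer's pattern side by side.

```python
import re

# Original ValidationExpression from the question (no apostrophe anywhere).
OLD = re.compile(
    r"^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*"
    r"[A-Za-z0-9]+#((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$"
)

# Answer's pattern; re.IGNORECASE plays the role of RegexOptions.IgnoreCase.
NEW = re.compile(r"^([a-z\d]+[_+.-])*[a-z\d']+#(\w+[.-])*\w{1,63}\.[a-z]+$", re.IGNORECASE)


def check(address):
    """Return (matches OLD, matches NEW) for one address (hypothetical helper)."""
    return bool(OLD.match(address)), bool(NEW.match(address))


for address in ["Test.O'neill#example.co.uk", "test#example.com", "O'neill.Test#example.com"]:
    print(address, check(address))
```

The last sample shows a limitation worth knowing: the apostrophe sits only in [a-z\d']+, so it is accepted in the final segment of the local part but not in a segment followed by _, +, . or -.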

Related

Regular Expression for email formatting without hyphen at first and last

I have created a regular expression that accepts email addresses in the following format:
abc#xyz.com.in
Regular Expression
/^(?!-)[\w-\.]+#([\w-]+\.)+[\w-]{2,4}/
I am trying to reject email addresses that start or end with a hyphen.
Invalid format:
-abc#xyz.com
abc#xyz.com-
Valid format:
abc#xyz.com
abc#xyz.com.in
Your regex can be edited in a simple way:
/^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$/
^: The beginning of the line.
[\w\.]+: The part of the email before # must start with at least one word character (\w) or dot (\.).
[\w\.\-]*: After that, the same characters plus the dash (\-) can occur any number of times. Remember that a dash between two characters inside [ and ] denotes a range, so escape it to match the dash itself (a dash placed first or last in the list is also taken literally).
#: This matches itself.
[\w\.]+: After the # character, there must be at least one word character or dot.
\.: Then a literal dot.
[\w\.]{2,4}: Finally, 2 to 4 word characters or dots.
$: The end of the line.
The difference between this and your regex is small:
/^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$/
/^(?!-)[\w-\.]+#([\w-]+\.)+[\w-]{2,4}/
I avoided the negative lookahead and instead specified (whitelisted) the characters that can occur at each position; unless it is really needed, I generally try to avoid blacklisting them. The rest of the regex is quite similar, except that you should escape the dash (-) inside the brackets [ and ].
Finally, I omitted the capturing groups ( and ) and left it up to you to place them wherever you need them.
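A quick way to check the whitelist pattern against the question's examples is a small Python script. It assumes Python's re module as the engine; the samples dictionary simply lists the valid and invalid addresses from the question.

```python
import re

# Whitelist pattern from the answer: no hyphen at the start or anywhere after '#'.
PATTERN = re.compile(r"^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$")

# Valid (True) and invalid (False) examples taken from the question.
samples = {
    "abc#xyz.com": True,
    "abc#xyz.com.in": True,
    "-abc#xyz.com": False,
    "abc#xyz.com-": False,
}

for address, expected in samples.items():
    result = bool(PATTERN.match(address))
    print(f"{address:<16} matched={result} expected={expected}")
```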
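Add \w to each end of your regex, and include the end anchor $. Because the final \w consumes one character of the last domain label, lower the quantifier in front of it from {2,4} to {1,3} so that two-letter endings such as .in still match:
^\w[\w.-]+#([\w-]+\.)+[\w-]{1,3}\w$
Note also that the dot doesn't need escaping within a character class.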
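The effect of the quantifier in front of the trailing \w can be checked with a short Python sketch (again assuming Python's re as a stand-in engine). It compares the \w-anchored pattern with {2,4} against a version with {1,3}: with {2,4}, the last domain label needs at least three characters, so abc#xyz.com.in is rejected.

```python
import re

# {2,4} followed by \w: the last domain label must be 3 to 5 characters long.
WITH_2_4 = re.compile(r"^\w[\w.-]+#([\w-]+\.)+[\w-]{2,4}\w$")
# {1,3} followed by \w: the last domain label is back to 2 to 4 characters.
WITH_1_3 = re.compile(r"^\w[\w.-]+#([\w-]+\.)+[\w-]{1,3}\w$")

for address in ["abc#xyz.com", "abc#xyz.com.in", "-abc#xyz.com", "abc#xyz.com-"]:
    print(address, bool(WITH_2_4.match(address)), bool(WITH_1_3.match(address)))
```

Both versions still reject a leading or trailing hyphen, because the pattern has to start and end with \w.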
A more complete email regex:
/^(([^<>()[\]\\.,;:\s#"]+(\.[^<>()[\]\\.,;:\s#"]+)*)|(".+"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

No period in first part of regular expression

This is what I'm currently working with:
((?i)(\w|^){0,25}[0-9]{3})[^\.]*#(gmail)\.com
What I'm attempting to do is block any email address that consists of any number of characters followed by 3 digits.
This works. HOWEVER, when Google creates a username for people, it usually chooses firstname.lastname####gmail.com. I don't want an email with a period before the #gmail.com to be included.
I have played with this expression again and again, and I can't get it right. For example, with john.doe123#gmail.com the expression matches everything after the period. I need the regex to check the ENTIRE email address against the expression. I know about the tidbit ^[^\.]*$, but I have no idea where to put it.
You could match 0-25 word characters followed by 3 digits, \w{0,25}[0-9]{3}, and use the anchors ^ and $ to assert the start and the end of the string. Because \w does not match a dot, an address with a period before the # is not matched.
^\w{0,25}[0-9]{3}#gmail\.com$
If you want to use a negated character class [^...] instead, you could match 0-25 characters other than a whitespace character, # or a dot, followed by 3 digits: [^\s#.]{0,25}[0-9]{3}
^[^\s#.]{0,25}[0-9]{3}#gmail\.com$
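Both anchored patterns can be tried on a few addresses with a minimal Python sketch. It assumes Python's re module as the engine; the helper name blocked and the sample addresses (written with # as in the question) are illustrative only.

```python
import re

# Answer 1: 0-25 word characters, then exactly 3 digits, anchored at both ends.
WORD_CHARS = re.compile(r"^\w{0,25}[0-9]{3}#gmail\.com$")
# Answer 2: negated class that excludes whitespace, '#' and '.'.
NEGATED = re.compile(r"^[^\s#.]{0,25}[0-9]{3}#gmail\.com$")


def blocked(address):
    """Return (answer 1 matches, answer 2 matches) for one address (hypothetical helper)."""
    return bool(WORD_CHARS.match(address)), bool(NEGATED.match(address))


for address in ["johndoe123#gmail.com", "john.doe123#gmail.com", "johndoe12#gmail.com"]:
    print(address, blocked(address))
```

The dotted address is no longer matched by either pattern, because the anchors force the whole string to fit and neither character class accepts a period.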

Regex about url encoded string

I would like to write a regex that gets the URL-encoded string in the line below:
<topicref href="%E4%BA%B0.txt"/>
When I used a regex like (%[A-Z][0-9])+\.txt, it only matched %B0.txt. What can I do to get the whole URL-encoded string, such as %E4%BA%B0.txt?
Thanks a lot.
Proper URL encoding uses hex digits only, A-F not A-Z. The encoded URL could contain non-encoded characters anywhere. Also, you should escape the full stop.
((%[0-9A-F]{2}|[^<>'" %])+)\.txt
is a quick ad-hoc fix for your regex. For production code, though, you probably shouldn't use a regex for this at all, or at the very least you should use a well-defined and properly tested pattern, such as the URI-parsing regex in Appendix B of RFC 3986 (the URI syntax that HTTP builds on).
Putting the + quantifier outside the capturing parentheses only captures the last repetition. I added a second set of parentheses so that the quantifier is inside the first capture group; this assumes you are extracting the first capture group in particular. (If your regex dialect has non-capturing groups, you could make the inner group non-capturing by writing (?: instead of its opening parenthesis.)
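The difference between repeating a group and capturing the repetition can be seen with a minimal Python sketch (Python's re module is assumed as the engine). It runs the question's pattern, a hex-only pattern with the + outside the group, and the answer's pattern against the sample line.

```python
import re

LINE = '<topicref href="%E4%BA%B0.txt"/>'

# Question's pattern: [A-Z][0-9] only fits "%B0", so the match starts there.
original = re.search(r"(%[A-Z][0-9])+\.txt", LINE)

# Quantifier outside the group: group 1 keeps only the last repetition.
last_only = re.search(r"(%[0-9A-F]{2})+\.txt", LINE)

# Answer's pattern: the + sits inside the outer group, so group 1 holds everything.
fixed = re.search(r"""((%[0-9A-F]{2}|[^<>'" %])+)\.txt""", LINE)

print(original.group(0))   # %B0.txt
print(last_only.group(1))  # %B0
print(fixed.group(1))      # %E4%BA%B0
```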
You need to change your regex to
([%\dA-Z]+)\.txt
([%\dA-Z]+) - Match one or more characters that are %, a digit, or an uppercase letter
\.txt - Match the literal text .txt
whereas your regex means
(%[A-Z][0-9])+.txt
(%[A-Z][0-9])+
  % - Match %
  [A-Z] - Match one uppercase letter from A to Z
  [0-9] - Match exactly one digit
  + - Match the captured group one or more times
.txt - Match any single character (except a newline) followed by txt
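The second answer's pattern, and the point about the unescaped dot, can be tried with a short Python sketch (Python's re module is assumed; "README.txt" and "nameXtxt" are made-up inputs for illustration).

```python
import re

LINE = '<topicref href="%E4%BA%B0.txt"/>'
SECOND_ANSWER = re.compile(r"([%\dA-Z]+)\.txt")

# Group 1 holds the whole encoded name, because the + is inside the group.
encoded = SECOND_ANSWER.search(LINE).group(1)

# The class is not limited to %XX escapes, so plain uppercase names match too.
plain = SECOND_ANSWER.search("README.txt").group(1)

# An unescaped dot matches any character, so "name.txt" also matches "nameXtxt".
loose_dot = re.fullmatch(r"name.txt", "nameXtxt") is not None

print(encoded, plain, loose_dot)
```

If only real URL escapes should count, the hex-restricted class from the first answer is the stricter choice.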

RegEx more than multiple characters before number

I really don't use RegEx that much. You could say I am a RegEx n00b. I have been working on this issue for half a day.
I am trying to write a pattern that looks backward from a number character. For example:
1. bob1 => bob
2. cat3 => cat
3. Mary34 => Mary
So far I have this (?![A-Z][a-z]{1,})([A-Za-z_])
It only matches individual characters, but I want all the characters before the number character. I tried adding ^ and $ to my pattern in an online tester, but I am unsure where to put them.
NOTE: I am using RegEx for the .NET Framework
You may use a regex like
[\p{L}_]+(?=\d)
or
[\w-[\d]]+(?=\d)
Pattern details
[\p{L}_]+ - 1 or more letters (both lower- and uppercase) and/or underscores
OR
[\w-[\d]]+ - 1 or more word chars except digits (the -[] inside a character class is a character class subtraction construct)
(?=\d) - a positive lookahead that requires a digit to appear immediately to the right of the current location
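The patterns above are .NET syntax. Python's built-in re module supports neither \p{L} nor character class subtraction, so as a sketch under that assumption, a common stand-in is the double-negated class [^\W\d]: word characters that are not digits (letters and underscore). The name LETTERS_BEFORE_DIGIT is hypothetical.

```python
import re

# [^\W\d] = "not a non-word character and not a digit", i.e. word characters minus digits.
# This mimics the .NET class [\w-[\d]] in Python's re module.
LETTERS_BEFORE_DIGIT = re.compile(r"[^\W\d]+(?=\d)")

for text in ["bob1", "cat3", "Mary34"]:
    print(text, "=>", LETTERS_BEFORE_DIGIT.findall(text))
```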
If we break down your RegEx, we see:
(?![A-Z][a-z]{1,}), which says "look ahead and make sure the next text is NOT one uppercase letter followed by one or more lowercase letters", and ([A-Za-z_]), which says "match one letter or underscore". This ends up matching single characters, for example any single lowercase letter, rather than the whole run of letters.
If I understand what you want to achieve, then you want all of the letters before a number. I would write something like that as:
\b([a-zA-Z]+)[0-9]
This will start at a word boundary \b, match one or more letters, and require a digit right after the matched string.
(The syntax I used seems to match this document about .NET RegEx: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions)
In light of Wiktor Stribiżew's comment, here is a pure-match RegEx:
\b[a-zA-Z_]+(?=[0-9])
This matches the letters and then looks ahead for the digit without consuming it. This is better than my first attempt, which included the digit in the match. (Thank you, Wiktor.)
http://www.rexegg.com/regex-lookarounds.html
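The difference between the two answers (consuming the digit versus only looking ahead for it) shows up in the match text. A minimal Python sketch, assuming Python's re module behaves like .NET for these ASCII-only patterns:

```python
import re

text = "Mary34"

# First attempt: the digit is consumed, so the letters alone are only in group 1.
consuming = re.search(r"\b([a-zA-Z]+)[0-9]", text)
# Revised attempt: the lookahead checks for the digit without consuming it.
lookahead = re.search(r"\b[a-zA-Z_]+(?=[0-9])", text)

print(consuming.group(0), consuming.group(1))  # Mary3 Mary
print(lookahead.group(0))                      # Mary
```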

Regex: match from the second occurrence of one character up to a different character

My question is pretty similar to this question, and its answer is almost right. The difference is that I need a regexp that matches not simply from one character to another, but from the second occurrence of a character up to another character.
My purpose is to get password from uri, example:
http://mylogin:mypassword#mywebpage.com
So I need the text between the second ":" and the "#".
You could give the following regex a go:
(?<=:)[^:]+?(?=#)
It matches a run of characters that contains no :, preceded by a : and followed by a #.
Depending on your flavour of regex you might need something like:
:([^:]+?)#
This version doesn't use lookarounds. It includes the : and # in the match, but the password will be in the first capturing group.
The ? makes the quantifier lazy, so the match stops at the first # in case the URL contains more than one; if that cannot happen, the ? is optional. Note that [^:] matches any character other than :, including newlines.
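Both variants can be run on the sample URI with a minimal Python sketch (Python's re module is assumed; it supports the fixed-width lookbehind used here).

```python
import re

URI = "http://mylogin:mypassword#mywebpage.com"

# Lookaround version: the match itself is the password.
lookaround = re.search(r"(?<=:)[^:]+?(?=#)", URI)
# Capturing-group version: the match includes ':' and '#', the password is group 1.
captured = re.search(r":([^:]+?)#", URI)

print(lookaround.group(0))                     # mypassword
print(captured.group(0), captured.group(1))    # :mypassword# mypassword
```

The first colon (after "http") fails in both versions because [^:] cannot cross the colon after "mylogin" before a # appears.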
Here's an easy one that does not need look-aheads or look-behinds:
.*:.*:([^#]+)#
Explanation:
.*:.*: matches everything up to (and including) the second colon (:)
([^#]+) matches the longest possible series of non-# characters
# - matches the # character.
If you run this regex, the first capturing group (the expression between parentheses) will contain the password.
Here it is in action: http://regex101.com/r/fT6rI0
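The same greedy pattern in a minimal Python sketch (Python's re module assumed as the engine):

```python
import re

URI = "http://mylogin:mypassword#mywebpage.com"

# .*:.*: runs up to the second colon here; group 1 is the run of non-# characters.
match = re.search(r".*:.*:([^#]+)#", URI)
password = match.group(1)
print(password)  # mypassword
```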