Regex for matching a mix of delimited tokens and email addresses

I need to validate a string that contains valid email addresses, specific tokens (later to be replaced by email addresses), or a mix of both, delimited by semicolons. I need a little help with this regex, which I nearly got working.
It matches the tokens but not the email address at the start or end.
^(((<#a#>)+|[;])*|(([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+([;.](([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+)*|((<#a#>)+|[;])*)$

Here is the answer posted by AlexBay, which works with my test data.
^((<#a#>)+|[;]|(([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+)+$
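A minimal sketch of how the accepted pattern could be exercised, assuming Python's re module. The helper name is_valid_recipient_list and the sample inputs are hypothetical, and the pattern keeps # in the address position exactly as this page writes it.

```python
import re

# Pattern from the answer above: one or more token runs, semicolons, or addresses.
RECIPIENTS = re.compile(
    r"^((<#a#>)+|[;]|(([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+)+$"
)

def is_valid_recipient_list(value):
    """Return True when the whole string is made of tokens, addresses, and semicolons."""
    return RECIPIENTS.match(value) is not None

if __name__ == "__main__":
    # Hypothetical test data: mixed, address-first, and an address without a domain suffix.
    for sample in ["<#a#>;bob#example.com", "bob#example.com;<#a#>", "bob#example"]:
        print(sample, is_valid_recipient_list(sample))
```

Because ";" is an alternative of its own, this pattern also accepts a lone semicolon or several in a row; tighten it if empty entries should be rejected.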

Related

Prevent multiple emails in one line in Google Forms using regex

I have a long-form field ("Paragraph" type) in a Google Form. Users are expected to fill in any number of email addresses - at least one email, could be as many as 20-50 email addresses for some users.
I want to make sure that:
Each line is likely to be a valid email (by checking for a "#" character and a "." character)
Each line contains ONLY ONE email (by checking for "#" characters not separated by line breaks)
I know I can use the following string to check for two valid email addresses separated by a line break:
[a-zA-Z0-9_\.\+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-\.]+\n+[a-zA-Z0-9_\.\+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-\.]
However, this limits the user to submitting two (no more, no less) email addresses.
Is there a way to check for one email address per line while allowing any number of addresses, from one upward?
You could anchor the pattern and then optionally repeat a group consisting of one or more newlines followed by the same email pattern.
^[\w.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?:\n+[\w.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)*$
See a regex101 demo
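The same idea can be sketched in Python by building the full pattern from a single-address fragment, which avoids typing the address pattern twice. The names EMAIL, ONE_PER_LINE, and one_email_per_line are hypothetical, and Google Forms may use a different regex flavor, so treat this as an illustration of the structure only.

```python
import re

# Single-address fragment from the answer above ("#" sits in the address position, as on this page).
EMAIL = r"[\w.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

# One address, then any number of groups made of newline(s) plus another address.
ONE_PER_LINE = re.compile(rf"{EMAIL}(?:\n+{EMAIL})*")

def one_email_per_line(text):
    """Return True when every non-empty line holds exactly one address."""
    # fullmatch plays the role of the ^ and $ anchors in the original pattern.
    return ONE_PER_LINE.fullmatch(text) is not None

if __name__ == "__main__":
    print(one_email_per_line("a#x.com\nb#y.org"))   # addresses on separate lines
    print(one_email_per_line("a#x.com b#y.org"))    # two addresses on one line
```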

Get an exact regex match of an email value from a list of email addresses

I have a text field which stores a list of email addresses e.g: x#demo.com; a.x#demo.com. I have another text field which stores the exact value matched from the list of emails i.e. if /x#demo.com/i is in x#demo.com;a.x#demo.com then it should return x#demo.com.
The issue I am having is that if I have /a.x#demo.com/i, I will get x#demo.com instead of a.x#demo.com
I know of the regex expression /^x#demo.com$/i, but this means I can only have one email in my list of email addresses which won't help.
I have tried a couple of other regex expressions with no luck.
Any ideas on how I can achieve this?
You can use this slightly changed regex:
/(^|;)x#demo.com($|;)/i
It matches starting either at the beginning of the string or after a semicolon, and ending either at the end of the string or at a semicolon.
Edit:
A small change: this version uses a lookbehind and a lookahead, so the match contains only the address you want:
(?<=^|;)x#demo.com(?=$|;)
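For readers working in Python, note that the standard re module only accepts fixed-width lookbehinds, so (?<=^|;) is rejected there. A hedged sketch of the same delimiter-bounded match uses a non-capturing group before the address and a capture group around it. The function name find_exact is hypothetical, and tolerating whitespace around entries is an assumption based on the sample list in the question.

```python
import re

def find_exact(email, email_list):
    """Return the entry of a ';'-separated list that equals `email`, or None."""
    pattern = re.compile(
        r"(?:^|;)\s*("            # start of string or a semicolon, then optional spaces
        + re.escape(email)        # the address itself, with '.' treated literally
        + r")\s*(?=;|$)",         # optional spaces, then a semicolon or end of string
        re.IGNORECASE,
    )
    match = pattern.search(email_list)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(find_exact("a.x#demo.com", "x#demo.com; a.x#demo.com"))
    print(find_exact("x#demo.com", "a.x#demo.com"))
```

Escaping the address with re.escape also stops the dots in it from matching arbitrary characters.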
Edit2:
To allow spaces around the semicolon and at the start and end, use this (#-quoted):
#"(?<=^\s*|;\s*)x#demo.com(?=\s*$|\s*;)"
or use double escaping:
"(?<=^\\s*|;\\s*)x#demo.com(?=\\s*$|\\s*;)"

Multiple Email validation in a single input field separated by ;

Currently I am writing software where a user can enter more than one email address in an input field, separated by ";".
I have a regex that validates a single email, but it doesn't work when the field contains several emails separated this way.
Has anyone ever created such a regex, or is anyone able to help me?
Thanks in advance; I look forward to a response.
Here is my Regex:
[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,4}+(\;|)
Just put the pattern that matches each subsequent email inside a non-capturing group with a preceding ; and make that group repeat zero or more times.
^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,4}+(?:;[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,4}+)*$
One more thing: you need to escape the dot.
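As an alternative to one long repeated pattern (a different technique from the answer above), the field can be split on ";" and each entry checked against the single-address pattern, which also reveals which entry is wrong. This is a minimal sketch in Python; the names EMAIL and invalid_entries are hypothetical, and the possessive {2,4}+ from the original is written as a plain {2,4} because older Python versions reject possessive quantifiers.

```python
import re

# Single-address pattern adapted from the question ("#" sits in the address position, as on this page).
EMAIL = re.compile(r"[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]{2,4}")

def invalid_entries(field):
    """Return the ';'-separated entries that are not a single valid address."""
    return [part for part in field.split(";") if not EMAIL.fullmatch(part)]

if __name__ == "__main__":
    print(invalid_entries("a#x.com;b#y.org"))       # nothing to report
    print(invalid_entries("a#x.com;bad;b#y.org"))   # reports the middle entry
```

A trailing semicolon produces an empty entry, which this sketch reports as invalid; strip it first if that input should be accepted.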

fetch email address that does not contain certain words from webpage

How can I extract the first email address that does not contain the words 'ajax' or 'gif'? Here's the basic regex that I have:
match = re.findall('([\w.-]+#[\w.-]+\.\w+)', q.read(), re.I)[0]
Thanks,
I think you want something like this,
\b(?!\S*(?:ajax|gif)\S*)([\w.-]+#[\w.-]+\.\w+)\b
DEMO
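A short sketch of how that pattern might replace the findall(...)[0] call from the question, assuming the page text is already in a string. The function name first_clean_email is hypothetical; it uses re.search so that a page with no acceptable address yields None instead of an IndexError.

```python
import re

# Pattern from the answer above: the negative lookahead rejects any run of
# non-space characters that contains "ajax" or "gif".
CLEAN_EMAIL = re.compile(r"\b(?!\S*(?:ajax|gif)\S*)([\w.-]+#[\w.-]+\.\w+)\b", re.I)

def first_clean_email(text):
    """Return the first address without 'ajax' or 'gif' in it, or None."""
    match = CLEAN_EMAIL.search(text)
    return match.group(1) if match else None

if __name__ == "__main__":
    page = "loader.gif#cdn.example.com info#example.com"
    print(first_clean_email(page))
```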

How to detect that a certain string is not an email address but a twitter id?

Is there a way to differentiate between an email address and a twitter id?
Both use the '#' character and the email regex will be contained by the twitter id regex.
What's the best way to approach this?
Should I require a whitespace before the '#' character in order to identify that it's a twitter id?
Not entirely sure which characters are allowed in twitter usernames, but basically like so:
/(?:^|\s)#[a-zA-Z0-9_.-]+\b/
You can test that it's preceded by whitespace using (?<=\s) and then check for the valid characters of twitter IDs which are only [A-Za-z0-9_].
That gives you a resulting regex of: (?<=\s|^)#[A-Za-z0-9_]+
You could also add a check for a dot, comma, or whitespace after it, to confirm that it is properly formatted within a sentence and not some stray artifact:
(?<=\s|^)#[A-Za-z0-9_]+(?=[\s.,])
Note that lookbehind and lookahead ((?<= and (?=) might not be supported in your language of choice, but I'll assume they are since you didn't specify.
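Python is one such case: its re module requires fixed-width lookbehinds, so (?<=\s|^) is rejected. A hedged equivalent is the negative lookbehind (?<!\S), which succeeds at the start of the string or after whitespace. The function name find_twitter_ids and the sample sentence are hypothetical.

```python
import re

# "(?<!\S)" means "not preceded by a non-space character", i.e. start of
# string or whitespace; the lookahead is the one from the answer above.
TWITTER_ID = re.compile(r"(?<!\S)#[A-Za-z0-9_]+(?=[\s.,])")

def find_twitter_ids(text):
    """Return every twitter id in `text` that is followed by a space, dot, or comma."""
    return TWITTER_ID.findall(text)

if __name__ == "__main__":
    print(find_twitter_ids("hi #alice, mail bob#example.com and #carol."))
```

As in the original pattern, an id at the very end of the string with nothing after it is not matched, because the lookahead needs a following character.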
Email addresses never start with an #, while twitter ids always do.
isTwitter = address[0] == '#'
A twitter id wouldn't pass an email regex check.
Regular email:
^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$
A twitter id won't have the trailing characters:
^#[A-Za-z0-9_]+$
So check whether it's a valid email; if not, check whether it's a valid twitter ID.
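That two-step check could look like the following sketch in Python. The function name classify and the returned labels are hypothetical, and the IGNORECASE flag is an assumption, added because the email pattern above lists only uppercase letters.

```python
import re

# Patterns from the answer above ("#" sits where the answer's separator character is).
EMAIL = re.compile(r"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)
TWITTER = re.compile(r"^#[A-Za-z0-9_]+$")

def classify(value):
    """Try the email pattern first, then fall back to the twitter id pattern."""
    if EMAIL.match(value):
        return "email"
    if TWITTER.match(value):
        return "twitter"
    return "unknown"

if __name__ == "__main__":
    for sample in ["bob#example.com", "#bob_42", "hello"]:
        print(sample, classify(sample))
```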
Further reading:
How to Find or Validate an Email Address