Extract email and name with regex

What would be the regular expressions to extract the name and email from strings like these?
johndoe@example.com
John <johndoe@example.com>
John Doe <johndoe@example.com>
"John Doe" <johndoe@example.com>
It can be assumed that the email is valid. The name is separated from the email by a single space and might be quoted.
The expected results are:
johndoe@example.com
Name: nil
Email: johndoe@example.com
John <johndoe@example.com>
Name: John
Email: johndoe@example.com
John Doe <johndoe@example.com>
Name: John Doe
Email: johndoe@example.com
"John Doe" <johndoe@example.com>
Name: John Doe
Email: johndoe@example.com
This is my progress so far:
(("?(.*)"?)\s)?(<?(.*@.*)>?)
(which can be tested here: http://regexr.com/?337i5)

The following regex appears to work on all inputs and uses only two capturing groups:
(?:"?([^"]*)"?\s)?(?:<?(.+@[^>]+)>?)
http://regex101.com/r/dR8hL3
Thanks to @RohitJain and @burning_LEGION for introducing the idea of non-capturing groups and character exclusion respectively.
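For anyone who wants to try the two-group pattern outside a regex tester, here is a minimal Ruby sketch. It assumes the four sample strings from the question, written with an ordinary `@` in the address. The helper name `parse_contact` is made up for illustration and is not part of the original answer.

```ruby
# Two-group pattern from the answer: group 1 is the optional name,
# group 2 is the email address.
NAME_EMAIL = /(?:"?([^"]*)"?\s)?(?:<?(.+@[^>]+)>?)/

# Hypothetical helper: returns [name, email]; name is nil when no name is given.
def parse_contact(line)
  match = NAME_EMAIL.match(line)
  return nil unless match

  [match[1], match[2]]
end

samples = [
  'johndoe@example.com',
  'John <johndoe@example.com>',
  'John Doe <johndoe@example.com>',
  '"John Doe" <johndoe@example.com>'
]

samples.each do |line|
  name, email = parse_contact(line)
  puts "#{line} -> Name: #{name.inspect}, Email: #{email}"
end
```

Because the name group is optional, a bare address leaves group 1 unset, which Ruby reports as nil.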

Use this regex:
"?([^"]*)"?\s*([^\s]+@.+)
Group 1 contains the name and group 2 contains the email.

(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))
https://regex101.com/r/pVV5TI/1

You can try this (same code as yours but improved), but you need to check returned groups after matching because the email is either returned in group 2 or group 3, depending on whether a name is given.
(?:("?(?:.*)"?)\s)?<(.*@.*)>|(.*@.*)
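A small Ruby sketch of the "check the returned groups" step, assuming addresses written with a normal `@`. The method name `extract` and the hash layout are my own choices for illustration. Note that group 1 of this pattern still includes the quotes, so the sketch strips them afterwards.

```ruby
# Alternation from the answer: with a name, the email lands in group 2;
# for a bare address it lands in group 3.
PATTERN = /(?:("?(?:.*)"?)\s)?<(.*@.*)>|(.*@.*)/

# Hypothetical helper that folds groups 2 and 3 into one email value.
def extract(line)
  m = PATTERN.match(line)
  return nil unless m

  name = m[1]&.delete('"') # group 1 keeps the quotes, so drop them here
  { name: name, email: m[2] || m[3] }
end

p extract('"John Doe" <johndoe@example.com>')
p extract('johndoe@example.com')
```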

This pattern matches with or without a name and leaves the quotes out of the captured name:
\"*?(([\p{L}0-9-_ ]+)\"?)*?\b\ *<?([a-z0-9-_\.]+@[a-z0-9-_\.]+\.[a-z]+)>?

Although @hpique has a good answer, that solution only works when the name/email string is the only thing being matched. It will not work when the string is part of a longer message, such as an email, that contains other text. Many of the other solutions also fail to match when the person has included a middle name (e.g. James Herbert Bond <jbond@example.com>).
Here is a more robust regex I wrote that picks up first names, last names, and emails as you wanted, even when the string contains many other things:
/(?:"?)(\b[A-Z][a-z]+\b ?)(\b[A-Z][a-z]+\b ?)*(?:"?) ?<([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)>|([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)/g
Check out the above syntax here: Example on Regexr

Related

Regex - Creating validation to enforce that a string has 2+ words

If you have a moment, I need some help adding to my regex expression. I am validating a response in a Google Form for the user's full name.
The validation requires:
That only letters are used
That the user inputs both the first and second name (at a minimum), separated by a space
So far I have come up with:
[a-zA-Z ]+]
But this lacks the check for a minimum of two words in a given string.
After an hour of fails and googling, I have admitted defeat and need your help!
Thanks in advance.

This should do the job:
/^[a-z]{2,}( [a-z]+)*?( [a-z]{2,}){1,}$/i
It matches:
john smith ◄ all lowercase
John Smith
John P E Smith
John Paul E Smith
John Paul Edward Smith
It ignores:
John
John S
John Paul S
John Paul Edward S
J0hn Smith  ◄ zero instead of the letter 'o'
John     Smith  ◄ multiple spaces
You can play with this fiddle.
Best regards
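If you want to check the pattern outside the form, here is a rough Ruby sketch that runs it against the lists above. One assumption to flag: in Ruby, `^` and `$` anchor each line rather than the whole string, so the sketch swaps them for `\A` and `\z`. The method name `valid_full_name?` is made up for illustration.

```ruby
# Same pattern as above, anchored with \A and \z so the whole input must match.
FULL_NAME = /\A[a-z]{2,}( [a-z]+)*?( [a-z]{2,}){1,}\z/i

# Hypothetical helper: true when the input has at least two words,
# letters only, separated by single spaces.
def valid_full_name?(input)
  FULL_NAME.match?(input)
end

accepted = ['john smith', 'John Smith', 'John P E Smith', 'John Paul Edward Smith']
rejected = ['John', 'John S', 'John Paul S', 'J0hn Smith', 'John     Smith']

accepted.each { |name| puts "#{name.inspect}: #{valid_full_name?(name)}" }
rejected.each { |name| puts "#{name.inspect}: #{valid_full_name?(name)}" }
```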

Regex optional capture groups in any order

I would like to capture groups based on a consecutive occurrence of matched groups in any order. And when one set type is repeated without the alternative set type, the alternative set is returned as nil.
I am trying to extract names and emails based on the following regex:
For names, two consecutive capitalized words:
[A-Z][\w]+\s+[A-Z][\w]+
For emails:
\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b
Example text:
John Doe john@doe.com random text
Jane Doe random text jane@doe.com
jim@doe.com more random text tim@doe.com Tim Doe
So far I have used non-capture groups and positive look aheads to tackle the "in-no-particular-order-or-even-present" problem but only managed to do so by segmenting by newlines. So my regex looks like this:
^(?=(?:.*([A-Z][\w]+\s+[A-Z][\w]+))?)(?=(?:.*(\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b))?).*
And the results miss items where there are multiple contacts on the same line:
[
["John Doe", "john@doe.com"],
["Jane Doe", "jane@doe.com"],
["Tim Doe", "tim@doe.com"],
]
When what I'm looking for is:
[
["John Doe", "john@doe.com"],
["Jane Doe", "jane@doe.com"],
[nil, "jim@doe.com"],
["Tim Doe", "tim@doe.com"],
]
My skills in regex are limited and I started using regex because it seemed like the best tool for matching names and emails.
Is regex the best tool to use for this kind of problem or are there more efficient alternatives using loops if we're extracting hundreds of contacts in this manner?

Your text is almost too random to make this work, and names and emails can be very difficult to capture. A more advanced email pattern would only help a little. There are not only unusual email addresses but also all sorts of wild name patterns.
What about D'arcy Bly, Markus-Anthony Reid, or Lee Z? And those are probably the simplest examples.
So you have to make a lot of assumptions, and you won't be fully satisfied unless you use more advanced techniques such as natural language processing.
If you insist on your approach, I came up with this (toothless) monstrosity:
([A-Z]\w+ [A-Z]\w+)(?:\w* )*([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})|
([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})(?:\w* )*([A-Z]\w+ [A-Z]\w+)|
([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})
The order of the alternation groups is important to be able to capture the stray email.
Demo
PS: The demo uses a branch reset group to capture only in groups 1 and 2. However, it looks like Ruby 2.x does not support branch reset groups, so you need to check all 5 groups for values.
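To make the "check all 5 groups" step concrete, here is a rough Ruby sketch. It assumes the three example lines from the question, written with a normal `@`, and builds the alternation from two sub-patterns to keep it readable. The constant names and the `contacts` method are mine, not part of the pattern above.

```ruby
EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}/
NAME  = /[A-Z]\w+ [A-Z]\w+/

# Same three alternatives as the pattern above, giving five capture groups:
# (name)(email) | (email)(name) | (email)
CONTACT = /(#{NAME})(?:\w* )*(#{EMAIL})|(#{EMAIL})(?:\w* )*(#{NAME})|(#{EMAIL})/

# Hypothetical helper: collapses the five groups into [name, email] pairs.
def contacts(text)
  text.scan(CONTACT).map do |name_a, email_a, email_b, name_b, email_c|
    [name_a || name_b, email_a || email_b || email_c]
  end
end

text = <<~TEXT
  John Doe john@doe.com random text
  Jane Doe random text jane@doe.com
  jim@doe.com more random text tim@doe.com Tim Doe
TEXT

p contacts(text)
```

Only one alternative takes part in each match, so at most one name group and one email group are non-nil per row, and `||` picks whichever was set.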

Here's a rewrite of @wp78de's idea into Ruby regexp syntax:
# The {0} groups only define the named sub-patterns; \g<...> calls them later.
regexp = /
  (?<name>
    [A-Z][\w]+\s+[A-Z][\w]+
  ){0}
  (?<email>
    \b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b
  ){0}
  (?:
    \g<name> (?:\w*\s)* \g<email>
    | \g<email> (?:\w*\s)* \g<name>
    | \g<email>
  )
/x
text = <<-TEXT
John Doe john@doe.com random text
Jane Doe random text jane@doe.com
jim@doe.com more random text tim@doe.com Tim Doe
TEXT
# scan returns one [name, email] pair per match; name is nil for a stray email.
p text.scan(regexp)
# => [["John Doe", "john@doe.com"],
#     ["Jane Doe", "jane@doe.com"],
#     [nil, "jim@doe.com"],
#     ["Tim Doe", "tim@doe.com"]]

RegEx for extracting multiple words in a passage using Tableau

I have a passage and I need to extract a couple of words from it in Tableau. The passage is given below:
This looks like a suspicious account. Please look at the details
below. Name: John Mathew Email:john.mathew@abc.com Phone:+1
111-111-1111 Department: abc
For more enquiries contact: ----
Name, email, phone and the department are in the same line separated by blank spaces. I used the below regex and it works well for the department alone:
regexp_extract([CASE DESCRIPTION],'Department : (.+)')
When I apply the same approach to the name, I get:
Name: John Mathew Email:john.mathew@abc.com Phone:+1 111-111-1111
Department: abc
instead of just the name. The same happens with email.
How do I solve this problem?

It looks to me like the issue is that your regex has just '(.+)' as its capture group, which basically means "everything" after the specified string. Since the fields are all on one line, everything after "Name" includes the email, phone, and department. (The regex works for the department because it's the last thing on the line.)
So, to make it work, you need to give your regex something other than the end of the line to stop on. To capture just the name, you need to stop before the Email label, and so on down the list. Your sample has no space after "Email:" and "Phone:", so the patterns below use \s* around a lazy capture instead of a fixed space. Something like
Name = regexp_extract([CASE DESCRIPTION],'Name:\s*(.+?)\s*Email:')
email = regexp_extract([CASE DESCRIPTION],'Email:\s*(.+?)\s*Phone:')
phone = regexp_extract([CASE DESCRIPTION],'Phone:\s*(.+?)\s*Department:')
department = regexp_extract([CASE DESCRIPTION],'Department:\s*(.+)')
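The same "stop at the next label" idea can be tried outside Tableau. This is only a Ruby sketch, not Tableau syntax. It assumes the field line looks like the one in the question (with a normal `@` in the address) and allows optional whitespace after each label. The names `FIELDS` and `extract_fields` are hypothetical.

```ruby
# Each pattern captures lazily up to the next label, so a field no longer
# swallows everything that follows it on the line.
FIELDS = {
  name:       /Name:\s*(.+?)\s*Email:/,
  email:      /Email:\s*(.+?)\s*Phone:/,
  phone:      /Phone:\s*(.+?)\s*Department:/,
  department: /Department:\s*(.+)/ # last field: runs to the end of the line
}.freeze

# Hypothetical helper: returns a hash of field name => captured text (or nil).
def extract_fields(text)
  FIELDS.transform_values { |pattern| text[pattern, 1] }
end

line = 'Name: John Mathew Email:john.mathew@abc.com Phone:+1 111-111-1111 Department: abc'
p extract_fields(line)
```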

RegEx to match text after line break

I have the following input:
Text1 FirstName LastName (10) Text2
I need to fetch the full name without the parenthesis. For example:
User: John Doe (10) Email:
Result: John Doe
Thanks in advance for the help!

Try using this regex on the line containing the first and last name:
^(.*)\s\(\d+\)$
Regex101

To match just the target you're after, use lookarounds (which don't capture):
^(?<=User: \n).*(?=\s+\(\d+\)\s*$)
The entire match will be "John Doe".
See live demo.
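As a rough illustration of the lookaround idea in Ruby, here is a sketch that assumes the single-line form of the input shown in the question ("User: John Doe (10) Email:"). The pattern is simplified for that case and is not the exact expression above. The method name `full_name` is made up.

```ruby
# Lookbehind requires "User: " before the match; lookahead requires
# whitespace and a parenthesised number after it. Neither is part of the match.
FULL_NAME = /(?<=User: ).+?(?=\s+\(\d+\))/

# Hypothetical helper: returns the matched name, or nil when nothing matches.
def full_name(line)
  line[FULL_NAME]
end

puts full_name('User: John Doe (10) Email:') # prints "John Doe"
```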

Regex get next 2 words after certain string

I need a regular expression, which can find names in some text content. It should match from 1 to 3 names, First-name, (Middle-name), (Surname).
I have a list of valid first-names which will be used to search the text. If the first-name is found in the text, the regular expression should get the next middle-name or/and surname, if they exists.
As an example the names below, should be valid names found:
John
John Doe
John Average Joe
Special cases:
John average Doe (if, possible it should match/find John Doe)
So far my solution is:
\b(John|Mary|Tom)\b(?:(?:([^A-Za-z]*[A-Z][^\s,]*)*[^A-Za-z]+)){0,3}
This mostly works, but it does not limit the match to a maximum of three words.
Online test: http://regex101.com/r/aM7bS3/2

I've modified your regex HERE
You can use the following:
\b(Mogens|Victor|John)(\b\s*([A-Z]\w+)){0,2}
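Here is a small Ruby sketch of that approach. It assumes a made-up first-name list (John, Mary, Tom) and switches the groups to non-capturing so that `scan` returns whole matches. The method name `find_names` is hypothetical.

```ruby
FIRST_NAMES = %w[John Mary Tom].freeze # assumed list of valid first names

# A known first name followed by up to two more capitalised words.
NAME_PATTERN = /\b(?:#{Regexp.union(FIRST_NAMES)})(?:\b\s*[A-Z]\w+){0,2}/

# Hypothetical helper: returns every name found in the text.
def find_names(text)
  text.scan(NAME_PATTERN)
end

p find_names('John average Doe met Mary Ann Smith Jones and Tom.')
```

With this pattern the special case "John average Doe" yields only "John", because the lowercase word stops the match. Skipping over such words would need a different pattern.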
