Regex that captures the string "\s # \s" and only that - regex

I am reading in a string that always has the form x # y, where x and y are each their own string, such as "John Doe" and "Jane Doe". My regex currently yields "John Doe " and " Jane Doe", with the surrounding spaces still attached. I want the line to be split on the # symbol together with the whitespace around it. Does anyone know a regex for that?

Given this string: john doe # jane doe you can use this regex (.*)\s#\s(.*$) and you will have john doe and jane doe as your two capture groups.

The regex was this: (\s\#\s). It worked.
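As a minimal sketch of the split described above, assuming Python's re module (the question does not name a language) and the sample names from the question:

```python
import re

# Split on a "#" that has one whitespace character on each side, so the
# surrounding spaces are consumed together with the separator.
SEPARATOR = re.compile(r"\s#\s")

line = "John Doe # Jane Doe"  # sample input taken from the question
left, right = SEPARATOR.split(line)

print(repr(left))
print(repr(right))
```

Because the whitespace is part of the separator, neither half keeps a leading or trailing space.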

Related

Regex: Separated by nested parentheses and semicolon

My strings look like the following (each row is one exemplary string):
Smith, Anna (Univ Cambridge); Doe, Jane (Univ Vienna (Austria)); Doe, John (Univ Tokyo; MIT)
Mueller, Hans (FU Berlin (Germany)); Schmid, Julia (); Doe, John (CalTech); Boe, Jane (TU Wien)
Kim, Lee (Nazarbayev Univ (Kazakhstan); Univ Oxford)
In other words, the pattern comprises Surname, Name (Affiliation); (or without the ; if no other person follows), whereby the parentheses may be optionally nested ( () ) or contain a ; or be empty ().
I want to extract each name and affiliation, as in:
Smith, Anna (Univ Cambridge)
Doe, Jane (Univ Vienna (Austria))
Doe, John (Univ Tokyo; MIT)
Mueller, Hans (FU Berlin (Germany))
Schmid, Julia ()
Doe, John (CalTech)
Boe, Jane (TU Wien)
Kim, Lee (Nazarbayev Univ (Kazakhstan); Univ Oxford)
What would be the correct RegEx to do this?
My attempt with (?<=\()(?:[^()]+|\([^)]+\))+ did not work well...
Since your expected matches can only have one nested parentheses level, you can use
\w+,\s*\w+\s*\([^()]*(?:\([^()]*\)[^()]*)*\);?
See the regex demo.
Depending on whether or not your regex library supports recursion, or balanced constructs, this can be further enhanced to match parenthetical phrases of any depth.
Details:
\w+ - one or more word chars
, - a comma
\s* - zero or more whitespaces
\w+\s* - one or more word and then zero or more whitespace chars
\( - a ( char
[^()]* - zero or more chars other than ( and )
(?:\([^()]*\)[^()]*)* - zero or more sequences of (...) substrings with no ( and ) in between and then zero or more chars other than ( and )
\);? - a ) and then an optional ;.
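A sketch of how the pattern above could be applied, assuming Python. The built-in re module has no recursion, so the second function is a hypothetical non-regex alternative (a depth counter) for arbitrary nesting, not something taken from the answer:

```python
import re

# Pattern from the answer: surname, given name, then a parenthesised
# affiliation that may contain one level of nested parentheses.
PERSON = re.compile(r"\w+,\s*\w+\s*\([^()]*(?:\([^()]*\)[^()]*)*\);?")


def extract_people(row):
    """Return each 'Surname, Name (Affiliation)' entry without the trailing ';'."""
    return [m.group(0).rstrip(";") for m in PERSON.finditer(row)]


def split_top_level(row):
    """Hypothetical alternative: split on ';' only outside any parentheses."""
    parts, buf, depth = [], [], 0
    for ch in row:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:
            # A separator at depth 0 ends the current entry.
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


row = "Smith, Anna (Univ Cambridge); Doe, Jane (Univ Vienna (Austria)); Doe, John (Univ Tokyo; MIT)"
for entry in extract_people(row):
    print(entry)
```

The regex stops working once parentheses nest more than one level deep, whereas the counter-based splitter does not care about depth but also does not validate the "Surname, Name" part.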

RegEx to match text after line break

I have the following input:
Text1 FirstName LastName (10) Text2
I need to fetch the full name without the parenthesis. For example:
User: John Doe (10) Email:
Result: John Doe
Thanks in advance for the help!
Try using this regex on the line containing the first and last name:
^(.*)\s\(\d+\)$
To match just the target you're after, use lookarounds (which assert their context without consuming it):
^(?<=User: \n).*(?=\s+\(\d+\)\s*$)
The entire match will be "John Doe".
See live demo.
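Both patterns can be tried in a short sketch, assuming Python. The exact input layout is an assumption: the first pattern is given a line holding only the name and number, and the lookaround pattern is given text in which the name sits on the line after "User: ", as the question title suggests.

```python
import re

# Capture-group version: the name ends up in group 1.
line = "John Doe (10)"  # assumed: a line holding only the name and the number
name_from_group = re.match(r"^(.*)\s\(\d+\)$", line).group(1)

# Lookaround version: the whole match is the name. re.MULTILINE lets ^ and $
# match at line boundaries, which this assumed layout requires.
text = "User: \nJohn Doe (10)\nEmail:"
name_from_lookaround = re.search(
    r"^(?<=User: \n).*(?=\s+\(\d+\)\s*$)", text, re.MULTILINE
).group(0)

print(name_from_group)
print(name_from_lookaround)
```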

Regex to match a few possible strings with possible leading and/or trailing spaces

Let's say I have a string:
John Smith (auth.), Mary Smith, Richard Smith (eds.), Richie Jack (ed.), Jack Johnny (eds.)
I would like to match:
John Smith(auth.),Mary Smith,Richard Smith(eds.),Richie Jack(ed.),Jack Johnny(eds.)
I have come up with a regex, but I have a problem with the | (alternation) operator because my string contains characters that have to be escaped, such as ( ) and the dot. This is what I am not able to deal with. My regex is:
\s+\((auth\.\)|\(eds\.\))?,\s+
EDIT: I think now that the most universal solution would be to assume that in () could be anything.
Try this:
\s*\((auth|eds?)?\.\)?,?\s*
\s+ means one or more
\s* means zero or more
Based on your comment, I modified the regex:
\s*((\([^)]*\))|,)\s*
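The answer does not say what to replace the match with. A minimal sketch, assuming Python and assuming the match is replaced by its first capture group (the parenthesised part or the comma), so that only the surrounding whitespace is dropped:

```python
import re

# Group 1 is either a "(...)" block or a comma; the \s* on both sides
# swallows any whitespace around it.
TRIM = re.compile(r"\s*((\([^)]*\))|,)\s*")

text = ("John Smith (auth.), Mary Smith, Richard Smith (eds.), "
        "Richie Jack (ed.), Jack Johnny (eds.)")

# Keep only group 1, discarding the whitespace matched around it.
compact = TRIM.sub(r"\1", text)
print(compact)
```

Spaces inside a name, such as the one in "John Smith", are untouched because they are not adjacent to a comma or a parenthesised block.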

Regex to transpose somewhat tricky last name, first name , title

Is it possible to use one regex to convert both
Doe, John C., Jr., M.D.
Doe, Jane, M.D.
to read
John C. Doe Jr., M.D.
Jane Doe, M.D.
Replace
^([^,]+),\s([^,]+),(?:(\s[^,]+),)?\s([^,]+)$
with
$2 $1$3, $4
Barmar's answer works for the specified examples, but there's a possibly simpler solution which should satisfy our input:
Replace ^([^,]+),\s([^,]+),(.*)$ with $2 $1$3
We replace the (?:(\s[^,]+),)?\s([^,]+) with a simpler (.*) that grabs all titles after the first name (we don't care about the specifics of what's in these titles).
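Both replacements can be compared in a short sketch, assuming Python 3.5 or later, where backreferences are written \1 instead of $1 and an optional group that did not take part in the match is substituted as an empty string:

```python
import re

NAMES = ["Doe, John C., Jr., M.D.", "Doe, Jane, M.D."]

# First answer: the optional group 3 holds a suffix such as " Jr.".
FULL = re.compile(r"^([^,]+),\s([^,]+),(?:(\s[^,]+),)?\s([^,]+)$")
# Second answer: everything after the given name goes into one group.
SIMPLE = re.compile(r"^([^,]+),\s([^,]+),(.*)$")

full_results = [FULL.sub(r"\2 \1\3, \4", name) for name in NAMES]
simple_results = [SIMPLE.sub(r"\2 \1\3", name) for name in NAMES]

for name, full, simple in zip(NAMES, full_results, simple_results):
    print(f"{name!r}: full={full!r} simple={simple!r}")
```

In this sketch the simpler pattern turns "Doe, Jane, M.D." into "Jane Doe M.D.", without the comma before the title, because that comma is consumed by the pattern and not put back. Whether this matters depends on how strictly the requested output format has to be met.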

Extract email and name with regex

What would be the regular expressions to extract the name and email from strings like these?
johndoe#example.com
John <johndoe#example.com>
John Doe <johndoe#example.com>
"John Doe" <johndoe#example.com>
It can be assumed that the email is valid. The name will be separated by the email by a single space, and might be quoted.
The expected results are:
johndoe#example.com
Name: nil
Email: johndoe#example.com
John <johndoe#example.com>
Name: John
Email: johndoe#example.com
John Doe <johndoe#example.com>
Name: John Doe
Email: johndoe#example.com
"John Doe" <johndoe#example.com>
Name: John Doe
Email: johndoe#example.com
This is my progress so far:
(("?(.*)"?)\s)?(<?(.*#.*)>?)
(which can be tested here: http://regexr.com/?337i5)
The following regex appears to work on all inputs and uses only two capturing groups:
(?:"?([^"]*)"?\s)?(?:<?(.+#[^>]+)>?)
http://regex101.com/r/dR8hL3
Thanks to #RohitJain and #burning_LEGION for introducing the idea of non-capturing groups and character exclusion respectively.
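A sketch of the two-group pattern above run against the four sample strings, assuming Python. The helper name parse_entry is made up for illustration, and the samples keep the # character exactly as they appear in the question:

```python
import re

# Group 1: optional name (quotes excluded); group 2: the address.
PATTERN = re.compile(r'(?:"?([^"]*)"?\s)?(?:<?(.+#[^>]+)>?)')


def parse_entry(entry):
    """Return (name, email); name is None when the entry has no name part."""
    m = PATTERN.match(entry)
    return m.group(1), m.group(2)


SAMPLES = [
    'johndoe#example.com',
    'John <johndoe#example.com>',
    'John Doe <johndoe#example.com>',
    '"John Doe" <johndoe#example.com>',
]

for sample in SAMPLES:
    print(sample, "->", parse_entry(sample))
```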
Use this regex: "?([^"]*)"?\s*([^\s]+#.+)
group 1 contains name
group 2 contains email
(([^<>()\[\]\\.,;:\s#"]+(\.[^<>()\[\]\\.,;:\s#"]+)*)|(".+"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))
https://regex101.com/r/pVV5TI/1
You can try this (same code as yours but improved), but you need to check returned groups after matching because the email is either returned in group 2 or group 3, depending on whether a name is given.
(?:("?(?:.*)"?)\s)?<(.*#.*)>|(.*#.*)
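The group check described above could look like the following sketch, assuming Python. The helper name parse_with_fallback is hypothetical. Note that group 1 of this pattern still contains any surrounding quotes, so the sketch strips them itself:

```python
import re

# Alternative 1: optional name in group 1, bracketed address in group 2.
# Alternative 2: a bare address in group 3.
PATTERN = re.compile(r'(?:("?(?:.*)"?)\s)?<(.*#.*)>|(.*#.*)')


def parse_with_fallback(entry):
    """Return (name, email), reading the email from group 2 or group 3."""
    m = PATTERN.match(entry)
    name = m.group(1)
    if name is not None:
        name = name.strip('"')  # the pattern itself keeps the quotes
    email = m.group(2) or m.group(3)
    return name, email


print(parse_with_fallback('johndoe#example.com'))
print(parse_with_fallback('"John Doe" <johndoe#example.com>'))
```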
This way you can get with or without name, removing the quotes.
\"*?(([\p{L}0-9-_ ]+)\"?)*?\b\ *<?([a-z0-9-_\.]+#[a-z0-9-_\.]+\.[a-z]+)>?
Although #hpique has a good answer, that solution only works when the name/email string is the only thing being matched. It will not work when you have a longer message that contains other items, such as an email. Many of the other solutions will also fail to match when the person has included a middle name (e.g. James Herbert Bond <jbond#example.com>).
Here is a more robust Regex solution I wrote that can pick up the first names, last names, and emails like you wanted, even if there are many other things in the string:
/(?:"?)(\b[A-Z][a-z]+\b ?)(\b[A-Z][a-z]+\b ?)*(?:"?) ?<([a-zA-Z0-9._-]+#[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)>|([a-zA-Z0-9._-]+#[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)/g
Check out the above syntax here: Example on Regexr