Multiple filter regex - regex

Sample Data:
ID Name User
12 Test Same
14 Xyz Joe
15 Abc John
16 Def Bill
17 Ghi Donald
If a user searches for Abc or Joe, they should get the matching rows.
Regex:
'Abc|Joe'
Output:
14 Xyz Joe
15 Abc John
Now, if the user then searches for e, it should filter the previous output (the 2 rows retrieved), so I will get just 14 Xyz Joe. Is this possible using regex?
I am trying to have all this in one regex.
`'Abc|Joe and the second filter goes here (All in one regex)'`
Use case: The user selects checkboxes to set the filters he wants to apply on the data (All the data in the columns Name and User are available). He may then search again on the filtered result using a search textbox.

((firstRegex)(?:.*(secondRegex)))|((secondRegex)(?:.*(firstRegex)))
((Abc|Xyz)(?:.*(Jo)))|((Jo)(?:.*(Abc|Xyz)))
See Demo
We don't know which pattern will match earlier in the line, so there are two cases, and we combine them with |. If you have more than two searches, I suggest writing some code instead.
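For the "write some code" route, here is a minimal Python sketch, assuming the rows are held as (ID, Name, User) tuples and every filter is searched against the Name and User text. The names ROWS, apply_filters and combine are hypothetical, and the lookahead form built by combine is a different technique from the alternation above: it chains one (?=.*(...)) per filter, so the order of the matches no longer matters.

```python
import re

# Sample data from the question as (ID, Name, User) tuples (assumed layout).
ROWS = [
    (12, "Test", "Same"),
    (14, "Xyz", "Joe"),
    (15, "Abc", "John"),
    (16, "Def", "Bill"),
    (17, "Ghi", "Donald"),
]


def apply_filters(rows, patterns):
    """Keep only the rows whose Name/User text matches every pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [
        row for row in rows
        if all(rx.search(f"{row[1]} {row[2]}") for rx in compiled)
    ]


def combine(patterns):
    """Build one regex that requires every pattern, in any order."""
    return "^" + "".join(f"(?=.*(?:{p}))" for p in patterns)


first = apply_filters(ROWS, ["Abc|Joe"])        # checkbox filter only
second = apply_filters(ROWS, ["Abc|Joe", "e"])  # checkbox filter plus search box
single_regex = combine(["Abc|Joe", "e"])
```

With the sample data, first keeps rows 14 and 15, and second keeps only row 14. Adding a third filter is just one more entry in the list, which is where the hand-written alternation stops scaling.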

For the 2 filters:
/^\d+\s+(?:Abc|Xyz|Def)\s+\S*(?:Jo|ill).*/mg;
If the user doesn't specify the second filter, you could just leave it empty as (?:).
I'm positive you could create this kind of expression yourself after reading about regex syntax for a few minutes, so allow me to recommend:
Regular Expressions Tutorial (regular-expressions.info). A quite comprehensive tutorial to learn regex.
regex101.com. Allows you to test different expressions and understand the way a pattern matches the subject string.

Related

Validate authors in google sheets using regex

I have an authors column and I would like to limit the input to a specific format using data validation and REGEXMATCH.
Let's say we have 3 authors (of course the validation should allow for 1 or more authors). In no particular order:
John Edward Smith
Jane Doe
José Luis-Visquez
The desired format is strictly this (including upper and lower case and punctuation):
Smith JE, Doe J, Luis-Visquez J
Anything else should throw an error.
No dot at the end
I tried this regex but it is matching incorrect inputs as well:
(?:(?:[A-Z][a-z]+\-?(?:[A-Z][a-z]+)?)\s[A-Z]{1,2}, )*(?:(?:[A-Z][a-z]+\-?(?:[A-Z][a-z]+)?)\s[A-Z]{1,2})
What is the correct regex that would allow for unlimited authors in this specific format in no particular order for the author names? The regex should be general to any name.
Try:
=ARRAYFORMULA(REGEXMATCH(B2:B4, "^\w+(?:-\w+)? [A-Z]{1,2}$"))
or more strict:
=ARRAYFORMULA(REGEXMATCH(B2:B4, "^[A-Z][a-z]+(?:-[A-Z][a-z]+)? [A-Z]{1,2}$"))
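The formulas above check one author per cell. For a whole comma-separated list, the repetition from the question's pattern can be kept as long as the match is anchored to the entire input; REGEXMATCH returns TRUE on a partial match, which is probably why the unanchored attempt accepted bad input. Below is a rough Python sketch of that idea (the names NAME, AUTHOR and is_valid_author_list are made up for illustration). In Sheets the same pattern would need ^ and $ around it instead of fullmatch.

```python
import re

# Surname, optionally hyphenated: "Smith", "Luis-Visquez"
NAME = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)?"
# Surname, one space, then one or two uppercase initials: "Smith JE"
AUTHOR = rf"{NAME} [A-Z]{{1,2}}"
# One author, then any number of ", author" repetitions
AUTHOR_LIST = re.compile(rf"{AUTHOR}(?:, {AUTHOR})*")


def is_valid_author_list(text):
    """Return True only if the whole string is a well-formed author list."""
    return AUTHOR_LIST.fullmatch(text) is not None
```

This accepts "Smith JE, Doe J, Luis-Visquez J" and rejects variants with a trailing dot, a missing space after the comma, or a lowercase surname. Like the stricter formula above, it only allows unaccented ASCII letters in surnames.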

Extract data from dataset

I need to extract the title from the name but cannot understand how the code works. I have provided the code below:
combine = [traindata, testdata]
for dataset in combine:
    # Raw string so that \. reaches the regex engine unchanged
    dataset["title"] = dataset["Name"].str.extract(r' ([A-Za-z]+)\.', expand=False)
There is no error, but I need to understand how the above code works.
Name
Braund, Mr. Owen Harris
Cumings, Mrs. John Bradley (Florence Briggs Thayer)
Heikkinen, Miss. Laina
Futrelle, Mrs. Jacques Heath (Lily May Peel)
Allen, Mr. William Henry
Moran, Mr. James
Above is the Name feature from the CSV file; dataset["title"] stores the title of each name, that is Mr, Miss, Master, etc.
Your code extracts the title from the name using the pandas.Series.str.extract function, which takes a regex.
pandas.Series.str.extract - Extract capture groups in the regex pat as columns in a DataFrame.
' ([A-Za-z]+)\.' is the regex pattern in your code. It finds the part of the Name string that comes after a space and is immediately followed by a . character.
[A-Za-z] - matches a single letter in the range a-z or A-Z
+ - means one or more of the preceding letters
\. - matches a literal . directly after those letters
The parentheses form the capture group, so only the letters are returned, without the leading space or the dot.
An example is provided on the link above where it extracts parts of a string and puts them in separate columns.
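To see what the pattern captures without loading the CSV, here is a small sketch that applies the same regex with Python's standard re module to a few names from the question. The helper extract_title is hypothetical; with a single capture group, str.extract(..., expand=False) should give the equivalent value for each row.

```python
import re

# Same pattern as in the question: space, captured letters, literal dot.
TITLE = re.compile(r" ([A-Za-z]+)\.")


def extract_title(name):
    """Return the first run of letters that follows a space and precedes a dot."""
    match = TITLE.search(name)
    return match.group(1) if match else None


names = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]
titles = [extract_title(n) for n in names]  # ['Mr', 'Mrs', 'Miss']
```

A name with no "letters followed by a dot" part yields None here, which corresponds to a missing value in the pandas result.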
I found this response and its link very helpful for understanding how to use the str.extract method, and how changing expand from True to False switches the result between DataFrame columns and a Series.

Convert MS Outlook formatted email addresses to names of attendees using RegEx

I'm trying to use Notepad++ find and replace with a regex to extract names from MS Outlook formatted meeting attendee details.
I copied and pasted the attendee details and got names like this:
Fred Jones <Fred.Jones#example.org.au>; Bob Smith <Bob.Smith#example.org.au>; Jill Hartmann <Jill.Hartmann#example.org.au>;
I'm trying to wind up with
Fred Jones; Bob Smith; Jill Hartmann;
I've tried a number of permutations of
\B<.*>; \B
on Regex 101.
Regex quantifiers are greedy: <.*> matches from the first < to the last > in one fell swoop. You want to say "any character which is neither of these" instead of just "any character".
 *<[^<>]*>
The pattern begins with a single space followed by an asterisk, which consumes any spaces before the match. Replace these matches with nothing and you will be left with just the names, like in your example.
This is a very common FAQ.
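As a quick check outside Notepad++, the same replacement can be tried in Python with re.sub. This is only a sketch: the variable names are made up, and the pattern is written with its leading space, as the explanation above describes.

```python
import re

attendees = (
    "Fred Jones <Fred.Jones#example.org.au>; "
    "Bob Smith <Bob.Smith#example.org.au>; "
    "Jill Hartmann <Jill.Hartmann#example.org.au>;"
)

# Greedy attempt: .* runs from the first < to the last >, eating the other names.
greedy = re.sub(r"<.*>", "", attendees)

# Negated character class: each match stops at the nearest >.
names_only = re.sub(r" *<[^<>]*>", "", attendees)

print(greedy)      # Fred Jones ;
print(names_only)  # Fred Jones; Bob Smith; Jill Hartmann;
```

In Notepad++ the equivalent is to put the pattern in "Find what", leave "Replace with" empty, and select the regular expression search mode.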

Regex get next 2 words after certain string

I need a regular expression, which can find names in some text content. It should match from 1 to 3 names, First-name, (Middle-name), (Surname).
I have a list of valid first-names which will be used to search the text. If a first-name is found in the text, the regular expression should also get the following middle-name and/or surname, if they exist.
As an example, the names below should all be found as valid names:
John
John Doe
John Average Joe
Special cases:
John average Doe (if possible, it should match/find John Doe)
So far my solution is:
\b(John|Mary|Tom)\b(?:(?:([^A-Za-z]*[A-Z][^\s,]*)*[^A-Za-z]+)){0,3}
This kind of works; the problem is that it should match at most 3 words, and it doesn't enforce that limit.
Online test: http://regex101.com/r/aM7bS3/2
I've modified your regex.
You can use the following:
\b(Mogens|Victor|John)(\b\s*([A-Z]\w+)){0,2}
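Here is a short Python sketch to try that pattern against the examples from the question. The find_names helper is hypothetical, and the first-name list is the one used in the pattern above.

```python
import re

# First name from the list, then up to two more capitalised words.
NAME_RE = re.compile(r"\b(Mogens|Victor|John)(\b\s*([A-Z]\w+)){0,2}")


def find_names(text):
    """Return the full text of every match (first name plus up to two words)."""
    return [m.group(0) for m in NAME_RE.finditer(text)]


print(find_names("John"))                    # ['John']
print(find_names("I met John Doe today"))    # ['John Doe']
print(find_names("John Average Joe Smith"))  # ['John Average Joe']
print(find_names("John average Doe"))        # ['John']
```

Note that the special case "John average Doe" yields only John, because the lowercase word stops the repetition. Skipping over it to reach Doe would need extra logic.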

Regex find Proper Nouns or Phrases that are NOT first word in a sentence

I've found several questions that touch on this, but none that seem to answer it. I am trying to build a Regex that will allow me to identify Proper Nouns in a group of text.
I am defining a Proper Noun as follows: a word or group of words that begin with a capital letter, are longer than 1 character (to exclude things like I, A, etc.), and are NOT the first word of a new sentence.
So, in the following text
"Susan Dow stayed at the Holiday Inn on Thursday. She met Tom and Shirley Temple at the bar where they ordered Green Eggs and Ham"
I would want the following returned
Holiday Inn
Thursday
Tom
Shirley Temple
Green Eggs
Ham
Right now, [A-Z]{1,1}[a-z]*([\s][A-Z]{1,1}[a-z]*)* is what I have, but it's returning Susan Dow and She in addition to the ones listed above. How can I get my . look-up to work?
You can use:
(?<!^|\. |\.  )[A-Z][a-z]+
per this rubular
Update: Integrated the two negative lookbehinds using alternation. Also added a check for two spaces between sentences. Note that repetition operators cannot be used in negative lookbehinds, per the notes at http://www.regular-expressions.info/lookaround.html
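If you want to try this in Python rather than Ruby, be aware that Python's re module requires each lookbehind to have a fixed width, so alternatives of different lengths cannot share one lookbehind. A possible adaptation (an assumption on my part, not part of the answer above) is to split it into three separate lookbehinds: start of string, period plus one space, and period plus two spaces. The function name find_candidates is made up.

```python
import re

# Three fixed-width negative lookbehinds replace the single alternation.
PROPER = re.compile(r"(?<!^)(?<!\. )(?<!\.  )[A-Z][a-z]+")

text = (
    "Susan Dow stayed at the Holiday Inn on Thursday. "
    "She met Tom and Shirley Temple at the bar where they ordered "
    "Green Eggs and Ham"
)


def find_candidates(s):
    """Return capitalised words that do not start the text or a sentence."""
    return PROPER.findall(s)


print(find_candidates(text))
# ['Dow', 'Holiday', 'Inn', 'Thursday', 'Tom', 'Shirley', 'Temple', 'Green', 'Eggs', 'Ham']
```

As the output shows, the pattern returns single words rather than phrases, and Dow still appears because only the first word of the sentence is excluded. Grouping adjacent words into phrases such as Holiday Inn would need a further step.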