Regex to find the last word in a string (JavaScript flavor)

I am close but not quite there. I am trying to match the last word to pull out the last name.
My Regex:
Insured Name:\W*(?<insured_last_name>.*)
Text that I am searching:
Insured Name:
FRED & ETHYL MERTZ
Sample here...
https://regex101.com/r/McdMcq/3

You can match Insured Name: until the end of the line. Then match a newline and optional following whitespace chars.
Then at the line where you want to get the last word, first match until the end of the line, then backtrack until the last space, and capture 1+ non whitespace chars in group insured_last_name
\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)
In parts
\bInsured Name: Match literally
.*\r?\n\s* Match the rest of the line, a newline and 0+ whitespace chars
.* Match the rest of the line, then backtrack until the literal space that follows in the pattern matches the last space
(?<insured_last_name>\S+) Match 1+ non whitespace chars in group insured_last_name
Regex demo
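For anyone who wants to try this outside regex101, here is a minimal sketch that applies the pattern above to the sample text from the question. The variable names (insuredText, lastNamePattern, lastName) are made up for illustration, and it assumes an engine with named capture groups (ES2018 or later).

```javascript
// Sample text from the question: the label on one line, the names on the next.
const insuredText = "Insured Name:\nFRED & ETHYL MERTZ";

// The pattern from the answer, written as a JavaScript regex literal.
const lastNamePattern = /\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)/;

const found = insuredText.match(lastNamePattern);

// Named groups are exposed on the groups property of the match result.
const lastName = found ? found.groups.insured_last_name : null;

console.log(lastName); // MERTZ
```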

You can simply use /\w+$/gm
Demo: https://regex101.com/r/McdMcq/4
Explanation:
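\w: Match a word character (letter, digit, or underscore)
+: At least one
$: And then the end of the line (with the m flag, $ matches at the end of every line, not only at the end of the string)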
If there are multiple rows and potentially garbage data in between, I would recommend removing the 2 newlines (\n\n) and then using a positive lookbehind that looks for "Name". Demo: https://regex101.com/r/McdMcq/5
If you need to store the result in a capture group, simply enclose \w+$ in parentheses with a group name (i.e. (?<insured_last_name>\w+$)) in either of the two regexes.
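A small sketch of what /\w+$/gm returns in practice. The third line of the sample ("Policy Number") is invented here to show that the g and m flags pick up the last word of every line that ends in a word character, which is why the answer suggests narrowing things down when other rows are present.

```javascript
// The first two lines come from the question; "Policy Number" is a made-up extra row.
const multiLineSample = "Insured Name:\nFRED & ETHYL MERTZ\nPolicy Number";

// With g and m, every line that ends in word characters contributes a match.
const lastWords = [...multiLineSample.matchAll(/\w+$/gm)].map((m) => m[0]);
console.log(lastWords); // MERTZ and Number; "Insured Name:" ends in ":" so it is skipped

// The named-group variant, without g, returns only the first such match.
const firstHit = multiLineSample.match(/(?<insured_last_name>\w+$)/m);
const firstLastWord = firstHit ? firstHit.groups.insured_last_name : null;
console.log(firstLastWord); // MERTZ
```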

You may need to define your data set a little more, but you can try
Insured Name:\n+.*(?<insured_last_name>\b.+)
Example
It starts at "Insured Name:", skips the newline and any empty lines, then reads the following line up to the final word boundary (excluding the EOL); anything after that boundary ends up in your named group.
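To see how the "final word boundary" behaves, here is a rough sketch. The helper name lastNameAfterLabel and the second and third inputs are assumptions added for illustration; only the first input comes from the question.

```javascript
const afterLabelPattern = /Insured Name:\n+.*(?<insured_last_name>\b.+)/;

// Hypothetical helper: returns the named group, or null when nothing matches.
function lastNameAfterLabel(input) {
  const m = input.match(afterLabelPattern);
  return m ? m.groups.insured_last_name : null;
}

const plain = lastNameAfterLabel("Insured Name:\nFRED & ETHYL MERTZ");
const withBlankLine = lastNameAfterLabel("Insured Name:\n\nFRED & ETHYL MERTZ");
// A trailing period moves the last word boundary to just before the period.
const withPeriod = lastNameAfterLabel("Insured Name:\nFRED & ETHYL MERTZ.");

console.log(plain);         // MERTZ
console.log(withBlankLine); // MERTZ
console.log(withPeriod);    // .
```

The last case is why the data set matters: if names can end in punctuation, a group such as \S+ or \w+ may be a safer choice than \b.+.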

Related

Matching the second to the last word

I have the following line:
19 crooks highway Ontario
I want to match anything that comes before the last space (whitespace). So, in this case, I want to get back
highway
I tried this:
([^\s]+)
You may use a lookahead to do it:
\S+(?=\s\S*$)
\S+ any non-whitespace character, repeated 1+ times (equivalent to [^\s]+)
(?=...) lookahead, the string after the previous match must satisfy the condition inside
\s\S*$ a whitespace character followed by 0+ non-whitespace characters, then the end of the string
See the test case
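A quick sketch of the lookahead version run against the line from the question. The variable names are illustrative only, and the single-word input is an added case to show that nothing matches when there is no second-to-last word.

```javascript
const streetLine = "19 crooks highway Ontario";

// The lookahead checks for "whitespace + last word + end" without consuming it,
// so the match itself is only the second-to-last word.
const secondToLastPattern = /\S+(?=\s\S*$)/;

const hit = streetLine.match(secondToLastPattern);
const secondToLast = hit ? hit[0] : null;
console.log(secondToLast); // highway

// With a single word there is nothing before the last space, so match returns null.
const singleWord = "Ontario".match(secondToLastPattern);
console.log(singleWord); // null
```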
Use this:
\s+(\S+)\s+\S*$
Use $ anchor to match the end of the line
\s+\S*$ matches one or more spaces followed by zero or more non-whitespaces at the end of the line
The desired result is captured in the capturing group
Demo
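The capture-group version can be tried the same way. This sketch also includes two edge cases that are not in the question (a trailing space and a two-word line); the helper name secondToLastOf is made up for illustration.

```javascript
const capturePattern = /\s+(\S+)\s+\S*$/;

// Hypothetical helper: returns group 1, or null when the pattern does not match.
function secondToLastOf(line) {
  const m = line.match(capturePattern);
  return m ? m[1] : null;
}

const normal = secondToLastOf("19 crooks highway Ontario");
// Because \S* may match nothing, a trailing space shifts the capture to the last word.
const trailingSpace = secondToLastOf("19 crooks highway Ontario ");
const trimmedFirst = secondToLastOf("19 crooks highway Ontario ".trim());
// The leading \s+ means the captured word must itself be preceded by whitespace.
const twoWords = secondToLastOf("highway Ontario");

console.log(normal);        // highway
console.log(trailingSpace); // Ontario
console.log(trimmedFirst);  // highway
console.log(twoWords);      // null
```

If the input may carry trailing whitespace, trimming it first (as in the third call) keeps the capture on the intended word.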

Regular expression that matches at least 4 words starting with the same letter?

I've been trying to solve this problem for a few hours but with no luck. The task is to write a regular expression that matches at least four words starting with the same letter. But! These words do not have to be one after another.
This regex should be able to match a line like this:
cat color coral chat
but also one like this:
cat take boom candle creepy drum cheek
Thank you!
So far I have got this regex but it only matches words when they are in order.
(\w)\w+\s+\1\w+\s+\1\w+\s+\1
If you have only words in the line that can be matched with \w:
\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}
Explanation
\b A word boundary to prevent a partial word match
(\w)\w* Capture a single word character in group 1 followed by matching optional word characters
(?: Non capture group to repeat as a whole part
(?:\s+\w+)*? Optionally repeat, as few times as possible, 1+ whitespace chars and 1+ word chars, to skip words in between that do not start with the character captured in group 1
\s+\1\w* Match 1+ whitespace chars, a backreference to the same captured character and optional word characters
){3} Close the non capture group and repeat 3 times
See a regex demo
Note that \s can also match a newline.
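A minimal sketch that checks this pattern against the two lines from the question, plus one made-up line ("cat take boom candle") that has only two words starting with c and should therefore fail.

```javascript
const fourSameStart = /\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}/;

// Both example lines from the question contain four words starting with "c".
const inOrder = fourSameStart.test("cat color coral chat");
const scattered = fourSameStart.test("cat take boom candle creepy drum cheek");

// Made-up counterexample: no letter starts four words.
const tooFew = fourSameStart.test("cat take boom candle");

console.log(inOrder);   // true
console.log(scattered); // true
console.log(tooFew);    // false
```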
If the words that start with the same character should be at least 2 characters long (as (\w)\w+ matches 2 or more characters):
\b(\w)\w+(?:(?:\s+\w+)*?\s+\1\w+){3}
See another regex demo.
Another idea to match lines with at least 4 words starting with the same letter:
\b(\w)(?:.*?\b\1){3}
See this demo at regex101
This is not very accurate: after \b(\w) captures the first character of a word, it just checks that there are three more \b word boundaries to the right, each followed by the captured character \1, with any characters (.*?) in between.
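To illustrate what "not very accurate" can mean here, a short sketch follows. The hyphenated input "c-c-c-c" is an invented example: it contains no whitespace at all, yet the looser pattern accepts it because a hyphen also creates word boundaries.

```javascript
const looseCheck = /\b(\w)(?:.*?\b\1){3}/;
const strictCheck = /\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}/;

// On the question's example both patterns agree.
const looseOnWords = looseCheck.test("cat take boom candle creepy drum cheek");
const looseTooFew = looseCheck.test("cat take boom candle");

// Made-up input: four "c" pieces separated by hyphens, not by whitespace.
const looseOnHyphens = looseCheck.test("c-c-c-c");
const strictOnHyphens = strictCheck.test("c-c-c-c");

console.log(looseOnWords);    // true
console.log(looseTooFew);     // false
console.log(looseOnHyphens);  // true
console.log(strictOnHyphens); // false
```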

Regular expression matching and remove spaces

Please how can I get the address using regex:
Address 123 Mayor Street, LAG Branch ABC
I used (?<=Address(\s))(.*(?=\s)) but it includes the spaces after "Address". I am trying to get an expression that extracts the address without the spaces. (There are a couple of spaces after "Address" before "123")
Thanks!
The pattern (?<=Address(\s))(.*(?=\s)) that you tried asserts Address followed by a single whitespace char to the left, and then matches the rest of the line asserting a whitespace char to the right.
For the example data, the match ends right before the last whitespace char in the string, and it also contains the remaining whitespace chars that follow the first one after Address.
One option to match the bold parts in the question (123 Mayor Street, LAG, going by what the pattern below captures) is to use a capture group.
\bAddress\s+([^,]+,\s*\S+)
The pattern matches:
\bAddress\s+ Match Address followed by 1+ whitespace chars
( Capture group 1
[^,]+, Match 1+ occurrences of any char except , and then match ,
\s*\S+ Match optional whitespace chars followed by 1+ non whitespace chars
) Close group 1
.NET regex demo (Click on the Table tab to see the value for group 1)
Note that \s and [^,] can also match a newline
A variant with a positive lookbehind to get a match only:
(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+
.NET Regex demo
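The demos above are for .NET, but both patterns can also be tried in JavaScript. This is only a sketch under two assumptions: the input uses three spaces after "Address" (the question only says "a couple"), and the engine supports lookbehind (ES2018 or later, e.g. current Node.js).

```javascript
// Assumed spacing: three spaces between "Address" and "123".
const addressLine = "Address   123 Mayor Street, LAG Branch ABC";

// Capture-group variant: the value is in group 1.
const grouped = addressLine.match(/\bAddress\s+([^,]+,\s*\S+)/);
const fromGroup = grouped ? grouped[1] : null;

// Lookbehind variant: the whole match is the value.
const lookedBehind = addressLine.match(/(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+/);
const fromLookbehind = lookedBehind ? lookedBehind[0] : null;

console.log(fromGroup);      // 123 Mayor Street, LAG
console.log(fromLookbehind); // 123 Mayor Street, LAG
```

In both cases \s+ absorbs all the spaces after "Address", so none of them end up in the result.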

Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma delimited list where the first value is numeric and the second is alpha.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
Regex demo
To get a bit broader match you could use this pattern to match at least a single char a-zA-Z
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Regex demo
Note that this part in your pattern \d[^,]+ matches not only digits, but 1 digit followed by 1+ times any char except a comma, which would for example also match 4a$.
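As a rough illustration, the sketch below runs the first pattern with the digits in their own group, as suggested in the breakdown, over the sample rows. The header line is abbreviated here, the g and m flags are added so that ^ applies to every line, and the names csvText, rowPattern and rows are made up.

```javascript
// Sample rows from the question; the header is shortened for readability.
const csvText = [
  "Address Number,Name,Employee Master Exist(Y/N)",
  "4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N",
  "10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N",
].join("\n");

// Group 1: the leading number. Group 2: one or more space-separated words.
const rowPattern = /^(\d+),([a-zA-Z]+(?: [a-zA-Z]+)*),/gm;

const rows = [...csvText.matchAll(rowPattern)].map((m) => ({
  number: m[1],
  name: m[2],
}));

// The header does not start with a digit, so only the two data rows are returned.
for (const row of rows) {
  console.log(row.number + ": " + row.name);
}
```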
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
See it at Regex 101
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
See alternative regex
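The difference between the + and * versions only shows up on rows with an empty name. A small sketch, where the row "777,,Y,N" is invented to stand in for such a line and the helper name secondField is hypothetical:

```javascript
const nameRequired = /^\d+,([^,]+),/;
const nameOptional = /^\d+,([^,]*),/;

// Hypothetical helper: returns group 1, or null when the line does not match.
function secondField(pattern, line) {
  const m = line.match(pattern);
  return m ? m[1] : null;
}

const normalRow = secondField(nameRequired, "4398,Presley Elvis Aaron,Y,N");
// Made-up row with an empty second field.
const emptyWithPlus = secondField(nameRequired, "777,,Y,N");
const emptyWithStar = secondField(nameOptional, "777,,Y,N");

console.log(normalRow);     // Presley Elvis Aaron
console.log(emptyWithPlus); // null (the row is skipped)
console.log(emptyWithStar); // "" (the row matches with an empty name)
```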

Regex Extract a string between two words containing a particular string

I have the below string
abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am
It is a kind of multi-character delimited string, delimited by '--**--'. I am trying to extract the first and second words which have the -oy- tag in them. This is a column in a table. I am using the regex_extract method but I am not able to extract the string which contains a given string and ends with a given string.
Here is one pattern that I tried: .*(.*oy.*)--
If the -oy- can not be at the start or at the end, you could use this pattern to match the 2 hyphen delimited strings with -oy-:
[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+
Regex details
[a-z0-9]+ Match 1+ times a-z0-9
(?: Non capturing group
-[a-z0-9]+ Match - and 1+ times a-z0-9
)* Close group and repeat 0+ times
-oy Match literally
(?:-[a-z0-9]+)+ Repeat 1+ times a group which will match - and 1+ times a-z0-9
You can extend the character class [A-Za-z0-9] to allow what you want to match like uppercase chars.
Regex demo | Java demo
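The linked demos are for Java; as a quick cross-check, here is the first pattern run in JavaScript against the string from the question. The g flag is added to collect every match, and the variable names are illustrative.

```javascript
const delimited =
  "abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--" +
  "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am";

// Hyphen-separated runs of a-z0-9 that contain an "-oy-" segment in the middle.
const oyPattern = /[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+/g;

// With the g flag, match returns all matches, or null if there are none.
const oyFields = delimited.match(oyPattern) || [];

console.log(oyFields[0]); // abc-12d-ef-oy-5678-xyz
console.log(oyFields[1]); // ghi-66d-ef-oy-8877-sdf
```

The double hyphen of the delimiter cannot be crossed because every hyphen in the pattern must be followed by at least one a-z0-9 character.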
If the matches should be between delimiters, you could use a positive lookbehind and positive lookahead and an alternation:
(?<=^|--\\*\\*--)[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+(?=--\\*\\*--|$)
See a Java demo
You can use this regex which will match string containing -oy- and capture them in group1 and group2.
^.*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*).*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*)
This regex basically matches two delimiter-separated strings containing -oy-, using (\w+(?:-\w+)*-oy-\w+(?:-\w+)*) to capture each one.
Demo
Are you able to select values from capture groups?
(?:--\*\*--|^)(.*?-oy-.*?)(?:--\*\*--|$)
?: - Non-capture group, matches the delimiter, begin of line, or end of line but does not create a capture group
*? - Lazy match so you only grab the contents of the field
https://regex101.com/r/aUAvcx/1
--- Second stab at this follows ---
This is convoluted. Hopefully you can use Lookahead and Lookbehind. The last problem I had was the final record was being "Greedy" and sucking up the field before it too. So I had to add an exclusion in the capture group for your delimiter.
See if this works for you.
(?<=--\*\*--|^)((?:(?:(?!--\*\*--).)*)-oy-(?:(?:(?!--\*\*--).)*))(?=--\*\*--|$)
https://regex101.com/r/aUAvcx/3
Basically the (?: are so we are not getting too many capture groups to work with.
There are three parts to this:
The lookbehind - Make sure the field is framed by the delimiter (or start of line)
The capture group - Grab the contents of the field, making sure a delimiter isn't sucked up into it
The lookahead - Make sure the field is framed by the delimiter (or end of line)
As far as the capture group goes, I check the left and right side of the -oy- to make sure the delimiter isn't there.
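To make the "greedy" problem concrete, here is a small sketch comparing the first attempt with the second one. The input "plain-field--**--ghi-oy-123" is invented so that a field without -oy- comes first, and the sketch assumes a JavaScript engine with lookbehind support (ES2018 or later).

```javascript
// First attempt: lazy dots that are free to run across a delimiter.
const lazyField = /(?:--\*\*--|^)(.*?-oy-.*?)(?:--\*\*--|$)/;

// Second attempt: each dot is guarded by (?!--\*\*--) so it cannot cross a delimiter.
const guardedField =
  /(?<=--\*\*--|^)((?:(?:(?!--\*\*--).)*)-oy-(?:(?:(?!--\*\*--).)*))(?=--\*\*--|$)/;

// Made-up record: the first field has no -oy-, the second one does.
const record = "plain-field--**--ghi-oy-123";

const lazyResult = record.match(lazyField)[1];
const guardedResult = record.match(guardedField)[1];

console.log(lazyResult);    // plain-field--**--ghi-oy-123
console.log(guardedResult); // ghi-oy-123
```

The first pattern starts at the beginning of the line and expands until it finds -oy-, swallowing the earlier field and the delimiter; the guarded version fails at that position and only succeeds right after the delimiter.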