Matching the second to the last word - regex

I have the following line:
19 crooks highway Ontario
I want to match anything that comes before the last space (whitespace). So, in this case, I want to get back
highway
I tried this:
([^\s]+)

You may use a lookahead to do it:
\S+(?=\s\S*$)
\S+ any non-whitespace character, repeated 1+ times (equivalent to [^\s]+)
(?=...) lookahead: the text after the current match must satisfy the condition inside
\s\S*$ a whitespace character followed by 0+ non-whitespace characters, then the end of the string
See the test case
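As a quick check, here is a minimal Python sketch of the lookahead pattern above. It assumes Python's built-in re module (the question does not name a regex flavor), and the variable names are illustrative only.

```python
import re

line = "19 crooks highway Ontario"

# \S+ is the match itself; the lookahead only checks that one whitespace
# char and the final word follow, without consuming them.
match = re.search(r"\S+(?=\s\S*$)", line)
second_to_last = match.group(0) if match else None
print(second_to_last)  # highway
```

Because the lookahead consumes nothing, the whole match is already the wanted word and no capture group is needed.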

Use this:
\s+(\S+)\s+\S*$
Use $ anchor to match the end of the line
\s+\S*$ matches one or more spaces followed by zero or more non-whitespaces at the end of the line
The desired result is captured in the capturing group
Demo
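A small Python sketch of this capture-group variant, again assuming the built-in re module; the helper name second_to_last_word is made up for illustration.

```python
import re

PATTERN = re.compile(r"\s+(\S+)\s+\S*$")

def second_to_last_word(line):
    """Return the word before the last one, or None if the pattern fails."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

print(second_to_last_word("19 crooks highway Ontario"))  # highway
print(second_to_last_word("highway Ontario"))            # None
```

The second call shows one consequence of the leading \s+: the wanted word must itself be preceded by whitespace, so a line with only two words produces no match.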

Related

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+, at least one whitespace char must precede the word, so a string that consists only of word... will not match.
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
Regex demo
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. What exactly it matches depends on your regex flavor, but it usually excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
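To compare the two suggested patterns, here is a rough Python sketch (built-in re module assumed; the names strict, loose and last_word are illustrative, and the four-dot string is an extra test input that is not from the question).

```python
import re

strict = re.compile(r"([^\s.]+)\.{3}$")  # word may not contain dots
loose = re.compile(r"(\S+)\.\.\.$")      # word may contain dots

def last_word(pattern, text):
    """Return capture group 1, or None when the text does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None

ends_in_ellipsis = "Do not match this. This sentence ends in the last word..."
ends_in_period = "Do not match this. This sentence ends in the last word."
four_dots = "last word...."

print(last_word(strict, ends_in_ellipsis))  # word
print(last_word(loose, ends_in_ellipsis))   # word
print(last_word(strict, ends_in_period))    # None
print(last_word(loose, ends_in_period))     # None
print(last_word(strict, four_dots))         # None
print(last_word(loose, four_dots))          # word.
```

Both patterns agree on the two strings from the question. They differ only when more than three dots end the string, because \S also matches a dot.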

Regex to find the last word in string -Javascript flavor

I am close but not quite there. I am trying to match the last word to pull out the last name.
My Regex:
Insured Name:\W*(?<insured_last_name>.*)
Text that I am searching:
Insured Name:
FRED & ETHYL MERTZ
Sample here...
https://regex101.com/r/McdMcq/3
You can match Insured Name: until the end of the line. Then match a newline and optional following whitespace chars.
Then at the line where you want to get the last word, first match until the end of the line, then backtrack until the last space, and capture 1+ non whitespace chars in group insured_last_name
\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)
In parts
\bInsured Name: Match literally
.*\r?\n\s* Match the rest of the line, a newline and 0+ whitespace chars
.* Match the rest of the line and match the last space
(?<insured_last_name>\S+) Match 1+ non whitespace chars in group insured_last_name
Regex demo
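The question targets the JavaScript flavor, but the same idea can be tried in Python as a rough sketch. Note one assumption: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), so the pattern below is adapted accordingly. The blank line in the sample text is also an assumption about the input.

```python
import re

pattern = re.compile(
    r"\bInsured Name:.*\r?\n\s*.* (?P<insured_last_name>\S+)"
)

# Sample input, here with an empty line between label and name.
text = "Insured Name:\n\nFRED & ETHYL MERTZ"

match = pattern.search(text)
last_name = match.group("insured_last_name") if match else None
print(last_name)  # MERTZ
```

The greedy .* on the name line first runs to the end of the line and then backtracks to the last space, which is why only the final word ends up in the group.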
You can simply use /\w+$/gm
Demo: https://regex101.com/r/McdMcq/4
Explanation:
\w: Look for word characters (letters, digits and underscore)
+: At least one
$: And then the end of the string
If there are multiple rows and potentially garbage data in between I would recommend you to remove the 2 newlines (\n\n) and then do a Positive Lookbehind looking for "Name". Demo: https://regex101.com/r/McdMcq/5
If you need to store the result in a capture group, simply enclose \w+$ in parentheses with a group name (i.e. (?<insured_last_name>\w+$)) in either of the two regexes.
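A short Python sketch of the /\w+$/gm idea, assuming re.MULTILINE as the counterpart of the m flag and re.findall as the counterpart of g. The "Policy 123" line is a made-up extra row, added only to show what happens with more than one data line.

```python
import re

text = "Insured Name:\nFRED & ETHYL MERTZ\nPolicy 123"

# With MULTILINE, $ matches at the end of every line, not just the last one.
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(last_words)  # ['MERTZ', '123']
```

The label line yields nothing because it ends in a colon, but every other line contributes its last word, which is why the answer suggests anchoring on "Name" when the data has more rows.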
You may need to define your data set a little more, but you can try
Insured Name:\n+.*(?<insured_last_name>\b.+)
Example
It starts at "Insured Name:", then any empty lines, then will read the following line until the final word boundary (excluding the EOL); anything after that is in your named group.

Match all instances of a certain character inside every word preceded by a certain word and not delimited by a space

Given a string such as below:
word.hi. bla. word.
I want to construct a regex which will match all "."s preceded by "word" and any other non space character
So, in the above example I would want the first, second and last dots to be matched.
While matching the first and last dots would be easy with global flag (/(?:word.*)\K./gU), I'm not sure how to construct a regex that would also match the second dot.
Appreciate any pointers.
You might match word and then get all consecutive matches using the \G anchor excluding matching whitespace chars or a dot.
(?:\bword|\G(?!\A))[^.\s]*\K\.
In parts
(?: Non capture group
\bword Match word preceded by a word boundary
| Or
\G(?!\A) Assert the position at the end of the previous match, not at the start
) Close non capture group
[^.\s]* Match 0+ occurrences of any char except . or a whitespace char
\K Clear the match buffer (forget what is matched until now)
\. Match a dot
Regex demo
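Python's built-in re module supports neither \G nor \K, so the pattern above cannot be pasted in as is. The sketch below swaps in a different, two-step technique that should select the same dots: first find each whitespace-free run starting at the keyword, then collect the dots inside that run. The function name dot_offsets and its keyword parameter are hypothetical.

```python
import re

def dot_offsets(text, keyword="word"):
    """Offsets of every '.' inside a whitespace-free run that starts at keyword."""
    run = re.compile(r"\b" + re.escape(keyword) + r"\S*")
    offsets = []
    for m in run.finditer(text):
        for i, ch in enumerate(m.group(0)):
            if ch == ".":
                offsets.append(m.start() + i)
    return offsets

print(dot_offsets("word.hi. bla. word."))  # [4, 7, 18]
```

The dot after bla (offset 12) is skipped because its run does not start with the keyword.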

How to capture everything until another capture group

I have the following template :
1251 Left Random Text I want to fill
It can go through multiple lines
As you can see
9841 Right Again we see a lot of random text with 3115 numbers
And this also goes
To multiple lines
0121 Right
5151 Right This one is just one line
I was wrong
9731 Left This one is just a line
5123 NA Instruction 5151 was wrong
4113 Right Instr 9841 was correct
We checked
I want to have 3 groups:
1251
Left
Random Text I want to fill
It can go through multiple lines
As you can see
I'm using
(\d+)\s(\w+)\s(.*)
but it stops at the current line only (so I get only Random Text I want to fill in group 3, although I want including As you can see)
If I'm using Single line flag I get only 1 match for each group, group 3 almost being all
Here is live : https://regex101.com/r/W3x0mH/4
You could use a repeating group that matches all the following lines, asserting that each of those lines does not start with a digit:
(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)
Explanation
(\d+)\s(\w+)\s Match the first 2 groups
( Third capturing group
.* Match 0+ times any char except a newline
(?: Non capturing group
\r?\n(?!\d).* Match newline, assert what is on the right is not a digit
)* Close non capturing group and repeat 0+ times
) Close capturing group
Regex demo
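Here is a minimal Python sketch of the repeating-group pattern, run against a shortened version of the template from the question (built-in re module assumed; variable names are illustrative).

```python
import re

pattern = re.compile(r"(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)")

text = (
    "1251 Left Random Text I want to fill\n"
    "It can go through multiple lines\n"
    "As you can see\n"
    "9841 Right Again we see a lot of random text with 3115 numbers\n"
    "And this also goes\n"
    "To multiple lines\n"
    "5151 Right This one is just one line"
)

records = [m.groups() for m in pattern.finditer(text)]
for number, side, body in records:
    print(number, side, body.splitlines())
```

Each record keeps its continuation lines in group 3, and the 3115 in the middle of a line does not start a new record because the lookahead is only checked right after a newline.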
You may use this regex with a lookahead:
^(\d+)\s(\w+)\s(.*?)(?=\n\d|\z)
with DOTALL and MULTILINE modifiers.
Updated Regex Demo
RegEx Details:
^: Line start
(\d+): Match and capture 1+ digits in group #1
\s: match a whitespace
(\w+): Match and capture 1+ word characters in group #2
\s: match a whitespace
(.*?): Match 0 or more of any character (non-greedy) provided the next lookahead assertion is satisfied
(?=\n\d|\z): Lookahead assertion to assert that we have a newline followed by a digit or there is end of input
Faster Regex:
If you are using this regex on a long string then you should also keep overall performance in mind as a regex with DOTALL modifier will tend to get slow on a large size text. For that I suggest using this regex that doesn't need DOTALL modifier:
^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\z)
RegEx Demo 2
In the regex101 demo, this regex takes just 181 steps, compared to 1300 steps for the first one.
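Both variants can be compared in Python with a small sketch. One adaptation is assumed here: the end-of-input anchor is written \Z, which is the spelling Python's re module has long used, instead of \z. The sample text is a shortened version of the template, and the step counts quoted above are not reproduced by this code.

```python
import re

text = (
    "1251 Left Random Text I want to fill\n"
    "It can go through multiple lines\n"
    "As you can see\n"
    "9841 Right Again we see a lot of random text with 3115 numbers\n"
    "And this also goes\n"
    "To multiple lines\n"
    "5151 Right This one is just one line"
)

# First variant: lazy .*? with DOTALL and MULTILINE.
dotall = re.compile(r"^(\d+)\s(\w+)\s(.*?)(?=\n\d|\Z)", re.DOTALL | re.MULTILINE)
# Second variant: whole lines are consumed at a time, no DOTALL needed.
unrolled = re.compile(r"^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\Z)", re.MULTILINE)

records_dotall = [m.groups() for m in dotall.finditer(text)]
records_unrolled = [m.groups() for m in unrolled.finditer(text)]
print(records_dotall == records_unrolled)  # True
```

On this input the two patterns return identical groups; the difference is only in how much work the engine does to get there.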
For the third group, repeat any character while using negative lookahead for ^\d, which would indicate the start of a new match:
(\d+)\s(\w+)\s((?:(?!^\d)[\s\S])*)
https://regex101.com/r/W3x0mH/5
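A rough Python sketch of this character-by-character approach, assuming re.MULTILINE so that ^ inside the lookahead matches at every line start. The sample text is a made-up, shortened stand-in for the template.

```python
import re

pattern = re.compile(r"(\d+)\s(\w+)\s((?:(?!^\d)[\s\S])*)", re.MULTILINE)

text = "1251 Left first line\nsecond line\n9841 Right other text 3115\nmore"

records = [m.groups() for m in pattern.finditer(text)]
print(records)
# [('1251', 'Left', 'first line\nsecond line\n'),
#  ('9841', 'Right', 'other text 3115\nmore')]
```

Note that group 3 of every record except the last keeps its trailing newline, since the lookahead only stops the repetition once the next line start is reached; calling rstrip("\n") on the group removes it if needed.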
You may try with this regex:
^(\d+)\s+(\w+)\s+(.*?)(?=^\d|\z)
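^(\d+)\s+ where ^\d+ matches a line beginning with digits, followed by one or more whitespace characters \s+
(\w+)\s+ where \w+ matches one or more word characters (Left, Right, NA or something else), followed by one or more whitespace characters \s+
(.*?) matches everything until it finds a line beginning with a digit or \z, the end of the string. This needs the DOTALL and MULTILINE modifiers, so that . can cross newlines and ^ can match at each line start.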
I think it fits your requirement....
Regex101

Fetch Nth field from end of a string

I want to write a regex to pull the nth field from the end of a string in splunk. Please let me know how to proceed.
Code
See regex in use here
(\S+)(?:\s+\S+){4}\s*$
Alternatively, you can also use the following:
See regex in use here
\S+(?=(?:\s+\S+){4}\s*$)
Explanation
Both methods use the same logic. The only difference is the first method captures the 5th element from the end of the string and matches the rest of the string while the second method matches the 5th element from the end of the string and ensures what follows the 5th element matches the pattern.
\S+ Match any non-whitespace character one or more times
(?:\s+\S+){4} Match the following exactly 4 times
  \s+ Match any whitespace characters one or more times
  \S+ Match any non-whitespace characters one or more times
\s* Match any whitespace characters any number of times
$ Assert position at the end of the line
Further explanations:
(\S+) Captures any non-whitespace character one or more times into a capture group
(?= ... ) where ... represents some pattern (in our case (?:\s+\S+){4}\s*$). This is a positive lookahead that ensures what follows matches without consuming any characters.
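The question asks about Splunk, but the counting logic can be illustrated with a generic Python sketch. The helper nth_field_from_end and its parameter n are hypothetical; the repetition count is simply n - 1, so n = 5 reproduces the {4} used above.

```python
import re

def nth_field_from_end(line, n):
    """Return the n-th whitespace-separated field counting from the end (1 = last)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # n - 1 further fields must follow the captured one before the line ends.
    pattern = r"(\S+)(?:\s+\S+){%d}\s*$" % (n - 1)
    match = re.search(pattern, line)
    return match.group(1) if match else None

line = "alpha beta gamma delta epsilon zeta"
print(nth_field_from_end(line, 5))  # beta
print(nth_field_from_end(line, 1))  # zeta
print(nth_field_from_end(line, 7))  # None
```

The last call returns None because the line has only six fields, so the pattern cannot find six more fields after the captured one.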