regexp print line by line and remove last word - regex

I am trying to remove the last word from each line if the line contains more than one word.
If a line has only one word, print it as is; there is no need to delete it.
Say these are the lines:
address 34 address
value 1 value
valuedescription
size 4 size
From the lines above, I want to remove the last word of every line except the 3rd, which has only one word, using a regexp.
I tried the regexp below, but it also removes the word from single-word lines:
$_ =~ s/\s*\S+\s*+$//;
Any help would be appreciated.

You can use:
$_ =~ s/(?<=\w)\h+\w+$//m;
Explanation:
(?<=\w): Lookbehind asserting that a word character immediately precedes the whitespace, so there is at least one word before the last word
\h+: Match 1+ horizontal whitespaces
\w+: match a word with 1+ word characters
$: End of line
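For readers working outside Perl, here is a minimal Python sketch of the same substitution. It is a port under assumptions, not the answer's own code: Python's re module has no \h, so [ \t] stands in for it, and the name drop_last_word is hypothetical.

```python
import re

# [ \t] replaces Perl's \h (horizontal whitespace), which Python's re lacks.
# re.MULTILINE plays the role of the /m flag: $ matches at every line end.
LAST_WORD = re.compile(r"(?<=\w)[ \t]+\w+$", re.MULTILINE)


def drop_last_word(text: str) -> str:
    """Remove the final word of each line that has more than one word."""
    return LAST_WORD.sub("", text)


sample = "address 34 address\nvalue 1 value\nvaluedescription\nsize 4 size"
print(drop_last_word(sample))
```

The single-word line survives because the pattern needs whitespace between a word character and the last word.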

Try this regex:
^(?=(?:\w+ \w+)).*\K\b\w+
Replace each match with a blank string
OR
^((?=(?:\w+ \w+)).*\b)\w+
and replace each match with \1
Explanation(1st Regex):
^ - asserts the start of the line
(?=(?:\w+ \w+)) - positive lookahead to check that the line starts with two words separated by a single space
.* - If the above condition satisfies, then match 0+ occurrences of any character(except newline) until the end of the line
\K - forget everything matched so far
\b - backtrack to find the last word boundary
\w+ - matches the last word
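Python's re module does not support \K, so a sketch in Python has to use the capture-group variant. The version below is illustrative only (remove_last_word is a hypothetical name). It also shows a side effect: the space before the removed word belongs to group 1, so it is left behind unless it is trimmed.

```python
import re

# Capture-group variant of the pattern; \1 restores everything before the last word.
PATTERN = re.compile(r"^((?=(?:\w+ \w+)).*\b)\w+", re.MULTILINE)


def remove_last_word(text: str) -> str:
    """Drop the last word of multi-word lines, then trim the leftover space."""
    replaced = PATTERN.sub(r"\1", text)
    # Group 1 ends at the word boundary *after* the separating space,
    # so each changed line still carries a trailing space at this point.
    return "\n".join(line.rstrip() for line in replaced.split("\n"))
```

If the trailing space matters in your data, the rstrip step (an addition not present in the answer) removes it.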

A single word with no whitespace matches your regex because you've used \s* both before and after the \S+, and \s* can match an empty string.
You could use $_ =~ s/^(.*\S)\s+(\S+)$/$1/;
[Explanation: Match the RegEx if the line contains some number of characters ending with a non-whitespace (stored in $1), followed by 1 or more white-space characters, followed by 1 or more non-white-space characters. If there is a match, replace it all with the first part ($1).]
Though you might want to trim leading/trailing whitespace if you think it might contain any - depends on what you want to happen in those cases.
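The difference between the two patterns can be checked with a small Python sketch. Two assumptions are made here: the possessive \s*+ from the question is simplified to \s* (possessive quantifiers need Python 3.11+), and the function names are hypothetical.

```python
import re


def original_attempt(line: str) -> str:
    # Simplified form of the question's pattern: everything except \S+ is
    # optional, so a line holding a single word still matches in full.
    return re.sub(r"\s*\S+\s*$", "", line)


def fixed(line: str) -> str:
    # Port of s/^(.*\S)\s+(\S+)$/$1/ : \s+ demands real whitespace
    # between the kept part and the last word.
    return re.sub(r"^(.*\S)\s+(\S+)$", r"\1", line)


for line in ["address 34 address", "valuedescription"]:
    print(repr(original_attempt(line)), repr(fixed(line)))
```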

Related

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+ it means there must be at least a single whitespace char preceding so in that case it will not match word... only.
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
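As a quick check of this pattern, here is a small Python sketch (the helper name last_word_before_ellipsis is hypothetical). It returns the captured word, or None when the string does not end in exactly three dots preceded by a word.

```python
import re
from typing import Optional

# Same pattern as above: a run of non-space, non-dot chars, then "..." at the end.
ELLIPSIS_WORD = re.compile(r"([^\s.]+)\.{3}$")


def last_word_before_ellipsis(text: str) -> Optional[str]:
    match = ELLIPSIS_WORD.search(text)
    return match.group(1) if match else None


print(last_word_before_ellipsis("Do not match this. This sentence ends in the last word..."))
print(last_word_before_ellipsis("Do not match this. This sentence ends in the last word."))
```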
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. Exactly what it excludes depends on your regex flavor, but usually that is spaces, tabs, newlines, and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1

Regex to find the last word in string -Javascript flavor

I am close but not quite there. I am trying to match the last word to pull out the last name.
My Regex:
Insured Name:\W*(?<insured_last_name>.*)
Text that I am searching:
Insured Name:
FRED & ETHYL MERTZ
Sample here...
https://regex101.com/r/McdMcq/3
You can match Insured Name: until the end of the line. Then match a newline and optional following whitespace chars.
Then at the line where you want to get the last word, first match until the end of the line, then backtrack until the last space, and capture 1+ non whitespace chars in group insured_last_name
\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)
In parts
\bInsured Name: Match literally
.*\r?\n\s* Match the rest of the line, a newline and 0+ whitespace chars
.* Match the rest of the line and match the last space
(?<insured_last_name>\S+) Match 1+ non whitespace chars in group insured_last_name
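The same idea can be tried in Python as a rough sketch. Note one flavor difference: Python spells named groups (?P<name>...), whereas the JavaScript pattern above uses (?<name>...). The function name is hypothetical.

```python
import re
from typing import Optional

# Python port of the pattern; only the named-group syntax changes.
INSURED = re.compile(r"\bInsured Name:.*\r?\n\s*.* (?P<insured_last_name>\S+)")


def insured_last_name(text: str) -> Optional[str]:
    match = INSURED.search(text)
    return match.group("insured_last_name") if match else None


print(insured_last_name("Insured Name:\nFRED & ETHYL MERTZ\n"))
```

Because the pattern requires a literal space before the last word, a name line containing a single word yields no match.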
You can simply use /\w+$/gm
Demo: https://regex101.com/r/McdMcq/4
Explanation:
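\w: Match a word character (letter, digit, or underscore)
+: At least one
$: And then the end of the line (with the m flag, $ matches at every line end, not only at the end of the string)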
If there are multiple rows and potentially garbage data in between I would recommend you to remove the 2 newlines (\n\n) and then do a Positive Lookbehind looking for "Name". Demo: https://regex101.com/r/McdMcq/5
If you need to store the result in a capture group, simply enclose \w+$ in parentheses with a group name (i.e. (?<insured_last_name>\w+$)) in either of the two regexes.
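One caveat of the short pattern is that it matches the last word of every line, not only the name line. A brief Python sketch makes this visible; the "Policy 123" line is invented sample data, and re.MULTILINE stands in for the m flag.

```python
import re

# Hypothetical input: the third line is made up to show the extra match.
text = "Insured Name:\nFRED & ETHYL MERTZ\nPolicy 123"

# findall returns every non-overlapping match, like the g flag.
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(last_words)
```

The first line contributes nothing because it ends in a colon, which \w does not match.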
You may need to define your data set a little more, but you can try
Insured Name:\n+.*(?<insured_last_name>\b.+)
It starts at "Insured Name:", then any empty lines, then will read the following line until the final word boundary (excluding the EOL); anything after that is in your named group.

Match all instances of a certain character inside every word preceded by a certain word and not delimited by a space

Given a string such as below:
word.hi. bla. word.
I want to construct a regex which will match all "."s preceded by "word" and any other non space character
So, in the above example I would want the first, second and last dots to be matched.
While matching the first and last dots would be easy with global flag (/(?:word.*)\K./gU), I'm not sure how to construct a regex that would also match the second dot.
Appreciate any pointers.
You might match word and then get all consecutive matches using the \G anchor excluding matching whitespace chars or a dot.
(?:\bword|\G(?!\A))[^.\s]*\K\.
In parts
(?: Non capture group
\bword Match word preceded by a word boundary
| Or
\G(?!\A) Assert the position at the end of the previous match, not at the start
) Close non capture group
[^.\s]* Match 0+ occurrences of any char except . or a whitespace char
\K Clear the match buffer (forget what is matched until now)
\. Match a dot
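Python's built-in re module supports neither \G nor \K, so the pattern above cannot be used there as written. The sketch below swaps in a different, two-step technique that should give the same positions for inputs like the example: first find each non-space run starting at "word", then collect the dots inside it. The function name is hypothetical.

```python
import re
from typing import List


def dots_after_word(text: str) -> List[int]:
    """Return the indexes of dots inside non-space runs that start with 'word'."""
    positions = []
    # Step 1: a run begins at the word 'word' and extends to the next whitespace.
    for run in re.finditer(r"\bword\S*", text):
        # Step 2: every dot inside that run counts as a match.
        for offset, char in enumerate(run.group()):
            if char == ".":
                positions.append(run.start() + offset)
    return positions


print(dots_after_word("word.hi. bla. word."))
```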

Regex - match any characters and allow any number of single spaces. Break match on a double space

I am looking to create a match for the following:
"Adam Lambert"
"Mr. Adam Lambert"
"adam#test.com"
But not match the following
"Adam  Lambert"
"Adam Lambert "
Rules:
Any alphanumeric character should be matched
A single space at any point should be matched.
Any number of single spaces can be matched
double spaces are not matched
a single space at the end of a string is not matched
EDIT
I also need to match the following. Sorry I missed this.
name:((\w+(?:\S\w+)*|\s(?:\w+\S)*)\S)*
I need to match to:
name:
name:A
name:Adam Lambert
The above regex matches from "name:Ad..." but it will not match "name:A"
I would generalize a solution to matching a sequence of non-space characters followed by optional groups of non-space characters following a single space only, since your only hard criterion seems to be the number of spaces. For example:
^\S+(?: \S+)*$
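A short Python sketch can confirm how this pattern treats the examples from the question. The helper name is hypothetical, and re.fullmatch is used in place of the ^ and $ anchors.

```python
import re

# fullmatch anchors both ends, so the explicit ^ and $ are not needed here.
SINGLE_SPACED = re.compile(r"\S+(?: \S+)*")


def is_single_spaced(text: str) -> bool:
    """True when non-space runs are separated by exactly one space each."""
    return SINGLE_SPACED.fullmatch(text) is not None


for candidate in ["Adam Lambert", "Mr. Adam Lambert", "adam#test.com",
                  "Adam  Lambert", "Adam Lambert "]:
    print(repr(candidate), is_single_spaced(candidate))
```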
^(?:\S+(?:\s\S+)*|\s(?:\S+\s)*)\S$
Meaning:
^ start of the line
(?: non-capturing group
\S+ one or more non-whitespace characters
(?:\s\S+)* zero or more groups of a single whitespace and one or more non-whitespace characters
or (|)
^ start of the line
\s one whitespace character
(?:\S+\s)* zero or more groups of non-whitespace characters and one whitespace character
) end non-capturing group
Finally one non whitespace character \S and the end of the line: $.
In your third example the # won't be matched with \w but it will if you change it to \S (any non-whitespace character)
See it in action here: regexr.com/50lp2

Help with regular expression

In the following expression:
if (($$_ =~ /^.+:\s*\#\s*abcd\s+XYZ/)
1. Where is $$_ taken from?
2. Does the right side of the expression mean: match one or more characters, followed by a colon, followed by zero or more spaces, followed by #, followed by one or more spaces, followed by 'abcd', followed by zero or more spaces, followed by 'XYZ'?
You have the last "one or more" and "zero or more" reversed from what the regex actually does.
$$_ dereferences the scalar reference in $_.
Concerning 2., your explanation of the regex is not entirely correct.
/^.+:\s*#\s*abcd\s+XYZ/
means one or more characters (starting at the beginning of the string) followed by a colon, followed by zero or more whitespace characters, followed by one hash character, followed by zero or more whitespace characters, followed by 'abcd', followed by one or more whitespace characters, followed by 'XYZ'.
As for pt. 2:
Line beginning with (^) one or more characters (.+), colon (:), zero or more whitespace characters (\s*), a hash (\#), zero or more whitespace characters (\s*), the string "abcd" (abcd), one or more whitespace characters (\s+), then the string "XYZ" (XYZ).
(emphasis added on discrepancies.) Do note that there is no anchor on the end of line ($), thus this only concerns the beginning.
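The zero-or-more versus one-or-more distinction is easy to verify with a few test strings. The sketch below uses Python rather than Perl, and the sample lines are invented for illustration; in Python the # needs no escaping outside verbose mode.

```python
import re

# Same regex as in the question, minus the Perl-style escape on #.
PATTERN = re.compile(r"^.+:\s*#\s*abcd\s+XYZ")


def matches(line: str) -> bool:
    return PATTERN.search(line) is not None


print(matches("key:#abcd XYZ"))               # no spaces around #: allowed by \s*
print(matches("key: #  abcd\tXYZ and more"))  # no end anchor: trailing text is fine
print(matches("key:#abcdXYZ"))                # \s+ needs whitespace before XYZ
print(matches(":#abcd XYZ"))                  # .+ needs a character before the colon
```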
Have a look at this site
Here is the given explanation of your regex:
Token   Meaning
^       Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.
.+      Matches any single character except newline characters. The + quantifier causes this item to be matched 1 or more times (greedy).
:       :
\s*     Matches a single white space character. The * quantifier causes this item to be matched 0 or more times (greedy).
\#      #
\s*     Matches a single white space character. The * quantifier causes this item to be matched 0 or more times (greedy).
abcd    abcd
\s+     Matches a single white space character. The + quantifier causes this item to be matched 1 or more times (greedy).
XYZ     XYZ