Help with regular expression - regex

In the following expression:
if (($$_ =~ /^.+:\s*\#\s*abcd\s+XYZ/)
1. Where is $$_ taken from?
2. Does the right side of the expression mean: match one or more characters, followed by a colon, followed by zero or more spaces, followed by #, followed by one or more spaces, followed by 'abcd', followed by zero or more spaces, followed by 'XYZ'?

You have the last "one or more" and "zero or more" reversed from what the regex actually does.
$$_ dereferences the scalar reference in $_.

Concerning 2., your explanation of the regex is not entirely correct.
/^.+:\s*#\s*abcd\s+XYZ/
means one or more characters (starting at the beginning of the string) followed by a colon, followed by zero or more whitespace characters, followed by one hash character, followed by zero or more whitespace characters, followed by 'abcd', followed by one or more whitespace characters, followed by 'XYZ'.

As for pt. 2:
Line beginning with (^) one or more characters (.+), colon (:), zero or more whitespace characters (\s*), a hash (\#), zero or more whitespace characters (\s*), the string "abcd" (abcd), one or more whitespace characters (\s+), then the string "XYZ" (XYZ).
(The discrepancies with your reading are the \s* after the hash and the \s+ after abcd.) Note that there is no end-of-line anchor ($), so the pattern only constrains the beginning of the line; anything may follow XYZ.

Have a look at this site
Here is the given explanation of your regex:
Token   Meaning
^       Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.
.+      . matches any single character except newline characters. The + quantifier causes this item to be matched 1 or more times (greedy).
:       A literal :
\s*     \s matches a single whitespace character. The * quantifier causes this item to be matched 0 or more times (greedy).
\#      A literal #
\s*     \s matches a single whitespace character. The * quantifier causes this item to be matched 0 or more times (greedy).
abcd    The literal string abcd
\s+     \s matches a single whitespace character. The + quantifier causes this item to be matched 1 or more times (greedy).
XYZ     The literal string XYZ
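The token-by-token reading can be checked against a few sample strings. This is a minimal sketch in Python rather than Perl (an assumption made for illustration; the tokens used here behave the same way in Python's re module), and the sample strings are made up, not taken from the question.

```python
import re

# Same pattern as in the question; outside verbose mode the # needs no escaping.
PATTERN = re.compile(r'^.+:\s*#\s*abcd\s+XYZ')

# Hypothetical sample inputs (not from the original post).
samples = [
    "key: # abcd XYZ",        # whitespace present in every optional position
    "key:#abcd XYZ",          # \s* allows zero whitespace around the hash
    "key: # abcdXYZ",         # \s+ requires at least one whitespace before XYZ
    ": # abcd XYZ",           # .+ requires at least one character before the colon
    "key: # abcd XYZ extra",  # no $ anchor, so trailing text is allowed
]

def matches(text):
    """Return True if the pattern matches, anchored at the start of text."""
    return PATTERN.search(text) is not None

results = {s: matches(s) for s in samples}
for text, ok in results.items():
    print(f"{text!r}: {ok}")
```

The third and fourth samples are the ones that fail, which lines up with the \s+ and .+ rows of the table.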

Regex - match any characters and allow any number of single spaces. Break match on a double space

I am looking to create a match for the following:
"Adam Lambert"
"Mr. Adam Lambert"
"adam#test.com"
But not match the following:
"Adam  Lambert" (double space)
"Adam Lambert " (trailing space)
Rules:
Any alphanumeric character should be matched.
A single space at any point should be matched.
Any number of single spaces can be matched.
Double spaces are not matched.
A single space at the end of a string is not matched.
EDIT
I also need to match the following. Sorry I missed this.
name:((\w+(?:\S\w+)*|\s(?:\w+\S)*)\S)*
I need to match to:
name:
name:A
name:Adam Lambert
The above regex matches from "name:Ad..." but it will not match "name:A"
I would generalize the solution: match a sequence of non-space characters, optionally followed by groups that each consist of a single space and more non-space characters, since your only hard criterion seems to be the number of spaces. For example:
^\S+(?: \S+)*$
^(?:\S+(?:\s\S+)*|\s(?:\S+\s)*)\S$
Meaning:
^ start of the line
(?: start of non-capturing group
\S+ one or more non-whitespace characters
(?:\s\S+)* zero or more groups of a single whitespace character and one or more non-whitespace characters
| or
\s one whitespace character
(?:\S+\s)* zero or more groups of one or more non-whitespace characters and one whitespace character
) end of non-capturing group
\S finally, one non-whitespace character
$ end of the line
In your third example the # won't be matched with \w but it will if you change it to \S (any non-whitespace character)
See it in action here: regexr.com/50lp2
edit: I can't type
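As a quick check of the first pattern, ^\S+(?: \S+)*$, here is a small sketch in Python (chosen only for illustration; the question does not name a language). The test strings are the ones listed in the question, with the double space and trailing space written out explicitly.

```python
import re

# First pattern suggested above: runs of non-space characters
# separated by exactly one space, with nothing before or after.
SINGLE_SPACED = re.compile(r'^\S+(?: \S+)*$')

should_match = [
    "Adam Lambert",
    "Mr. Adam Lambert",
    "adam#test.com",
    "name:",
    "name:A",
    "name:Adam Lambert",
]
should_not_match = [
    "Adam  Lambert",  # double space
    "Adam Lambert ",  # trailing space
]

def is_single_spaced(text):
    """Return True if text consists of single-spaced non-space runs."""
    return SINGLE_SPACED.match(text) is not None

for text in should_match + should_not_match:
    print(f"{text!r}: {is_single_spaced(text)}")
```

Because the pattern uses \S rather than \w, punctuation such as . and # is accepted without any special handling.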

regexp print line by line and remove last word

I am trying to remove the last word from each line if the line contains more than one word.
If a line has only one word, print it as is; there is no need to delete it.
Say these are the lines:
address 34 address
value 1 value
valuedescription
size 4 size
From the lines above I want to remove the last word of every line except the 3rd, which has only one word, using a regexp.
I tried the regexp below, but it also removes the word on single-word lines:
$_ =~ s/\s*\S+\s*+$//;
I need your help with this.
You can use:
$_ =~ s/(?<=\w)\h+\w+$//m;
Explanation:
(?<=\w): Lookbehind to assert that we have at least one word char before last word
\h+: Match 1+ horizontal whitespaces
\w+: match a word with 1+ word characters
$: End of line
Try this regex:
^(?=(?:\w+ \w+)).*\K\b\w+
Replace each match with a blank string
OR
^((?=(?:\w+ \w+)).*\b)\w+
and replace each match with \1
Explanation (1st regex):
^ - asserts the start of the line
(?=(?:\w+ \w+)) - positive lookahead that checks the line starts with two words separated by a single space
.* - if that condition is satisfied, matches 0+ occurrences of any character (except newline) up to the end of the line
\K - forgets everything matched so far
\b - backtracks to the last word boundary that is followed by word characters
\w+ - matches the last word
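For readers without \K support, the capture-group variant can be tried as follows. This is a sketch in Python (an assumption for illustration; the thread itself uses Perl), applied to the four sample lines from the question.

```python
import re

# Capture-group variant from the answer above. Python's re module has no \K,
# so group 1 holds everything that should be kept.
LAST_WORD = re.compile(r'^((?=(?:\w+ \w+)).*\b)\w+', re.MULTILINE)

text = "address 34 address\nvalue 1 value\nvaluedescription\nsize 4 size"

# Replace each match with group 1, which drops only the last word.
stripped = LAST_WORD.sub(r'\1', text)
lines = stripped.split("\n")
print(lines)
# ['address 34 ', 'value 1 ', 'valuedescription', 'size 4 ']
```

Note that the whitespace before the removed word is kept, so each shortened line ends with a space; an extra trim step would be needed if that matters.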
A single word with no whitespace matches your regex because you've used \s* both before and after the \S+, and \s* can match an empty string.
You could use $_ =~ s/^(.*\S)\s+(\S+)$/$1/;
[Explanation: Match the RegEx if the line contains some number of characters ending with a non-whitespace (stored in $1), followed by 1 or more white-space characters, followed by 1 or more non-white-space characters. If there is a match, replace it all with the first part ($1).]
Though you might want to trim leading/trailing whitespace if you think it might contain any - depends on what you want to happen in those cases.
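The substitution s/^(.*\S)\s+(\S+)$/$1/ translates almost directly to other regex flavours. A minimal sketch in Python (used here only for illustration; drop_last_word is a hypothetical helper name), run on the sample lines from the question:

```python
import re

# Same pattern as the Perl substitution: group 1 is everything up to the
# last non-whitespace character before the final word.
DROP_LAST = re.compile(r'^(.*\S)\s+(\S+)$')

def drop_last_word(line):
    """Remove the last word only when the line has more than one word."""
    return DROP_LAST.sub(r'\1', line)

sample = ["address 34 address", "value 1 value", "valuedescription", "size 4 size"]
cleaned = [drop_last_word(line) for line in sample]
print(cleaned)
# ['address 34', 'value 1', 'valuedescription', 'size 4']
```

The single-word line is left alone because the pattern needs at least one whitespace character between two non-whitespace runs.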

regex last character of a WORD

I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters, '[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex matches a non-whitespace character that is followed by a whitespace character or the end of the line.
But I don't know how to stop it from including the following whitespace character in the result, or why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes part of the match. It also does not match at the end of the string, because inside a character class $ is not an end-of-string anchor; it is parsed as a literal $ symbol.
You may use
\S(?!\S)
The \S matches a non-whitespace char that is not followed with a non-whitespace char (due to the (?!\S) negative lookahead).
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
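The difference between the two patterns shows up on the asker's own test string. A small sketch in Python (chosen for illustration only; the question does not say which engine is in use):

```python
import re

text = "Hi World!"

# Last character of each run of non-whitespace characters.
last_of_nonspace_run = re.findall(r'\S(?!\S)', text)

# Last character of each run of word characters (letters, digits, underscore).
last_of_word_run = re.findall(r'\w\b', text)

print(last_of_nonspace_run)  # ['i', '!']
print(last_of_word_run)      # ['i', 'd']
```

With \S(?!\S) the "!" counts as part of the WORD, while \w\b stops at "d" because "!" is not a word character.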
In a Word document, to highlight the last "a" in "para", I first search for [space]para[space] so that only the whole word I want is found and highlighted.
Next, within that selection, I search for "a" followed by a space ([a ]), which finds only the last "a", and I highlight it or color it differently.

regex nonconsecutive match

I'm trying to match a word that has 2 vowels in it (they don't have to be consecutive), but the regex I've come up with either matches nothing or not enough. This is the latest iteration (Dart).
final vowelRegex = new RegExp(r'[aeiouy]{2}');
Here's an example sentence being parsed; it should match one, shoulder, their, and over. It's only matching shoulder and their. I understand why: that's the expression I defined. How can the expression be defined to match on 2 vowels, regardless of their position in the word?
one shoulder their the which over
The expression only needs to be tested on one word at a time so hopefully this simplifies things.
You can use :
new RegExp(r'(\w*[aeiouy]\w*){2}');
Both of the previous two answers are incorrect.
(\S*[aeiouy]\S*){2} can match substrings of non-whitespace characters even if they contain non-word characters.
\S*[aeiouy]\S*[aeiouy]\S* has the same problem.
Correct solution:
\b([^\Waeiou]*[aeiou]){2}\w*\b
And if you want only whitespace to count as the word boundary (rather than any non-word character), then use the following regex where the target word is in capture group \2.
(\s|^)(([^\Waeiou]*[aeiou]){2}\w*)(\s|$)
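To compare the asker's original pattern with the word-boundary version on the example sentence, here is a sketch in Python rather than Dart (an assumption for illustration). The group is made non-capturing so that findall returns whole words; otherwise the pattern is unchanged.

```python
import re

sentence = "one shoulder their the which over"

# Original attempt: requires two adjacent vowels.
adjacent = re.compile(r'[aeiouy]{2}')

# Word-boundary pattern from the answer above (group made non-capturing).
two_vowels = re.compile(r'\b(?:[^\Waeiou]*[aeiou]){2}\w*\b')

# The original pattern is tested one word at a time, as the question describes.
words_adjacent = [w for w in sentence.split() if adjacent.search(w)]
# The word-boundary pattern can scan the whole sentence at once.
words_two_vowels = two_vowels.findall(sentence)

print(words_adjacent)    # ['shoulder', 'their']
print(words_two_vowels)  # ['one', 'shoulder', 'their', 'over']
```

Note that this pattern's vowel class is [aeiou], so unlike the asker's [aeiouy] it does not treat y as a vowel.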
You can try this:
\S*[aeiouy]\S*[aeiouy]\S*
Explanation
\S* matches any non-whitespace character (equal to [^\r\n\t\f ]) between zero and unlimited times
[aeiouy] matches a single character from the list aeiouy
For the input string: one shoulder their the which over
it will match four words: one shoulder their over
I'd do:
\b(?:\w*[aeiouy]+\w*){2,}\b
Explanation:
\b : word boundary
(?: : start non-capture group
\w* : 0 or more word characters
[aeiouy]+ : 1 or more vowels
\w* : 0 or more word characters
){2,} : end group repeated at least twice
\b : word boundary

Regular expression not working

I am trying to create a regular expression in javascript with the following rules:
At least 2 characters.
Should have at least 1 letter as a prefix, and then either end with a . or have a space or - followed by more letters.
The following strings should be legal: aa, aaaaa, a., a-a, a a.
These should not be legal: a (too short), aa.aa. (two dots), aa- (after - there should be another letter).
I don't know what I'm doing wrong here, but my regex doesn't seem to work: it is a valid expression, yet no word matches it:
(?=^.{2,}$)^(([a-z][A-Z])+([.]|[ -][a-zA-Z]+){0,1}$)
I had to rewrite it completely to cover the OP's comment. The new regex would be:
^[a-zA-Z][a-zA-Z]*[ -][a-zA-Z]*[a-zA-Z]$|^[a-zA-Z][a-zA-Z]*([a-zA-Z]|\.)$
Explanation
1st alternative: ^[a-zA-Z][a-zA-Z]*[ -][a-zA-Z]*[a-zA-Z]$
^ asserts position at the start of a line
[a-zA-Z] matches a single character present in [a-zA-Z]
[a-zA-Z]* the * quantifier matches between zero and unlimited times (greedy)
[ -] matches a single character, either - or a space
[a-zA-Z]*[a-zA-Z] matches one or more further letters
$ asserts position at the end of a line
2nd alternative: ^[a-zA-Z][a-zA-Z]*([a-zA-Z]|\.)$
^ asserts position at the start of a line
[a-zA-Z] matches a single character present in [a-zA-Z]
[a-zA-Z]* the * quantifier matches between zero and unlimited times (greedy)
([a-zA-Z]|\.) matches a single character that is either a letter or a literal dot
$ asserts position at the end of a line
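The rewritten pattern can be checked against the legal and illegal strings listed in the question. This sketch uses Python instead of JavaScript (an assumption for illustration; the pattern uses only syntax common to both), and is_legal is a hypothetical helper name.

```python
import re

# Rewritten pattern from the answer above, unchanged.
NAME = re.compile(
    r'^[a-zA-Z][a-zA-Z]*[ -][a-zA-Z]*[a-zA-Z]$|^[a-zA-Z][a-zA-Z]*([a-zA-Z]|\.)$'
)

legal = ["aa", "aaaaa", "a.", "a-a", "a a"]
illegal = ["a", "aa.aa.", "aa-"]

def is_legal(text):
    """Return True if text satisfies either alternative of the pattern."""
    return NAME.match(text) is not None

for text in legal + illegal:
    print(f"{text!r}: {is_legal(text)}")
```

All five legal strings are accepted and all three illegal ones are rejected, which matches the rules as stated in the question.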