Matching regexp with boundary values in Perl - regex

I am trying to match the regexp below with \b and with \W. It doesn't match with \b, but it does match with \W.
my $response = "ABC-12-1-1::HELLO=TX,PROVFEADDR=\"\",ValueFORM=NAME-CITY-STREET-PRT,";
print "\n\n\n$response\n\n\n";
# Does not match with \b; it does match when \b is replaced by \W: /PROVFEADDR=\W/
if ( $response =~ /PROVFEADDR=\b/ ) {
    print "matched\n";
} else {
    print "not matched\n";
}
Any clues?
Following the comments, I am editing the post a little.
I now understand why it matches with \W. Here is the problem that made me start using \b.
PROVFEADDR is a variable to match; in this particular case I have to match PROVFEADDR=. Earlier we were using \W+ instead of \b. The problem with \W+ shows up when we have to match at the end of the string: \W+ expects at least one non-word character, and there is none when the text to match is the last thing in the string. So I replaced it with \b, which worked in the above-mentioned scenario. Is there a pattern that can handle both cases?

The reason \b does not match is that it needs a word character on one side and a non-word character on the other, and here both neighbours (= and ") are non-word characters.
In your comments, you have mentioned that you are looking for a replacement for \W that also matches end of line, in which case a negative lookahead assertion can be used:
if($response =~ /PROVFEADDR=(?!\w)/)
It asserts that the next character is not a word character (a letter, digit, or underscore). This also holds at the end of the string, where there is no next character at all.
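A minimal sketch of the lookahead's behaviour, written in JavaScript, where \W and (?!\w) behave the same way as in Perl for these inputs. The two shortened strings (atEnd, beforeWord) and all variable names are made up for illustration; only the first string comes from the question.

```javascript
// Sketch: (?!\w) succeeds before a non-word character and also at end of string.
const provPattern = /PROVFEADDR=(?!\w)/;

// The first input is the string from the question; the other two are hypothetical.
const midString = 'ABC-12-1-1::HELLO=TX,PROVFEADDR="",ValueFORM=NAME-CITY-STREET-PRT,';
const atEnd = 'ABC-12-1-1::HELLO=TX,PROVFEADDR=';
const beforeWord = 'ABC-12-1-1::HELLO=TX,PROVFEADDR=abc';

const provResults = {
  midString: provPattern.test(midString),        // true: the next character is "
  atEnd: provPattern.test(atEnd),                // true: nothing follows
  beforeWord: provPattern.test(beforeWord),      // false: the next character is a word character
  oldPatternAtEnd: /PROVFEADDR=\W+/.test(atEnd), // false: \W+ needs at least one character
};
console.log(provResults);
```

The last entry reproduces the original complaint: \W+ has to consume a character, while the lookahead only inspects what comes next.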

In $response, the character after PROVFEADDR= is a double quote, which is not a word character, so it matches \W (non-word).
It doesn't match \b because the position between = and " is not a word boundary. Compare it with:
if($response =~ /PROVFEADDR\b=/)
Here, between R and = is a word boundary.
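The comparison can be checked with a short sketch, shown in JavaScript because \b follows the same rule there. The sample string is a shortened, hypothetical version of $response, and the variable names are illustrative.

```javascript
// Sketch: \b only matches where a word character meets a non-word character.
const sample = 'HELLO=TX,PROVFEADDR="",ValueFORM=NAME';

const boundaryBeforeEquals = /PROVFEADDR\b=/.test(sample);         // true: R is \w, = is \W
const boundaryAfterEquals = /PROVFEADDR=\b/.test(sample);          // false: = and " are both \W
const boundaryBeforeWord = /PROVFEADDR=\b/.test('PROVFEADDR=abc'); // true: = is \W, a is \w

console.log(boundaryBeforeEquals, boundaryAfterEquals, boundaryBeforeWord);
```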

Related

Perl Regexp::Common package not matching certain real numbers when used with word boundary

The following code prints "34" instead of the expected ".34":
use strict;
use warnings;
use Regexp::Common;
my $regex = qr/\b($RE{num}{real})\s*/;
my $str = "This is .34 meters of cable";
if ($str =~ /$regex/) {
    print $1;
}
Do I need to fix my regex? (The word boundary is needed, because without it the regex would match inside a string like xx34, which I don't want.)
Or is it a bug in Regexp::Common? I always thought that the longest match should win.
The word boundary is a context-dependent regex construct. When it is followed by a word char (letter, digit or _), that location must be preceded either by the start of the string or by a non-word char. In this concrete case, the word boundary is followed by a non-word char (the .) and thus requires a word char to appear right before it.
You may use a non-ambiguous word boundary expressed with a negative lookbehind:
my $regex = qr/(?<!\w)($RE{num}{real})/;
               ^^^^^^^
The (?<!\w) negative lookbehind always denotes one thing: fail the match if there
is a word character immediately to the left of the current location.
Or, use a whitespace boundary if you want your matches to only occur after whitespace or start of string:
my $regex = qr/(?<!\S)($RE{num}{real})/;
               ^^^^^^^
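A rough sketch of the three boundary styles, in JavaScript, since lookbehind works the same way there in current engines. The realNumber string below is a simplified, hypothetical stand-in for $RE{num}{real}, not the actual Regexp::Common pattern, and the variable names are illustrative.

```javascript
// Hypothetical stand-in for $RE{num}{real}: optional sign, then digits with an
// optional fraction, or a fraction that starts with a dot.
const realNumber = String.raw`[-+]?(?:\d+(?:\.\d*)?|\.\d+)`;

const withWordBoundary = new RegExp(String.raw`\b(${realNumber})`);
const withWordLookbehind = new RegExp(String.raw`(?<!\w)(${realNumber})`);
const withSpaceLookbehind = new RegExp(String.raw`(?<!\S)(${realNumber})`);

const sentence = 'This is .34 meters of cable';

const viaBoundary = withWordBoundary.exec(sentence)[1];          // "34": \b cannot sit before the dot
const viaWordLookbehind = withWordLookbehind.exec(sentence)[1];  // ".34"
const viaSpaceLookbehind = withSpaceLookbehind.exec(sentence)[1]; // ".34"

// Both lookbehinds still reject digits glued to letters.
const gluedToLetters = withWordLookbehind.exec('xx34');          // null

// They differ after punctuation: (?<!\w) accepts it, (?<!\S) does not.
const afterParenWord = withWordLookbehind.exec('(.34)')[1];      // ".34"
const afterParenSpace = withSpaceLookbehind.exec('(.34)');       // null

console.log(viaBoundary, viaWordLookbehind, viaSpaceLookbehind);
```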
Try this pattern: (?:^| )(\d*\.?\d+)
Explanation:
(?:...) - non-capturing group
^| - match either ^ (the beginning of the string) or a space character
\d* - match zero or more digits
\.? - match a literal dot, zero or one time
\d+ - match one or more digits
The matched number will be stored in the first capturing group.
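As a quick check of this pattern, here is a small JavaScript sketch. The inputs other than the sentence from the question are hypothetical, as are the variable names.

```javascript
// Sketch: a number is accepted only at the start of the string or after a space.
const spaceOrStartNumber = /(?:^| )(\d*\.?\d+)/;

const fromSentence = spaceOrStartNumber.exec('This is .34 meters of cable');
const capturedFraction = fromSentence[1];                       // ".34"
const wholeMatch = fromSentence[0];                             // " .34" (includes the leading space)

const capturedAtStart = spaceOrStartNumber.exec('12.5 volts')[1]; // "12.5"
const gluedToText = spaceOrStartNumber.exec('xx34');              // null

console.log(capturedFraction, capturedAtStart, gluedToText);
```

Note that the overall match includes the leading space, which is why the number has to be read from the first capturing group.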

how to match non-word+word boundary in javascript regex

How do I match a non-word character followed by a word boundary in a JavaScript regex?
"This is, a beautiful island".match(/\bis,\b/)
In the above case, why doesn't the regex engine match up to is, and treat the following space as a word boundary, without moving further?
\b asserts a position where a word character \w meets a non-word character \W or vice versa. Comma is a non-word character and space is as well. So \b never matches a position between a comma and a space.
Also you forgot to put ending delimiter in your regex.
You can use \B after the comma instead. It matches exactly where \b doesn't: neither the comma nor the space after it is a word character, so the position between them is a non-word-boundary.
console.log( "This is, a beautiful island".match(/\bis,\B/) )
//=> ["is,"]
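To see both assertions side by side, here is a short sketch. The second input string ('is,a') is hypothetical and only there to show a case where \b after the comma does succeed.

```javascript
// Sketch: \b after the comma needs a word character next; \B needs the opposite.
const phrase = 'This is, a beautiful island';

const withBoundary = phrase.match(/\bis,\b/);         // null: comma and space are both \W
const withNonBoundary = phrase.match(/\bis,\B/)[0];   // "is,"
const commaBeforeWord = 'is,a'.match(/\bis,\b/)[0];   // "is,": comma is followed by a word character

console.log(withBoundary, withNonBoundary, commaBeforeWord);
```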

Regex for spoof

I would like to ask for help with catching spoofed usernames using a regex.
For example, the correct username is:
rolf
and here are the spoofed versions that I could think of:
roooolf
r123olf
123rolf123
rolf5623
123rolf
rollllf
rrrrrrolf
rolffff
So basically I have this regex (which I know is not sufficient, because I've tried it on the regex101 website):
.+(?![rolf]).+
I'm using this as a baseline because it doesn't catch the correct username, which is:
rolf
but it doesn't catch all the other "spoofed" versions of the username either.
Any ideas how I can make my regex more effective?
Thanks in advance!
You may try this too
(?m)^(?![^\n]*?rolf[^\n]*$).*$
To match anything that is not exactly rolf, you can use a negative lookahead (?!...) to assert that what follows the beginning of the string is not 'rolf' immediately followed by the end of the string.
^(?!rolf$).+$
Explanation:
^ Assert position at the beginning of the string
(?! Negative lookahead that asserts that what follows is not
rolf$ Match rolf literally, followed by the end of the string
) Close negative lookahead
.+ Match any character one or more times
$ Assert position at the end of the string
Your example regex uses .+, which, as @Ωmega fairly points out, also matches spaces.
Instead of .+ you could specify what characters you might accept like \w+ for example to match one or more word characters or specify more using a character class.
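A small sketch of this approach in JavaScript, run against the usernames from the question. The variable names and the 'rolf smith' input are made up for illustration.

```javascript
// Sketch: flag every candidate that is not exactly "rolf".
const notExactlyRolf = /^(?!rolf$).+$/;
const notExactlyRolfStrict = /^(?!rolf$)\w+$/; // same idea, but only word characters allowed

const candidates = [
  'rolf', 'roooolf', 'r123olf', '123rolf123', 'rolf5623',
  '123rolf', 'rollllf', 'rrrrrrolf', 'rolffff',
];

// Every candidate except the genuine name should be flagged.
const flagged = candidates.filter((name) => notExactlyRolf.test(name));

// The stricter variant rejects input containing spaces altogether.
const looseWithSpace = notExactlyRolf.test('rolf smith');        // true
const strictWithSpace = notExactlyRolfStrict.test('rolf smith'); // false

console.log(flagged, looseWithSpace, strictWithSpace);
```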
You can use a regex pattern
\b(?!rolf\b)\S+\b
\b Word boundary - Matches a word boundary position between a
word character and non-word character or position (start / end of
string).
(?! Negative lookahead - Specifies a group that can not match
after the main expression (if it matches, the result is discarded).
\S Not whitespace - Matches any character that is not a
whitespace character (spaces, tabs, line breaks).
+ Quantifier - Match 1 or more of the preceding token.
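Because this pattern is not anchored to the whole string, it can also pick spoofed names out of a longer line. A minimal JavaScript sketch, where the input line and variable names are assumptions for illustration:

```javascript
// Sketch: find whitespace-separated tokens that are not exactly "rolf".
const spoofPattern = /\b(?!rolf\b)\S+\b/g;

// Hypothetical line of text containing the genuine name and some spoofed ones.
const line = 'rolf roooolf r123olf 123rolf rolf5623';

const spoofed = line.match(spoofPattern);
// ['roooolf', 'r123olf', '123rolf', 'rolf5623']

console.log(spoofed);
```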

regex : how to match word ending with parentheses ")"

I want to match a string ending with ')'.
I use the pattern:
"[)]\b" or ".*[)]\b"
It should match the string:
x=main2.addMenu('Edit')
But it doesn't work. What is wrong?
\b only matches a position at a word boundary. Think of it as (^\w|\w$|\W\w|\w\W), where \w is any word character (letter, digit, or underscore) and \W is any other character. A closing parenthesis is a non-word character, so the position between ) and the end of the string is not a word boundary and \b fails there.
Just match a parenthesis followed by the end of the string, using \)$
If you want to capture a string ending in ) (and not just find a trailing )), then you can use this in JS:
(.*?\)$)
(....) - captures the defined content;
.*? - matches anything up to the next element;
\)$ - a ) at the end of the string (needs to be escaped);
The \b word boundary is context-dependent: after a word character, it requires the next character to be a non-word one or the end of the string. When it stands after a non-word character (like )), it requires a word character (letter/digit/underscore) to appear right after it (so not the end of the string, as here!).
So, there are three solutions:
Use \B (a non-word boundary): .*[)]\B will not allow a match if the ) is followed by a word character.
Use .*[)]$ with MULTILINE mode (add (?m) at the start of the pattern or add the /m modifier).
Emulate multiline mode with an alternation group: .*[)](\r?\n|$)
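The difference between these options can be sketched in JavaScript, where the multiline mode is the m flag rather than an inline (?m). The two-line input and the 'f(x)y' input are hypothetical, as are the variable names.

```javascript
// Sketch: why [)]\b fails at the end of the string, and what works instead.
const codeLine = "x=main2.addMenu('Edit')";

const parenThenBoundary = /[)]\b/.test(codeLine);      // false: ) is followed by end of string
const parenThenEnd = /\)$/.test(codeLine);             // true
const parenThenNonBoundary = /.*[)]\B/.test(codeLine); // true
const parenBeforeWord = /[)]\b/.test('f(x)y');         // true: ) is followed by a word character

// With several lines, $ needs the m flag to match at each line end.
const twoLines = 'a=f(1)\nb=2';
const withoutMultiline = /.*[)]$/.test(twoLines);      // false
const withMultiline = /.*[)]$/m.test(twoLines);        // true

console.log(parenThenBoundary, parenThenEnd, parenThenNonBoundary, withMultiline);
```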

A regex for the last use of a word in a string

I'm trying to figure out how to grab the tail end of a string using a word as a delimiter, but that word can be used anywhere in the string. So only the last use would start the grab.
Example: Go by the office and pickup milk by the safeway BY tomorrow
I want to grab the "BY tomorrow" and not the other "by"s.
This is the regex I'm trying to make robust:
$pattern = '/^(.*?)(#.*?)?(\sBY\s.*?)?(#.*)?$/i';
I think a negative lookahead would do it, but I've never used one before
Thanks!
I'm not sure what the other parts of your regex are for, but here's the one I would use:
$pattern = '/\bby\s(?!.*\bby\b).*?$/i';
\b is a word boundary: it matches only between a \w and a \W character, or at the beginning/end of the string when the adjacent character is a \w.
by matches by literally.
\s matches a space (also matches newlines, tabs, form feeds, carriage returns)
(?!.*\bby\b) is the negative lookahead and will prevent a match if there is another word by ahead.
.*?$ is to get the remaining part of the string till the end of the string.
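The same pattern can be tried outside PHP; the sketch below uses JavaScript, where these constructs behave the same way for this single-line input. Variable names are illustrative.

```javascript
// Sketch: grab the tail of the string starting at the last standalone "by".
const lastByPattern = /\bby\s(?!.*\bby\b).*?$/i;
const note = 'Go by the office and pickup milk by the safeway BY tomorrow';

const lastByMatch = note.match(lastByPattern);
const lastByText = lastByMatch ? lastByMatch[0] : null; // "BY tomorrow"

console.log(lastByText);
```

The first two occurrences of by are rejected because the lookahead still finds another by further to the right.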
To match the last BY (uppercase and lowercase letters) try this regex:
\b[bB][yY]\b(?!.*\b[bB][yY]\b.*)
see demo here http://regex101.com/r/uA2rL0
This uses the \b word boundary to avoid matching things like nearby and, as you suggested, a negative lookahead.
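This pattern matches only the word itself, so the tail has to be taken from the match position. A minimal JavaScript sketch under that assumption; the 'stay nearby' input and the variable names are hypothetical.

```javascript
// Sketch: locate the last standalone "by" and slice the string from there.
const lastByWord = /\b[bB][yY]\b(?!.*\b[bB][yY]\b.*)/;
const task = 'Go by the office and pickup milk by the safeway BY tomorrow';

const hit = lastByWord.exec(task);
const tail = hit ? task.slice(hit.index) : null;  // "BY tomorrow"

// "nearby" contains "by" but has no word boundary before the b.
const insideAnotherWord = lastByWord.test('stay nearby'); // false

console.log(tail, insideAnotherWord);
```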