Perl Regex with variables not matching the same text - regex

Could someone explain to me why the following prints "fail"? And what the workaround is?
my $test1 = "/k?user";
my $test2 = "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
If I change $test1 and $test2 to "/k?", the match works.
Clearly it has something to do with text following the ?. But, the variables I am trying to match have question marks in them, and I would rather not have to take everything apart, match the pieces, and then reconstruct everything.

? is a special character in a regex. Use quotemeta:
my $test1 = "/k?user";
my $test2 = quotemeta "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}

To (only) match
/k?user
one needs to use the pattern
^/k\?user\z
because "?" doesn't match itself in a regex pattern. You need to escape it (use "\?") for it to match a "?", and escaping the special characters (such as "?") can be done using quotemeta.
my $str = '/k?user';
my $pat = quotemeta($str);
/^$pat\z/
quotemeta can also be accessed via \Q..\E in double-quoted string literals and regex pattern literals.
my $str = '/k?user';
/^\Q$str\E\z/
(The solution previously suggested by toolic would also match "!/k?userf".)
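The difference between the three patterns discussed above can be checked with a minimal sketch. The inputs `/kuser` and `/user` are hypothetical and only illustrate what each pattern accepts; the other strings come from the question and the answers.

```perl
use strict;
use warnings;

my $str = '/k?user';

# Unescaped: "k?" means "zero or one k", so this matches /user or /kuser.
my $raw = $str;
# Escaped: quotemeta backslashes every non-word character.
my $quoted = quotemeta $str;

for my $input ('/k?user', '/kuser', '/user', '!/k?userf') {
    my $raw_hit      = $input =~ /$raw/        ? 'yes' : 'no';
    my $quoted_hit   = $input =~ /$quoted/     ? 'yes' : 'no';
    my $anchored_hit = $input =~ /^\Q$str\E\z/ ? 'yes' : 'no';
    print "$input: raw=$raw_hit quoted=$quoted_hit anchored=$anchored_hit\n";
}
```

The raw pattern accepts `/kuser` and `/user` but rejects the very string it was built from, because nothing in it asks for a literal question mark. Only the anchored form rejects `!/k?userf`.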

Related

Perl match regex variable \Q

I'm trying to match a regex in perl. The regex needs to be stored in a variable.
From this question I got \Q to match regex in a variable.
$regex = "\\$[0-9] (\\+|\\*) [0-9]";
$str = "$2 * 2";
if ($str =~ /\Q$regex/) { # regex is: \$[0-9] (\+|\*) [0-9]
print "Expression found :)\n";
} else {
print "Expression not found :(\n";
}
This matches fine in regexpal. It also works fine when I use the regex immediately without first putting it in $regex (i.e. without the \Q). What is the \Q doing to mess up my regex?
The \Q and \E pair can be used to escape all non-word characters within a double-quoted string context. For instance
perl -E 'say "abc[\Q[..]\E]def"'
output
abc[\[\.\.\]]def
I wonder why you think you need it, as it prevents all regex metacharacters from having their special effect. For instance, \Q[0-9] will match exactly [0-9] instead of any single decimal digit.
I would write your code like this. Note that I have changed double quotes to qr// when defining the pattern, to create a compiled regex, and to single quotes when defining the target string, to avoid Perl trying to interpolate the built-in variable $2 into the string. You must always use strict and use warnings 'all' at the top of every Perl program you write.
use strict;
use warnings 'all';
my $regex = qr/\$[0-9] [+*] [0-9]/;
my $str = '$2 * 2';
if ( $str =~ $regex ) {
print "Expression found :)\n";
}
else {
print "Expression not found :(\n";
}
output
Expression found :)
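If part of a pattern has to be taken literally while the rest stays a real regex, \Q...\E can wrap just the literal part. This is a sketch under the assumption that the operand arrives as a separate string; the variable `$literal` and the extra test strings are hypothetical and not part of the question.

```perl
use strict;
use warnings;

# Hypothetical split: a literal operand that must match character for
# character, followed by a part that is meant to be a regex.
my $literal = '$2';
my $regex   = qr/\Q$literal\E [+*] [0-9]/;

for my $str ('$2 * 2', '$2 + 7', '$2 - 2') {
    my $verdict = $str =~ $regex ? 'found' : 'not found';
    print "$str: $verdict\n";
}

# Quoting the whole pattern instead turns the character class into plain text.
print quotemeta('[0-9]'), "\n";
```

The last line shows why \Q broke the original pattern: the brackets and the hyphen are backslashed, so the class no longer means "any digit".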

searching for parentheses in perl

Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while(<FILE>){
$find = $_;
for (@text){
if ($_=~ /$find/){
push(@found, $_);
}
}
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general backslash escapes characters inside regexes - i.e. /\(/ will match a literal (
In situations like this it's better to use the quoting escape \Q:
if ( $_ =~ /\Q$find\E/ ) {
...
}
Alternatively, use quotemeta.
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
\Q with \E will escape the special characters in the $find variable, like this:
while(<FILE>){
chomp($find = $_); # drop the trailing newline so it is not part of the search
for (@text){
if ($_ =~ /\Q$find\E/){
push(@found, $_);
}
}
}
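The two approaches suggested in this thread, /\Q$find\E/ and index, can be compared side by side. This is a minimal sketch that uses in-memory lists in place of the files; the contents of `@words` and `@text` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the word file and the body of text.
my @words = ("(\n", "a.c\n");
my @text  = ('f(x) = 1', 'abc', 'a.c');

my (@found_regex, @found_index);
for my $word (@words) {
    chomp(my $find = $word);    # drop the newline a file read would leave
    for my $line (@text) {
        push @found_regex, $line if $line =~ /\Q$find\E/;
        push @found_index, $line if index($line, $find) >= 0;
    }
}

print "regex: @found_regex\n";
print "index: @found_index\n";
```

Both lists end up identical here: the parenthesis is found in `f(x) = 1`, and `a.c` matches only the line that really contains a dot, not `abc`.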

Escaping special characters in Perl regex

I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem arises in that brackets indicate a character class (or so I read) when Perl tries to match the regex, and the match ends up failing. I know that I can escape the brackets with \[ or \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets automatically ignored without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).
\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
You are using the Wrong Tool for the job.
You do not have a pattern! There are NO regex
characters in $pattern!
You have a literal string.
index() is for working with literal strings...
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
You can escape a set of special characters in an expression with a substitution:
my $expression1 = 'text with special characters like $ % ( )';
$expression1 =~ s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/\\$&/g;
# This escapes only the characters listed in the class (note that . and % are not among them)
print "$expression1\n"; # text with special characters like \$ % \( \)
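A hand-written character class only escapes what it lists, whereas quotemeta backslashes every non-word character. The sketch below compares the two on a hypothetical input, `v1.0 (beta)`, chosen because it contains a dot, which the class above does not cover.

```perl
use strict;
use warnings;

my $pattern = 'v1.0 (beta)';    # hypothetical input

# Hand-rolled escaping with the character class from the answer above.
(my $by_hand = $pattern) =~ s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/\\$&/g;
# Built-in escaping.
my $by_quotemeta = quotemeta $pattern;

print "by hand:   $by_hand\n";
print "quotemeta: $by_quotemeta\n";

# 'v1x0 (beta)' contains no dot, so a literal search should reject it.
for my $input ('v1.0 (beta)', 'v1x0 (beta)') {
    my $hand_hit  = $input =~ /$by_hand/      ? 'yes' : 'no';
    my $quote_hit = $input =~ /$by_quotemeta/ ? 'yes' : 'no';
    print "$input: by hand=$hand_hit quotemeta=$quote_hit\n";
}
```

With the hand-rolled version the dot still acts as a wildcard, so `v1x0 (beta)` is accepted; the quotemeta version rejects it.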

Using variables that contain special characters in Perl regexes

I'm trying to search an array for lines that contain $inbucket[0]. Some of my $inbucket[0] values include special characters. This script does exactly what I want it to, until I hit a special character.
I want the query to be case insensitive, match any part of the string $var, and process the special characters literally, as if they weren't special. Any ideas?
Thanks!
sub loopthru() {
warn "Loopthru begun on $inbucket[0]\n";
foreach $c (@chat) {
$var = $c->msg;
$lookfor2 = $inbucket[0];
if ( $var =~ /$lookfor2/i ) {
($to,$from) = split('-',$var);
$from =~ s/\.$//;
print MYFILE "$to\t$from\n";
&fillbucket($to);
&fillbucket($from);
}
}
}
You can use quotemeta, which returns the value of its argument with all non-"word" characters backslashed.
$lookfor2 = quotemeta $inbucket[0];
Or you can use the \Q escape, which is discussed in perlre. In short, it will quote (disable) pattern metacharacters until \E is encountered.
if ( $var =~ /\Q$lookfor2/i ) {
I think you are looking for
$var =~ /\Q$lookfor2/i
perl faq
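The combination of \Q with the /i flag covers all three requirements from the question: case-insensitive, substring, and literal. Here is a minimal sketch with hypothetical data; `@msgs` stands in for the messages returned by `$c->msg`, and `$lookfor` for `$inbucket[0]`.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $inbucket[0] and the chat messages.
my $lookfor = 'C++ (beginners)';
my @msgs    = ('alice-c++ (Beginners).', 'bob-carol.');

# \Q...\E makes the search text literal; /i ignores case.
my @hits = grep { /\Q$lookfor\E/i } @msgs;

# The same test without a regex: lowercase both sides and use index.
my @hits_index = grep { index(lc $_, lc $lookfor) >= 0 } @msgs;

print "regex: @hits\n";
print "index: @hits_index\n";
```

Both variants select only the first message.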

How do I handle special characters in a Perl regex?

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off the special meaning of metacharacters in the interpolated values. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
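Put together with the two delimiters from the question, the approach might look like the following sketch. The source text in `$src` is hypothetical, and index 0 of `@arr` is left empty only to keep the question's indices 1 and 2.

```perl
use strict;
use warnings;

# Single quotes keep $450 from being interpolated as a variable.
my @arr = ('', '$450', '(Buy now)');
my $src = 'Price: $450 limited offer (Buy now) today';    # hypothetical text

# The delimiters are quoted; the (.*?) between them remains a real capture group.
my ($between) = $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]\E/;

print defined $between ? "[$between]\n" : "no match\n";
```

Because \Q is applied after interpolation, the contents of `$arr[1]` and `$arr[2]` are escaped, while the `(.*?)` written directly in the pattern keeps its special meaning.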
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
print $1;
}