Using variables that contain special characters in Perl regexes

I'm trying to search an array for lines that contain $inbucket[0]. Some of my $inbucket[0] values include special characters. This script does exactly what I want it to, until I hit a special character.
I want the query to be case insensitive, match any part of the string $var, and process the special characters literally, as if they weren't special. Any ideas?
Thanks!
sub loopthru() {
warn "Loopthru begun on $inbucket[0]\n";
foreach $c (@chat) {
$var = $c->msg;
$lookfor2 = $inbucket[0];
if ( $var =~ /$lookfor2/i ) {
($to,$from) = split('-',$var);
$from =~ s/\.$//;
print MYFILE "$to\t$from\n";
&fillbucket($to);
&fillbucket($from);
}
}
}

You can use quotemeta, which returns the value of its argument with all non-"word" characters backslashed.
$lookfor2 = quotemeta $inbucket[0];
Or you can use the \Q escape, which is discussed in perlre. In short, it will quote (disable) pattern metacharacters until \E is encountered.
if ( $var =~ /\Q$lookfor2/i ) {
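A minimal sketch of the case-insensitive, literal substring match described above. The helper name contains_literal and the sample strings are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs anywhere in $haystack,
# ignoring case and treating every character of $needle literally.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/i ? 1 : 0;
}

# Made-up sample data for illustration only.
my @lines = ('Alice (admin)-Bob.', 'carol+dave-Erin.', 'plain text');

for my $line (@lines) {
    print "matched: $line\n" if contains_literal($line, 'ALICE (ADMIN)');
}
```

Without \Q, the parentheses in the search term would be read as a capture group and would not match the literal "(" and ")" in the line.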

I think you are looking for
$var =~ /\Q$lookfor2/i
perl faq

Related

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
@perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0@
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
@perleval $vars{sub} = $vars{string} ; 0@
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
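To make the context difference concrete, here is a small sketch using the sample string from the question. The variable names $count, $sub and $none are only for illustration.

```perl
use strict;
use warnings;

my $string = 'word1.word2.word3';

# Scalar context: the match only reports success (1) or failure.
my $count = $string =~ /\.([^.]*)\./;

# List context: the match returns the captured groups.
my ($sub) = $string =~ /\.([^.]*)\./;

# Ternary form with an explicit fallback when nothing matches.
my $none = 'nodots' =~ /\.([^.]*)\./ ? $1 : 'default';

print "scalar: $count\n";   # scalar: 1
print "list:   $sub\n";     # list:   word2
print "none:   $none\n";    # none:   default
```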
Try:
/\.([^\.]+)\./
. has a special meaning and needs to be escaped. To capture the value between the dots, use a negated character class such as ([^\.]+), meaning at least one non-dot character. If you use (.*) instead, the greedy match on
word1.stuff1.stuff2.stuff3.word2 results in:
stuff1.stuff2.stuff3
But maybe you want that?
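A short sketch comparing the two captures on that longer string. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $s = 'word1.stuff1.stuff2.stuff3.word2';

# Greedy .* extends as far as it can, up to the last dot in the string.
my ($greedy) = $s =~ /\.(.*)\./;

# [^.]+ cannot cross a dot, so it stops at the next one.
my ($first) = $s =~ /\.([^.]+)\./;

print "greedy: $greedy\n";   # greedy: stuff1.stuff2.stuff3
print "first:  $first\n";    # first:  stuff1
```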
Here is my little example, I do find the perl one liners a little harder to read at times so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
my $value = $1;
print $value;
}
else {
print "no match";
}
result
stuff2
. has a special meaning: any character (see the expression between your parentheses)
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.

Perl Regex with variables not matching the same text

Could someone explain to me why the following prints "fail"? And what the workaround is?
my $test1 = "/k?user";
my $test2 = "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
If I change $test1 and $test2 to "/k?", the match works.
Clearly it has something to do with text following the ?. But, the variables I am trying to match have question marks in them, and I would rather not have to take everything apart, match the pieces, and then reconstruct everything.
? is a special character in a regex. Use quotemeta:
my $test1 = "/k?user";
my $test2 = quotemeta "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
To (only) match
/k?user
one needs to use the pattern
^/k\?user\z
because "?" doesn't match itself in a regex pattern. You need to escape it (use "\?") for it to match a "?", and escaping the special characters (such as "?") can be done using quotemeta.
my $str = '/k?user';
my $pat = quotemeta($str);
/^$pat\z/
quotemeta can also be accessed via \Q..\E in double-quoted string literals and regex pattern literals.
my $str = '/k?user';
/^\Q$str\E\z/
(The solution previously suggested by toolic would also match "!/k?userf".)
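The difference between the anchored and unanchored forms can be sketched like this. The helper names matches_anywhere and matches_exactly are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers; both treat $literal as plain text.
sub matches_anywhere {
    my ($candidate, $literal) = @_;
    return $candidate =~ /\Q$literal\E/ ? 1 : 0;
}

sub matches_exactly {
    my ($candidate, $literal) = @_;
    return $candidate =~ /^\Q$literal\E\z/ ? 1 : 0;
}

# Print how each candidate fares against the literal '/k?user'.
for my $candidate ('/k?user', '!/k?userf', '/kuser') {
    printf "%-10s anywhere=%d exactly=%d\n",
        $candidate,
        matches_anywhere($candidate, '/k?user'),
        matches_exactly($candidate, '/k?user');
}
```

If a whole-string comparison is all that is needed, plain eq does the same job as the anchored pattern.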

searching for parentheses in perl

Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while(<FILE>){
$find = $_;
for (@text){
if ($_=~ /$find/){
push(@found, $_);
}
}
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general backslash escapes characters inside regexes - i.e. /\(/ will match a literal (
In situations like this it's better to use the \Q...\E quoting escape:
if ( $_ =~ /\Q$find\E/ ) {
...
}
Alternatively, use quotemeta.
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
\Q with \E (uppercase; a lowercase \e is the escape character, not the end of quoting) will escape the special characters in the $find variable, like this:
while(<FILE>){
$find = $_;
for (@text){
if ($_=~ /\Q$find\E/){
push(@found, $_);
}
}
}
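Here is a self-contained sketch of that loop. It assumes the trailing newline on each line read from the word list should not be part of the search term, so it adds a chomp that the code above does not have. The word list and text are made-up stand-ins, read from an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Made-up inputs standing in for the word list file and the text body.
my $word_list = "foo(\n+bar\n";
my @text      = ('call foo(1)', 'a +bar b', 'nothing here');

my @found;
open my $fh, '<', \$word_list or die "Cannot open in-memory file: $!";
while (my $find = <$fh>) {
    chomp $find;              # assumption: the newline is not part of the term
    next if $find eq '';      # skip blank lines
    for my $line (@text) {
        # \Q...\E makes "(" and "+" match themselves.
        push @found, $line if $line =~ /\Q$find\E/;
    }
}
close $fh;

print "$_\n" for @found;
```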

Escaping special characters in Perl regex

I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem arises in that brackets indicate a character class (or so I read) when Perl tries to match the regex, and the match ends up failing. I know that I can escape the brackets with \[ or \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets automatically ignored without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).
\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
You are using the Wrong Tool for the job.
You do not have a pattern! There are NO regex
characters in $pattern!
You have a literal string.
index() is for working with literal strings...
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
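A small sketch of what index returns, plus one possible way to make it case-insensitive by lowercasing both sides with lc. The lc variant is an addition for illustration and was not part of the answer above.

```perl
use strict;
use warnings;

my $source = 'Hello_[version]; Goodbye_[version]';

# index returns the zero-based position of the first occurrence, or -1.
my $pos_hit  = index($source, 'Goodbye_[version]');
my $pos_miss = index($source, 'Hello_(version)');

# Illustrative case-insensitive variant: compare lowercased copies.
my $pos_ci = index(lc $source, lc 'HELLO_[VERSION]');

print "hit:  $pos_hit\n";
print "miss: $pos_miss\n";
print "ci:   $pos_ci\n";
```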
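You can escape a chosen set of special characters in a string with a substitution like the following. Note that the character class lists only some metacharacters (for example, it omits "."), so quotemeta is the more complete option.
my $expression1 = 'text with special characters like $ % ( )';
$expression1 =~ s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/"\\$&"/eg;
# This escapes each character listed in the class above
print "$expression1\n"; # text with special characters like \$ % \( \)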

How do I handle special characters in a Perl regex?

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to disable the special meaning of metacharacters in the interpolated values. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
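A runnable sketch using the two delimiters from the question. The surrounding text in $src is made up, and the delimiters are copied into the scalars $start and $end (names chosen for this example) so the pattern does not depend on how Perl parses $arr[1] inside a regex.

```perl
use strict;
use warnings;

# Single quotes matter here: in double quotes, "$450" would interpolate $4.
my @arr = ('unused', '$450', '(Buy now)');
my $src = 'Price: $450 for the deluxe kit (Buy now) today';   # made-up text

my ($start, $end) = @arr[1, 2];

# \Q...\E keeps "$", "(" and ")" literal; (.*?) captures what lies between.
my ($between) = $src =~ /\Q$start\E(.*?)\Q$end\E/;

print defined $between ? "[$between]\n" : "no match\n";
```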
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
print $1;
}