Escaping special characters in Perl regex

I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem is that brackets denote a character class (or so I read) when Perl compiles the regex, so the match fails. I know that I can escape the brackets with \[ and \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets treated literally without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).

\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
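As a small illustration of why \E matters, here is a sketch in which only the interpolated part is quoted and ordinary regex syntax resumes afterwards. The sample line and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data: a literal prefix that happens to contain metacharacters.
my $prefix = 'Hello_[version]';
my $line   = 'Hello_[version] = 42';

# \Q...\E quotes only $prefix; the \s*=\s*(\d+) after \E is normal regex syntax.
my $number;
if ($line =~ /^\Q$prefix\E\s*=\s*(\d+)\z/) {
    $number = $1;
    print "Number: $number\n";    # Number: 42
}
```

If the quoted variable is the last thing in the pattern, the closing \E can be left out, as in the answer above.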

Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
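To see what quotemeta actually does, this short sketch prints the quoted string and compares the quoted and unquoted matches. It reuses the strings from the question; nothing else is assumed.

```perl
use strict;
use warnings;

my $source  = 'Hello_[version]; Goodbye_[version]';
my $pattern = 'Hello_[version]';

# quotemeta puts a backslash before every ASCII character
# that is not a letter, a digit or an underscore.
my $quoted = quotemeta($pattern);
print "$quoted\n";    # Hello_\[version\]

# Unquoted, [version] is a character class, so "Hello_[" does not match.
print "unquoted: ", ($source =~ /$pattern/ ? "match" : "no match"), "\n";    # no match
print "quoted:   ", ($source =~ /$quoted/  ? "match" : "no match"), "\n";    # match
```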

You are using the wrong tool for the job. You do not have a pattern: none of the characters in $pattern are meant as regex metacharacters. You have a literal string, and index() is the function for working with literal strings:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
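A slightly fuller sketch of the index approach, showing the return values. The helper name contains_literal is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs anywhere in $haystack.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1;
}

my $source = 'Hello_[version]; Goodbye_[version]';

# index returns the zero-based offset of the first occurrence, or -1 if absent.
print index($source, 'Goodbye_[version]'), "\n";    # 17
print index($source, 'Hello_(version)'), "\n";      # -1
print "found\n" if contains_literal($source, 'Hello_[version]');
```

Because a match at the very start of the string returns 0, the result has to be compared against -1 rather than tested for truth.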

You can escape a chosen set of special characters in a string with a substitution like the following.
my $expression1 = 'text with special characters like $ % ( )';
# Put a backslash in front of every character listed in the class.
# Characters that are not listed (for example % and .) are left alone.
$expression1 =~ s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/"\\$&"/eg;
print "$expression1\n"; # text with special characters like \$ % \( \)

Related

Perl Regex OR condition

I have a variable that looks like either $var = 1.2.3 or $var = Variable/1.2.3. I am trying to write a regex that matches either form and stores the match in $1. The code is as follows:
if ($var =~ m/[\w+\/\d+.\d+.\d+])/){
$a = $1;
}
I want the match to succeed for either form of $var. Any suggestions? Thank you.
Parentheses capture. Relying on $1 is error-prone; simply use the return value of the match operator in list context.
for my $var ('1.2.3', 'Variable/1.2.3') {
if (my ($version) = $var =~ m{
(?:\A | /) # beginning of string or a slash
(\d+ [.] \d+ [.] \d+) # capture version number triple
\z # end of string
}msx) {
print ">>> $version <<<\n";
}
}
__END__
>>> 1.2.3 <<<
>>> 1.2.3 <<<
You should remove that character class. A character class matches just a single character; it does not represent a sequence. Also, . is a metacharacter in regex, so to match it literally you need to escape it.
Then you need to make the part up to and including the / optional, using the ? quantifier, as it is not necessarily present in the string:
if ($var =~ m/((?:\w+\/)?\d+\.\d+\.\d+)/){
$a = $1;
}
Just FYI, you can use any delimiter for match operator, so as to avoid escaping /:
m!((?:\w+/)?\d+\.\d+\.\d+)!
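A quick way to check that pattern against both forms of $var is sketched below. The helper name extract_version and the third test string are assumptions added for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the matched text, or undef if there is no match.
sub extract_version {
    my ($var) = @_;
    # In list context the match returns its captures (empty list on failure).
    my ($found) = $var =~ m!((?:\w+/)?\d+\.\d+\.\d+)!;
    return $found;
}

for my $var ('1.2.3', 'Variable/1.2.3', 'no version') {
    my $found = extract_version($var);
    print "$var => ", (defined $found ? $found : '(no match)'), "\n";
}
# 1.2.3 => 1.2.3
# Variable/1.2.3 => Variable/1.2.3
# no version => (no match)
```

Note that the pattern is unanchored, so it also finds a version embedded in a longer string; add \A and \z if the whole string must match.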
Try this:
if ($var =~ m/(?>[a-z_][a-z0-9_]*+\/)?+[0-9]++\.[0-9]++\.[0-9]++/i) {
$a = $&;
}

Perl Regex with variables not matching the same text

Could someone explain to me why the following prints "fail"? And what the workaround is?
my $test1 = "/k?user";
my $test2 = "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
If I change $test1 and $test2 to "/k?", the match works.
Clearly it has something to do with text following the ?. But, the variables I am trying to match have question marks in them, and I would rather not have to take everything apart, match the pieces, and then reconstruct everything.
? is a special character in a regex. Use quotemeta:
my $test1 = "/k?user";
my $test2 = quotemeta "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
To (only) match
/k?user
one needs to use the pattern
^/k\?user\z
because "?" doesn't match itself in a regex pattern. You need to escape it (use "\?") for it to match a "?", and escaping the special characters (such as "?") can be done using quotemeta.
my $str = '/k?user';
my $pat = quotemeta($str);
/^$pat\z/
quotemeta can also be accessed via \Q..\E in double-quoted string literals and regex pattern literals.
my $str = '/k?user';
/^\Q$str\E\z/
(The solution previously suggested by toolic would also match "!/k?userf".)
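The difference between the anchored and unanchored forms can be seen in a short sketch like this one, which uses the two strings mentioned above.

```perl
use strict;
use warnings;

my $pat = quotemeta('/k?user');

for my $candidate ('/k?user', '!/k?userf') {
    my $unanchored = $candidate =~ /$pat/    ? 'match' : 'no match';
    my $anchored   = $candidate =~ /^$pat\z/ ? 'match' : 'no match';
    print "$candidate => unanchored: $unanchored, anchored: $anchored\n";
}
# /k?user => unanchored: match, anchored: match
# !/k?userf => unanchored: match, anchored: no match
```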

searching for parentheses in perl

Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while(<FILE>){
$find = $_;
for (@text){
if ($_=~ /$find/){
push(@found, $_);
}
}
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general, a backslash escapes characters inside regexes, i.e. /\(/ will match a literal (.
In situations like this it's better to use the \Q...\E quoting escape:
if ( $_ =~ /\Q$find\E/ ) {
...
}
Alternatively, use quotemeta.
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
\Q with \E will escape the special characters in the $find variable, like this:
while(<FILE>){
chomp($find = $_); # drop the trailing newline read from the file
for (@text){
if ($_=~ /\Q$find\E/){
push(@found, $_);
}
}
}
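Putting the pieces together, a self-contained sketch of the search loop might look like this. It uses in-memory arrays instead of a file handle, and the helper name find_lines and the sample data are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of @$text that contain
# any entry of @$words as a literal substring.
sub find_lines {
    my ($words, $text) = @_;
    my @found;
    for my $word (@$words) {
        my $find = $word;
        chomp $find;            # words read from a file end in "\n"
        next if $find eq '';    # skip blank entries in the word list
        for my $line (@$text) {
            push @found, $line if $line =~ /\Q$find\E/;
        }
    }
    return @found;
}

my @words = ("(\n", "foo(bar)\n");
my @text  = ('call foo(bar) here', 'nothing special', 'an open ( paren');
print "$_\n" for find_lines(\@words, \@text);
# call foo(bar) here
# an open ( paren
# call foo(bar) here
```

As in the original loop, a line is pushed once per word it contains, so it can appear more than once in the result.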

Using variables that contain special characters in Perl regexes

I'm trying to search an array for lines that contain $inbucket[0]. Some of my $inbucket[0] values include special characters. This script does exactly what I want it to, until I hit a special character.
I want the query to be case insensitive, match any part of the string $var, and process the special characters literally, as if they weren't special. Any ideas?
Thanks!
sub loopthru() {
warn "Loopthru begun on $inbucket[0]\n";
foreach $c (@chat) {
$var = $c->msg;
$lookfor2 = $inbucket[0];
if ( $var =~ /$lookfor2/i ) {
($to,$from) = split('-',$var);
$from =~ s/\.$//;
print MYFILE "$to\t$from\n";
&fillbucket($to);
&fillbucket($from);
}
}
}
You can use quotemeta, which returns the value of its argument with all non-"word" characters backslashed.
$lookfor2 = quotemeta $inbucket[0];
Or you can use the \Q escape, which is discussed in perlre. In short, it will quote (disable) pattern metacharacters until \E is encountered.
if ( $var =~ /\Q$lookfor2/i ) {
I think you are looking for
$var =~ /\Q$lookfor2/i
perl faq
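A minimal sketch of the case-insensitive, literal match described in these answers follows. The helper name contains_ci and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive test for a literal substring.
sub contains_ci {
    my ($text, $lookfor) = @_;
    return $text =~ /\Q$lookfor\E/i ? 1 : 0;
}

print contains_ci('cost is A+B', 'a+b'), "\n";    # 1
# Without \Q, a+b would mean "one or more a, then b" and would match 'aab'.
print contains_ci('aab', 'a+b'), "\n";            # 0
```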

How do I handle special characters in a Perl regex?

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off the special meaning of metacharacters in the interpolated values. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
print $1;
}
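For completeness, here is a runnable sketch of the delimiter extraction using the two troublesome strings from the question. The helper name between and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: text between the first $start and the next $end,
# or undef if the delimiters are not found in that order.
sub between {
    my ($src, $start, $end) = @_;
    # Both delimiters are quoted; only (.*?) is treated as regex syntax.
    my ($middle) = $src =~ /\Q$start\E(.*?)\Q$end\E/s;
    return $middle;
}

my $src = 'Price: $450 for the deluxe model (Buy now) today';
my $got = between($src, '$450', '(Buy now)');
print "[$got]\n";    # [ for the deluxe model ]
```

The delimiters are written in single quotes so that Perl does not try to interpolate $450 before the string ever reaches the regex.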