How do I handle special characters in a Perl regex? - regex

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?

Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off interpolation of values in the regex. See perlretut for more on \Q and \E - they may not be what you're looking for.

quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.

Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat)
print $1;

Related

I would like to use regex to insert specific characters in a regex expression?

I'd like to be able to use regex in Perl to insert characters into words.
So that the word "TABLE" would become "T%A%B%L%E%"
Can I ask for the syntax for such a feat?
Many thanks
Break the string into characters then join them with what you want in between; also append that
my $res = ( join '%', split //, $string ) . '%';
A simple-minded way with regex
$string =~ s/(.)/$1%/g;
where with /r modifier you can preserve $string and return the changed string instead
my $res = $string =~ s/(.)/$1%/gr;
You can use this command,
echo TABLE|perl -pe 's/\w/$&%/g'
This outputs T%A%B%L%E%
OR (in case your data is contained in a file)
perl -pe 's/\w/$&%/g' test.pl
You may replace \w with [a-zA-Z] if you just want to replace with alphabets as \w matchs alphabets numbers and underscore.
You can use look-behind also
my $s = "table";
$s=~s/(?<=.)/%/g;
print $s;
If your version >5.14 you can use \K
$s=~s/.\K/%/g;

How to match '(' using a regex?

When I do this
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $s = 'dfgdfg5 )';
my $a = '5 )';
my $b = '567';
$s =~ s/$a/$b/g;
print Dumper $s;
I get
Unmatched ) in regex; marked by <-- HERE in m/5 ) <-- HERE / at ./test.pl line 11.
The problem is that $a have a (.
How do I prevent the regex from failing?
Update
The string in $a do I get from a database query, so I can't change it. Or would it be possible to make an $a2 where "something" searches for ) and replaces them with \)?
You need to escape it. Either manually by adding backslash in front of it, or by using quotemeta or the \Q sequence inside the regex:
$a = quotemeta($a);
Or
$s =~ /\Q$a/$b/g;
ETA: This is a good option if you want to match literal strings from a database query.
You should also be aware that it is not a good idea to use $a and $b as variables, since they will mask the predefined variables that are used with sort. E.g. sort { $a <=> $b } #foo.
The simple answer is to backslash escape the paren. my $a = '5 \)'; In your case, as your post mentions, you aren't the one creating the strings, so literally escaping them isn't an option.
It may be simpler to just wrap the variable that's being interpolated by the regex inside of a \Q ... \E.
$s =~ s/\Q$a\E/$b/g;
The quotemeta() function may also be helpful to you, depending on how your code is factored. With that option you would pass $a through quotemeta before interpolating it in the regex. \Q...\E is probably easier in this situation, but if your code is simplified by using quotemeta instead, it's there for you.
Use \) instead of just ). ) is special because it's normally used for capturing patterns so you need to escape it first.
Escape the parentheses with a backslash:
my $a = '5 \)'oi;
Or use \Q inside the regexp:
$s =~ s/\Q$a/$b/g;
Also when storing regexps in a variable, you should look into the regexp quote operator: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
my $a = qr/5 \)/oi;
In Perl regular expression you need to mask special chars with a backslash \.
Try
my $a = '5 \)';
my $b = '567';
$s =~ s/$a/$b/g;
For details and a good start see perldoc perlretut
Update: I didn't know the RE came from a database. Well, the code above works nevertheless. The hint for the tutorial still applies.
I think you just need to escape the brackets, ie replace ) with \)

searching for parentheses in perl

Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while(<FILE>){
$findword = $_;
for (#text){
if ($_=~ /$find/){
push(#found, $_);
}
}
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general backslash escapes characters inside regexes - i.e. /\(/ will match a literal (
in situations like this it's better to use the quote operator
if ( $_ =~ /\Q$find\E/ ) {
...
}
alternatively use quotemeta
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
\Q with \e will escape your special chars in the $find variable like:
while(<FILE>){
$findword = $_;
for (#text){
if ($_=~ /\Q$find\e/){
push(#found, $_);
}
}
}

Escaping special characters in Perl regex

I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem arises in that brackets indicate a character class (or so I read) when Perl tries to match the regex, and the match ends up failing. I know that I can escape the brackets with \[ or \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets automatically ignored without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).
\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
You are using the Wrong Tool for the job.
You do not have a pattern! There are NO regex
characters in $pattern!
You have a literal string.
index() is for working with literal strings...
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
You can escape set of special characters in an expression by using the following command.
expression1 = 'text with special characters like $ % ( )';
expression1 =~s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/"\\$&"/eg ;
#This will escape all the special characters
print "expression1'; # text with special characters like \$ \% \( \)

Using variables that contain special characters in Perl regexes

I'm trying to search an array for lines that contain $inbucket[0]. Some of my $inbucket[0] values include special characters. This script does exactly what I want it to, until I hit a special character.
I want the query to be case insensitive, match any part of the string $var, and process the special characters literally, as if they weren't special. Any ideas?
Thanks!
sub loopthru() {
warn "Loopthru begun on $inbucket[0]\n";
foreach $c (#chat) {
$var = $c->msg;
$lookfor2 = $inbucket[0];
if ( $var =~ /$lookfor2/i ) {
($to,$from) = split('-',$var);
$from =~ s/\.$//;
print MYFILE "$to\t$from\n";
&fillbucket($to);
&fillbucket($from);
}
}
}
You can use quotemeta, which returns the value of its argument with all non-"word" characters backslashed.
$lookfor2 = quotemeta $inbucket[0];
Or you can use the \Q escape, which is discussed in perlre. In short, it will quote (disable) pattern metacharacters until \E is encountered.
if ( $var =~ /\Q$lookfor2/i ) {
I think you are looking for
$var =~ /\Q$lookfor2/i
perl faq