Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while (<FILE>) {
    $find = $_;
    for (@text) {
        if ($_ =~ /$find/) {
            push(@found, $_);
        }
    }
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general backslash escapes characters inside regexes - i.e. /\(/ will match a literal (
In situations like this it's better to use the quoting escapes \Q and \E:
if ( $_ =~ /\Q$find\E/ ) {
...
}
Alternatively, use quotemeta.
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
\Q with \E will escape the special characters in the $find variable, like this:
while (<FILE>) {
    $find = $_;
    for (@text) {
        if ($_ =~ /\Q$find\E/) {
            push(@found, $_);
        }
    }
}
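A minimal sketch of the whole loop, assuming the word list arrives one entry per line. The sample data and the in-memory filehandle are made up for illustration; a real program would open an actual file. Note the chomp: a line read from a file keeps its trailing newline, which would otherwise become part of the search string.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the word file and the text.
my $word_list = "foo(\na+b\nplain\n";
my @text      = ('call foo( now', 'sum is a+b', 'nothing here', 'aab');

# Read the word list through an in-memory filehandle.
open my $fh, '<', \$word_list or die "Cannot open in-memory file: $!";

my @found;
while (my $find = <$fh>) {
    chomp $find;                      # drop the trailing newline
    for my $line (@text) {
        # \Q...\E makes every character in $find match literally.
        push @found, $line if $line =~ /\Q$find\E/;
    }
}
close $fh;

print "$_\n" for @found;
```

Without \Q, 'foo(' would trigger the "Unmatched (" error, and 'a+b' would be treated as a pattern and match 'aab' as well.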
Related
I'm trying to match a regex in perl. The regex needs to be stored in a variable.
From this question I got \Q to match regex in a variable.
$regex = "\\$[0-9] (\\+|\\*) [0-9]";
$str = "$2 * 2";
if ($str =~ /\Q$regex/) { # regex is: \$[0-9] (\+|\*) [0-9]
print "Expression found :)\n";
} else {
print "Expression not found :(\n";
}
This matches fine in regexpal. It also works fine when I use the regex immediately without first putting it in $regex (i.e. without the \Q). What is the \Q doing to mess up my regex?
The \Q and \E pair can be used to escape all non-word characters within a double-quoted string context. For instance
perl -E 'say "abc[\Q[..]\E]def"'
output
abc[\[\.\.\]]def
I wonder why you think you need it, as it prevents all regex metacharacters from having their special effect. For instance \Q[0-9] will match exactly [0-9] instead of any single decimal digit
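The difference can be checked directly. This is a small sketch with made-up variable names, comparing the same string used as a pattern and as quoted literal text:

```perl
use strict;
use warnings;

my $pattern = '[0-9]';

# Interpolated as a regex: a character class that matches any digit.
my $as_regex   = ('7'     =~ /$pattern/)   ? 1 : 0;
# Quoted with \Q: only the five literal characters "[0-9]" can match.
my $as_literal = ('7'     =~ /\Q$pattern/) ? 1 : 0;
my $literal_ok = ('[0-9]' =~ /\Q$pattern/) ? 1 : 0;

print "$as_regex $as_literal $literal_ok\n";   # prints "1 0 1"
```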
I would write your code like this. Note that I have changed double quotes to qr// when defining the pattern to create a compiled regex, and to single quotes when defining the target string to avoid Perl trying to interpolate built-in variable $2 into the string. You must always use strict and use warnings 'all' at the top of every Perl program you write
use strict;
use warnings 'all';
my $regex = qr/\$[0-9] [+*] [0-9]/;
my $str = '$2 * 2';
if ( $str =~ $regex ) {
print "Expression found :)\n";
}
else {
print "Expression not found :(\n";
}
output
Expression found :)
When I do this
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $s = 'dfgdfg5 )';
my $a = '5 )';
my $b = '567';
$s =~ s/$a/$b/g;
print Dumper $s;
I get
Unmatched ) in regex; marked by <-- HERE in m/5 ) <-- HERE / at ./test.pl line 11.
The problem is that $a contains a ).
How do I prevent the regex from failing?
Update
I get the string in $a from a database query, so I can't change it. Or would it be possible to make an $a2 where "something" searches for ) and replaces each one with \)?
You need to escape it. Either manually by adding backslash in front of it, or by using quotemeta or the \Q sequence inside the regex:
$a = quotemeta($a);
Or
$s =~ s/\Q$a/$b/g;
ETA: This is a good option if you want to match literal strings from a database query.
You should also be aware that it is not a good idea to use $a and $b as variables, since they will mask the predefined variables that are used with sort. E.g. sort { $a <=> $b } @foo.
The simple answer is to backslash escape the paren. my $a = '5 \)'; In your case, as your post mentions, you aren't the one creating the strings, so literally escaping them isn't an option.
It may be simpler to just wrap the variable that's being interpolated by the regex inside of a \Q ... \E.
$s =~ s/\Q$a\E/$b/g;
The quotemeta() function may also be helpful to you, depending on how your code is factored. With that option you would pass $a through quotemeta before interpolating it in the regex. \Q...\E is probably easier in this situation, but if your code is simplified by using quotemeta instead, it's there for you.
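A short sketch of the quotemeta variant, assuming the search string arrives unchanged from the database. The variable names are illustrative and deliberately avoid $a and $b:

```perl
use strict;
use warnings;

my $s       = 'dfgdfg5 )';
my $search  = '5 )';      # imagine this came from the database query
my $replace = '567';

# quotemeta backslashes every ASCII non-word character, so the
# space and the parenthesis are both escaped: 5\ \)
my $quoted = quotemeta $search;

(my $result = $s) =~ s/$quoted/$replace/g;

print "$result\n";   # dfgdfg567
```

Quoting once up front like this may be convenient when the same string is used in several patterns; for a single substitution, s/\Q$search\E/$replace/g does the same thing inline.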
Use \) instead of just ). A ) is special because it normally closes a capture group, so you need to escape it first.
Escape the parentheses with a backslash:
my $a = '5 \)';
Or use \Q inside the regexp:
$s =~ s/\Q$a/$b/g;
Also when storing regexps in a variable, you should look into the regexp quote operator: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
my $a = qr/5 \)/oi;
In Perl regular expressions you need to escape special characters with a backslash (\).
Try
my $a = '5 \)';
my $b = '567';
$s =~ s/$a/$b/g;
For details and a good start see perldoc perlretut
Update: I didn't know the RE came from a database. Well, the code above works nevertheless. The hint for the tutorial still applies.
I think you just need to escape the brackets, ie replace ) with \)
I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem arises in that brackets indicate a character class (or so I read) when Perl tries to match the regex, and the match ends up failing. I know that I can escape the brackets with \[ or \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets automatically ignored without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).
\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
You are using the Wrong Tool for the job.
You do not have a pattern! There are NO regex
characters in $pattern!
You have a literal string.
index() is for working with literal strings...
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
You can escape a set of special characters in an expression with a substitution like the following.
my $expression1 = 'text with special characters like $ % ( )';
$expression1 =~ s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/"\\$&"/eg;
# This escapes only the characters listed in the class; % is not one of them,
# and neither is ".", which quotemeta would also handle
print "$expression1\n"; # text with special characters like \$ % \( \)
I'm trying to search an array for lines that contain $inbucket[0]. Some of my $inbucket[0] values include special characters. This script does exactly what I want it to, until I hit a special character.
I want the query to be case insensitive, match any part of the string $var, and process the special characters literally, as if they weren't special. Any ideas?
Thanks!
sub loopthru() {
warn "Loopthru begun on $inbucket[0]\n";
foreach $c (@chat) {
$var = $c->msg;
$lookfor2 = $inbucket[0];
if ( $var =~ /$lookfor2/i ) {
($to,$from) = split('-',$var);
$from =~ s/\.$//;
print MYFILE "$to\t$from\n";
&fillbucket($to);
&fillbucket($from);
}
}
}
You can use quotemeta, which returns the value of its argument with all non-"word" characters backslashed.
$lookfor2 = quotemeta $inbucket[0];
Or you can use the \Q escape, which is discussed in perlre. In short, it will quote (disable) pattern metacharacters until \E is encountered.
if ( $var =~ /\Q$lookfor2/i ) {
I think you are looking for
$var =~ /\Q$lookfor2/i
perl faq
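Combining \Q with the /i modifier gives a case-insensitive literal search. A minimal sketch, with made-up messages in place of the $c->msg values from the question:

```perl
use strict;
use warnings;

# Hypothetical messages; the search string contains regex metacharacters.
my @msgs    = ('Alice-Bob (admin).', 'carol-dave.', 'ALICE-BOB (ADMIN).');
my $lookfor = 'bob (admin)';

# \Q...\E quotes the metacharacters, /i ignores case.
my @hits = grep { /\Q$lookfor\E/i } @msgs;

print "$_\n" for @hits;
```

Here the parentheses in $lookfor are matched literally, and both the mixed-case and the upper-case message are found.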
I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off the special meaning of metacharacters in the interpolated values. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
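Applied to the delimiters from the question, this might look as follows. The source text is invented for illustration, and the delimiters are copied into scalars first so the interpolation is unambiguous:

```perl
use strict;
use warnings;

# Hypothetical data; the two delimiters mirror the ones in the question.
my @arr = ('unused', '$450', '(Buy now)');
my $src = 'Price: $450 for the deluxe model (Buy now) while stocks last';

my ($start, $end) = @arr[1, 2];

my $captured;
if ($src =~ /\Q$start\E(.*?)\Q$end\E/) {
    $captured = $1;              # the text between the two delimiters
    print "[$captured]\n";       # [ for the deluxe model ]
}
```

Only the delimiters are quoted; the (.*?) between them sits outside \Q...\E and so keeps its meaning as a capture group.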
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
    print $1;
}