I'm trying to match a regex in perl. The regex needs to be stored in a variable.
From this question I got \Q to match regex in a variable.
$regex = "\\$[0-9] (\\+|\\*) [0-9]";
$str = "$2 * 2";
if ($str =~ /\Q$regex/) { # regex is: \$[0-9] (\+|\*) [0-9]
print "Expression found :)\n";
} else {
print "Expression not found :(\n";
}
This matches fine in regexpal. It also works fine when I use the regex immediately without first putting it in $regex (i.e. without the \Q). What is the \Q doing to mess up my regex?
The \Q and \E pair can be used to escape all non-word characters within a double-quoted string context. For instance
perl -E 'say "abc[\Q[..]\E]def"'
output
abc[\[\.\.\]]def
I wonder why you think you need it, as it prevents all regex metacharacters from having their special effect. For instance \Q[0-9] will match exactly [0-9] instead of any single decimal digit
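To see what \Q does to a whole pattern, here is a minimal sketch. It assumes the pattern is held in a single-quoted string (the variable names are only illustrative) and uses quotemeta, which performs the same escaping as \Q.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes and the $ exactly as typed.
my $pattern = '\$[0-9] (\+|\*) [0-9]';
my $str     = '$2 * 2';

# quotemeta performs the same escaping as \Q: every non-word
# character in the pattern gets a backslash in front of it.
print quotemeta($pattern), "\n";

# With \Q the pattern can only match its own literal text.
print "with \\Q:    ", ($str =~ /\Q$pattern\E/ ? "match" : "no match"), "\n";

# Without \Q the metacharacters keep their special meaning.
print "without \\Q: ", ($str =~ /$pattern/ ? "match" : "no match"), "\n";
```

The first test reports no match and the second reports a match, which is the behaviour described in the question.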
I would write your code like this. Note that I have changed double quotes to qr// when defining the pattern to create a compiled regex, and to single quotes when defining the target string to avoid Perl trying to interpolate built-in variable $2 into the string. You must always use strict and use warnings 'all' at the top of every Perl program you write
use strict;
use warnings 'all';
my $regex = qr/\$[0-9] [+*] [0-9]/;
my $str = '$2 * 2';
if ( $str =~ $regex ) {
print "Expression found :)\n";
}
else {
print "Expression not found :(\n";
}
output
Expression found :)
Related
Given :
my $str = "foo95285734776bar";
$str =~ s/([0-9]{2,4})/_????_/g;
What single regex where '????' is the length of $1 can produce output "foo_4__4__3_bar" ?
That is, where "9528" is replaced with "_4_", "5734" with "_4_", and the remaining "776" with "_3_".
You can use the /e modifier to add Perl code into the substitution part that is then evaled.
my $str = "foo95285734776bar";
$str =~ s/([0-9]{2,4})/'_' . length($1) . '_'/ge;
print $str;
Will output
foo_4__4__3_bar
Note that you now need a full Perl expression there. That's why you have to actually quote and concatenate the underscores.
From perlop:
A /e will cause the replacement portion to be treated as a full-fledged Perl expression and evaluated right then and there. It is, however, syntax checked at compile-time. A second e modifier will cause the replacement portion to be evaled before being run as a Perl expression.
The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches @ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespace is unnecessary
The same goes for the capturing group around .*
which leads to:
^Pre.*al\s*$
If you don't want entries like precious final to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you execute the regex, it looks for zero or more literal s characters rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and leave the $ unescaped:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
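As a rough sketch of the qr approach applied to the original loop, the following compiles the pattern once and filters a few made-up lines that stand in for the dictionary file (the sample data is an assumption, not taken from the linked file).

```perl
use strict;
use warnings;

# Compile the pattern once; no string quoting rules get in the way,
# so \s and the trailing $ are written exactly as in the regex itself.
my $pattern = qr/^Pre(.*)al(\s*)$/;

# Hypothetical sample lines standing in for the dictionary file.
my @lines = ("Prenatal\n", "Presidential \n", "Preach\n", "unPrenatal\n");

# A compiled regex can be used directly on the right-hand side of =~.
my @matches = grep { $_ =~ $pattern } @lines;
print @matches;
```

Only the first two sample lines are printed: "Preach" does not end in "al", and "unPrenatal" does not start with "Pre".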
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches any word character, that is a letter, a digit, or an underscore, rather than any character at all)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
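Putting these suggestions together, a small sketch might look like the following. The sample lines are invented for illustration, and the first one deliberately carries a Windows-style line ending.

```perl
use strict;
use warnings;

# Made-up sample lines; the first one ends in CR LF.
my @lines = ("Prenatal\r\n", "prenatal\n", "Pre-trial\n", "Preach\n");

my @hits;
for my $line (@lines) {
    # Work on a copy and strip any trailing CR/LF characters.
    (my $word = $line) =~ s/[\r\n]+$//;

    # \w* allows only word characters between "Pre" and "al";
    # /i makes the match case-insensitive.
    push @hits, $word if $word =~ /^Pre\w*al$/i;
}
print "$_\n" for @hits;
```

This prints Prenatal and prenatal. "Pre-trial" is rejected because the hyphen is not a word character.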
Could someone explain to me why the following prints "fail"? And what the workaround is?
my $test1 = "/k?user";
my $test2 = "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
If I change $test1 and $test2 to "/k?", the match works.
Clearly it has something to do with text following the ?. But, the variables I am trying to match have question marks in them, and I would rather not have to take everything apart, match the pieces, and then reconstruct everything.
? is a special character in a regex. Use quotemeta:
my $test1 = "/k?user";
my $test2 = quotemeta "/k?user";
if ($test1 =~ m/$test2/) {
print "match";
}
else {
print "fail";
}
To (only) match
/k?user
one needs to use the pattern
^/k\?user\z
because "?" doesn't match itself in a regex pattern. You need to escape it (use "\?") for it to match a "?", and escaping the special characters (such as "?") can be done using quotemeta.
my $str = '/k?user';
my $pat = quotemeta($str);
/^$pat\z/
quotemeta can also be accessed via \Q..\E in double-quoted string literals and regex pattern literals.
my $str = '/k?user';
/^\Q$str\E\z/
(The solution previously suggested by toolic would also match "!/k?userf".)
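The difference between the anchored and unanchored forms can be sketched as follows. The two helper subs are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains $literal anywhere.
sub contains_literal {
    my ($text, $literal) = @_;
    return $text =~ /\Q$literal\E/ ? 1 : 0;
}

# Hypothetical helper: true only if $text is exactly $literal.
sub matches_exactly {
    my ($text, $literal) = @_;
    return $text =~ /^\Q$literal\E\z/ ? 1 : 0;
}

for my $candidate ('/k?user', '!/k?userf') {
    printf "%-10s contains=%d exact=%d\n",
        $candidate,
        contains_literal($candidate, '/k?user'),
        matches_exactly($candidate, '/k?user');
}
```

Both candidates contain the literal text, but only the first one passes the anchored test.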
Writing a program where I read in a list of words/symbols from one file and search for each one in another body of text.
So it's something like:
while(<FILE>){
$findword = $_;
for (@text){
if ($_=~ /$find/){
push(@found, $_);
}
}
}
However, I run into trouble once parentheses show up. It gives me this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
I realize it's because Perl thinks the ( is part of the regex, but how do I deal with this and make the ( searchable?
You could use \Q and \E:
if ($_ =~ /\Q$find\E/){
Or just use index if you're just looking for a literal match:
if(index($_, $find) >= 0) {
In general, a backslash escapes characters inside regexes, i.e. /\(/ will match a literal (.
In situations like this it's better to use the \Q...\E quoting escapes:
if ( $_ =~ /\Q$find\E/ ) {
...
}
Alternatively, use quotemeta.
You'll want to do /\Q$find\E/ instead of just /$find/ - the \Q tells the parser to stop considering metacharacters as part of the regex until it finds the \E.
I suspect you will find m/\Q$find\E/ useful - unless you want other Perl regex metacharacters to be interpreted as metacharacters.
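A self-contained sketch of the search loop with \Q...\E, using small in-memory lists in place of the two files (the sample words and lines are assumptions made for the example):

```perl
use strict;
use warnings;

# Hypothetical data standing in for the word file and the body of text.
my @words = ('(', 'a+b', 'foo');
my @text  = ('f(x) = 1', 'a+b = c', 'aab', 'bar');

my @found;
for my $find (@words) {
    for my $line (@text) {
        # \Q...\E makes "(" and "+" match themselves instead of
        # being treated as regex metacharacters.
        push @found, $line if $line =~ /\Q$find\E/;
    }
}
print "$_\n" for @found;
```

This prints "f(x) = 1" and "a+b = c". Note that "aab" is not found for "a+b", because the + is matched literally. When the words come from a file, each line would also need chomp so that the trailing newline is not part of the search term.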
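\Q with \E (note the capital E; \e is the escape character) will escape the special characters in the $find variable, like this:
while(<FILE>){
chomp(my $find = $_); # drop the newline so it is not part of the pattern
for (@text){
if ($_ =~ /\Q$find\E/){
push(@found, $_);
}
}
}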
I have a string which I want to use in a regular expression in a way like m/$mystring_03/. However, $mystring contains + signs and slashes that cause problems. Is there a simple way in Perl to modify $mystring to ensure all regular expression wildcards or other special characters are properly escaped (like every + turned into \+)?
Yes, use the \Q and \E escapes:
#!/usr/bin/perl
use strict;
use warnings;
my $text = "a+";
print
$text =~ /^$text$/ ? "matched" : "didn't match", "\n",
$text =~ /^\Q$text\E$/ ? "matched" : "didn't match", "\n";
The quotemeta function does what you are asking for.
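A short sketch of quotemeta in use, with a made-up value for $mystring. It assumes the intent of the question is the contents of $mystring followed by the literal text _03; written as $mystring_03, Perl would look for a variable with that longer name, so the braces in ${escaped} are needed.

```perl
use strict;
use warnings;

my $mystring = 'a+b/c';               # hypothetical value
my $escaped  = quotemeta $mystring;   # backslash before + and /

print "$escaped\n";                   # prints a\+b\/c

# The braces end the variable name before the literal "_03".
print "matched\n" if 'xa+b/c_03' =~ /${escaped}_03/;
```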
If you are going to escape every special character in the string anyway, you can just as well use index:
index($_, $mystring_03)
This returns the position of the first occurrence of the substring in the string you are testing, or -1 when it is not found.
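For comparison, a minimal sketch of the index approach with invented strings. No escaping is needed because index never treats its argument as a pattern.

```perl
use strict;
use warnings;

# Hypothetical strings for illustration.
my $haystack = 'x = a+b/c_03;';
my $needle   = 'a+b/c_03';

my $pos     = index($haystack, $needle);   # 4: zero-based offset of the match
my $missing = index($haystack, 'a-b');     # -1: substring not present

print "found at $pos\n" if $pos >= 0;
print "not found\n"     if $missing < 0;
```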