Making Strings Safe for Regular Expressions in Perl

I have a string that I want to use in a regular expression, in a way like m/$mystring_03/. However, $mystring_03 contains +s and slashes that cause problems. Is there a simple way in Perl to modify $mystring_03 so that all regular expression wildcards and other special characters are properly escaped (for example, every + turned into \+)?

Yes, use the \Q and \E escapes:
#!/usr/bin/perl
use strict;
use warnings;
my $text = "a+";
print
$text =~ /^$text$/ ? "matched" : "didn't match", "\n",
$text =~ /^\Q$text\E$/ ? "matched" : "didn't match", "\n";

The quotemeta function does what you are asking for.
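As a minimal sketch of that approach (the variable names and sample strings here are illustrative, not taken from the question), quotemeta returns a copy of the string with its non-word characters backslashed, and that copy can then be interpolated into a pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input containing regex metacharacters.
my $mystring = 'a+b/c';

# quotemeta backslashes the + and the /, leaving letters untouched.
my $safe = quotemeta($mystring);
print "$safe\n";    # prints a\+b\/c

# The escaped copy now matches the original text literally.
print "matched\n" if 'xx a+b/c yy' =~ /$safe/;
```

The original $mystring is left unchanged, which is useful if the raw value is still needed elsewhere.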

If you are going to escape every special character in the string anyway, you can just as well use index:
index($_, $mystring_03)
This returns the position of the first occurrence of $mystring_03 in the string you are testing, or -1 if there is no match.
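A short sketch of the index approach, assuming a hypothetical needle and haystack (neither appears in the question). Because index does a plain substring search, the + and / need no escaping at all:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values for illustration only.
my $needle   = 'a+b/c';
my $haystack = 'xx a+b/c yy';

# index returns the zero-based offset of the first occurrence, or -1.
my $pos = index($haystack, $needle);
if ($pos >= 0) {
    print "found at offset $pos\n";    # prints: found at offset 3
} else {
    print "not found\n";
}
```

This only covers the case where the whole search text is literal; if the literal part has to be combined with anchors or other pattern syntax, \Q...\E or quotemeta is still the tool to reach for.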

Related

How to escape a regular expression when I include it in another regular expression?

I'm trying to use a string as a regular expression inside another regular expression:
my $e = "a+b";
if ('foo a+b bar' =~ /foo ${e} bar/) {
print 'match!';
}
Doesn't work, since Perl treats + as a special character. How do I escape it without changing the value of $e?
You can use \Q and \E, which treat the regexp metacharacters between them as literals:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
my $e = "a+b";
if ('foo a+b bar' =~ /foo \Q$e\E bar/) {
say 'match!';
}

Perl match regex variable \Q

I'm trying to match a regex in perl. The regex needs to be stored in a variable.
From this question I got \Q to match regex in a variable.
$regex = "\\$[0-9] (\\+|\\*) [0-9]";
$str = "$2 * 2";
if ($str =~ /\Q$regex/) { # regex is: \$[0-9] (\+|\*) [0-9]
print "Expression found :)\n";
} else {
print "Expression not found :(\n";
}
This matches fine in regexpal. It also works fine when I use the regex immediately without first putting it in $regex (i.e. without the \Q). What is the \Q doing to mess up my regex?
The \Q and \E pair can be used to escape all non-word characters within a double-quoted string context. For instance
perl -E 'say "abc[\Q[..]\E]def"'
output
abc[\[\.\.\]]def
I wonder why you think you need it, as it prevents all regex metacharacters from having their special effect. For instance, \Q[0-9] will match exactly the text [0-9] instead of any single decimal digit.
I would write your code like this. Note that I have changed the double quotes to qr// when defining the pattern, to create a compiled regex, and to single quotes when defining the target string, to stop Perl from trying to interpolate the built-in variable $2 into the string. You must always use strict and use warnings 'all' at the top of every Perl program you write.
use strict;
use warnings 'all';
my $regex = qr/\$[0-9] [+*] [0-9]/;
my $str = '$2 * 2';
if ( $str =~ $regex ) {
print "Expression found :)\n";
}
else {
print "Expression not found :(\n";
}
output
Expression found :)
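To make the effect of \Q concrete, here is a small sketch that runs the same pattern text both ways. The variable names and the two test strings are assumptions chosen for illustration:

```perl
use strict;
use warnings;

my $pattern = '[0-9]';    # hypothetical pattern text held in a plain string

for my $str ('7', '[0-9]') {
    # Interpolated as-is, the text acts as a character class.
    my $as_regex   = $str =~ /^$pattern$/     ? 'yes' : 'no';
    # Wrapped in \Q...\E, the same text is matched character for character.
    my $as_literal = $str =~ /^\Q$pattern\E$/ ? 'yes' : 'no';
    print "$str => as regex: $as_regex, as literal: $as_literal\n";
}
```

The digit matches only the unquoted form and the bracketed text matches only the quoted form, which is why \Q is the wrong tool when the variable is meant to hold a pattern rather than literal text.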

Perl Search and Replace in one regular expression

I need something like this: 's/oldstr/newstr/g;pattern', i.e. I want to replace oldstr and then find some pattern in the string after the replacement. All in one regular expression.
If you want to replace first and look for a pattern in the resulting string, you can just put both expressions after one another. The replace with the /r flag (available since Perl 5.14) will return the altered string without modifying the original, and the m operator will match that returned string against the pattern.
use Test::Simple 'no_plan'; # this is just for the ok() function
my $str = 'foobar';
ok($str =~ s/o/0/gr =~ m/\d\d/);
ok($str =~ s/o/0/gr !~ m/\d\d\d/);
__END__
ok 1
ok 2
1..2

How to match '(' using a regex?

When I do this
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $s = 'dfgdfg5 )';
my $a = '5 )';
my $b = '567';
$s =~ s/$a/$b/g;
print Dumper $s;
I get
Unmatched ) in regex; marked by <-- HERE in m/5 ) <-- HERE / at ./test.pl line 11.
The problem is that $a contains a ).
How do I prevent the regex from failing?
Update
I get the string in $a from a database query, so I can't change it. Or would it be possible to make an $a2 where "something" searches for ) and replaces each one with \)?
You need to escape it. Either manually by adding backslash in front of it, or by using quotemeta or the \Q sequence inside the regex:
$a = quotemeta($a);
Or
$s =~ s/\Q$a/$b/g;
ETA: This is a good option if you want to match literal strings from a database query.
You should also be aware that it is not a good idea to use $a and $b as variables, since they will mask the predefined variables that are used with sort. E.g. sort { $a <=> $b } @foo.
The simple answer is to backslash escape the paren. my $a = '5 \)'; In your case, as your post mentions, you aren't the one creating the strings, so literally escaping them isn't an option.
It may be simpler to just wrap the variable that's being interpolated by the regex inside of a \Q ... \E.
$s =~ s/\Q$a\E/$b/g;
The quotemeta() function may also be helpful to you, depending on how your code is factored. With that option you would pass $a through quotemeta before interpolating it in the regex. \Q...\E is probably easier in this situation, but if your code is simplified by using quotemeta instead, it's there for you.
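A minimal sketch of the quotemeta route, using hypothetical variables $search and $replace as stand-ins for the values that would come from the database query (they also avoid the $a and $b names used by sort):

```perl
use strict;
use warnings;

my $s = 'dfgdfg5 )';

# Stand-ins for values fetched from the database (assumed for illustration).
my $search  = '5 )';
my $replace = '567';

# Escape once, up front; the space and the ) both get a backslash.
my $escaped = quotemeta($search);

# Copy $s, then substitute on the copy so the original stays intact.
(my $result = $s) =~ s/$escaped/$replace/g;
print "$result\n";    # prints dfgdfg567
```

Escaping once into a separate variable can be convenient when the same search string is reused across several patterns; for a single substitution, \Q$search\E inline does the same job.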
Use \) instead of just ). ) is special because it's normally used for capturing patterns so you need to escape it first.
Escape the parentheses with a backslash:
my $a = '5 \)';
Or use \Q inside the regexp:
$s =~ s/\Q$a/$b/g;
Also when storing regexps in a variable, you should look into the regexp quote operator: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
my $a = qr/5 \)/oi;
In Perl regular expression you need to mask special chars with a backslash \.
Try
my $a = '5 \)';
my $b = '567';
$s =~ s/$a/$b/g;
For details and a good start see perldoc perlretut
Update: I didn't know the RE came from a database. Well, the code above works nevertheless. The hint for the tutorial still applies.
I think you just need to escape the brackets, i.e. replace ) with \).

Why does smartmatch return false when I match against a regex containing slashes?

I'm trying to match a simple string against a regular expression pattern using the smartmatch operator:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use open qw(:std :utf8);
my $name = qr{/(\w+)/};
my $line = 'string';
print "ok\n" if $line ~~ /$name/;
I expect this to print "ok", but it doesn't. Why not?
Remove the slashes from your regex:
my $name = qr{(\w+)};
Since you're wrapping the regular expression in qr{}, everything inside the braces is interpreted as part of the regular expression, including the slashes. Therefore, if you were to expand out your search, it'd be:
print "ok\n" if $line ~~ /\/(\w+)\//;
Since your string contains no run of word characters with a slash on each side, the match fails and ok is not printed.
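The difference can be checked with a short sketch. It uses the plain =~ operator instead of smartmatch, which is a simplification on my part, and the variable names and the '/string/' test value are assumptions for illustration:

```perl
use strict;
use warnings;

my $line = 'string';

my $with_slashes    = qr{/(\w+)/};    # requires a literal / on both sides
my $without_slashes = qr{(\w+)};      # matches any run of word characters

print "with slashes: ",    ($line =~ $with_slashes    ? "ok" : "no match"), "\n";
print "without slashes: ", ($line =~ $without_slashes ? "ok" : "no match"), "\n";
print "path-like input: ", ('/string/' =~ $with_slashes ? "ok" : "no match"), "\n";
```

Only the second and third lines report ok: the pattern with slashes succeeds once the input itself contains the slashes.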