How to match '(' using a regex?

When I do this
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $s = 'dfgdfg5 )';
my $a = '5 )';
my $b = '567';
$s =~ s/$a/$b/g;
print Dumper $s;
I get
Unmatched ) in regex; marked by <-- HERE in m/5 ) <-- HERE / at ./test.pl line 11.
The problem is that $a contains a ).
How do I prevent the regex from failing?
Update
I get the string in $a from a database query, so I can't change it. Or would it be possible to make an $a2 where "something" searches for ) and replaces it with \)?

You need to escape it, either manually by adding a backslash in front of it, or by using quotemeta or the \Q sequence inside the regex:
$a = quotemeta($a);
Or
$s =~ s/\Q$a/$b/g;
ETA: This is a good option if you want to match literal strings from a database query.
You should also be aware that it is not a good idea to use $a and $b as variables, since they will mask the predefined variables that are used with sort, e.g. sort { $a <=> $b } @foo.
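A minimal sketch of both options applied to the question's data. The names $find and $replace are hypothetical stand-ins for $a and $b, chosen to avoid the sort clash described above; the results in the comments follow from quotemeta turning '5 )' into '5\ \)'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names, chosen so sort's $a and $b are not masked.
my $find    = '5 )';   # pattern text as it might arrive from a database
my $replace = '567';

# Option 1: escape the pattern once with quotemeta.
my $s1     = 'dfgdfg5 )';
my $quoted = quotemeta $find;    # '5\ \)'
$s1 =~ s/$quoted/$replace/g;

# Option 2: escape at the point of interpolation with \Q...\E.
my $s2 = 'dfgdfg5 )';
$s2 =~ s/\Q$find\E/$replace/g;

print "$s1\n";   # dfgdfg567
print "$s2\n";   # dfgdfg567
```

Both forms do the same escaping; quotemeta is handy when the escaped pattern is reused in several places.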

The simple answer is to backslash escape the paren. my $a = '5 \)'; In your case, as your post mentions, you aren't the one creating the strings, so literally escaping them isn't an option.
It may be simpler to just wrap the variable that's being interpolated by the regex inside of a \Q ... \E.
$s =~ s/\Q$a\E/$b/g;
The quotemeta() function may also be helpful to you, depending on how your code is factored. With that option you would pass $a through quotemeta before interpolating it in the regex. \Q...\E is probably easier in this situation, but if your code is simplified by using quotemeta instead, it's there for you.

Use \) instead of just ). The ) is special because it closes a capturing group, so you need to escape it.

Escape the parentheses with a backslash:
my $a = '5 \)';
Or use \Q inside the regexp:
$s =~ s/\Q$a/$b/g;
Also, when storing regexps in a variable, you should look into the regexp quote operator qr//: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
my $a = qr/5 \)/oi;
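A short sketch combining qr// with \Q...\E, assuming (as in the question) that the pattern text arrives unescaped, so a literal backslash cannot be typed into it. The names $from_db, $re and $t are illustrative only.

```perl
use strict;
use warnings;

my $from_db = '5 )';               # unescaped text, e.g. from a query
my $re      = qr/\Q$from_db\E/;    # compiled once, metacharacters escaped

my $s = 'dfgdfg5 ) and 5 )';
(my $t = $s) =~ s/$re/567/g;       # copy $s into $t, then substitute in the copy

print "$t\n";                      # dfgdfg567 and 567
print "matched\n" if 'x5 )y' =~ $re;
```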

In Perl regular expressions you need to escape special characters with a backslash (\).
Try
my $a = '5 \)';
my $b = '567';
$s =~ s/$a/$b/g;
For details and a good start, see perldoc perlretut.
Update: I didn't know the RE came from a database. Well, the code above works nevertheless. The hint for the tutorial still applies.

I think you just need to escape the parentheses, i.e. replace ) with \).
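If you want the manual route from the question's update (an escaped copy in a separate variable), a substitution can add the backslashes. This is only a sketch: $escaped is a hypothetical name for the question's "$a2", and it handles parentheses only, which is why quotemeta, which escapes every non-word ASCII character, is the safer general choice.

```perl
use strict;
use warnings;

my $raw = '5 )';

# Copy $raw and put a backslash in front of every ( or ).
(my $escaped = $raw) =~ s/([()])/\\$1/g;   # '5 \)'

my $s = 'dfgdfg5 )';
$s =~ s/$escaped/567/g;
print "$s\n";   # dfgdfg567

# Caveat: a '.', '?', '+' etc. in $raw would still act as a metacharacter.
```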

Related

Perl regex, avoid unnecessary interpolation

Consider this example,
$relPath = '..\A\B/C/D/E';
$contentsDir = '..\A\B';
$relPath =~ s/$contentsDir//;
print "$relPath\n";
#Desired output: '/C/D/E'
#Actual output: '..\A\B/C/D/E'
Please help .. this unwanted interpolation has made it impossible to compute this.
Don't mix slashes and backslashes in paths. Use just slashes.
If you want to ignore any regular expression characters in a string, place it between \Q and \E (see the documentation in perlre), or pass it to quotemeta.
Here's an example:
#!/usr/bin/perl -w
use strict;
my $string = 'abc.*def';
my $sub = '.*';
$string =~ s/c\Q$sub\E/d/;
# or: my $re = 'c' . quotemeta($sub); $string =~ s/$re/d/;
print $string; # abddef
Quote special regex chars with quotemeta before matching:
$contentsDir = quotemeta '..\A\B';
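A runnable version of this fix, assuming the goal is to strip $contentsDir from the front of $relPath. The ^ anchor is an addition not present in the original snippet; it keeps the substitution from removing a match that appears later in the path.

```perl
use strict;
use warnings;

my $relPath     = '..\A\B/C/D/E';
my $contentsDir = quotemeta '..\A\B';   # becomes \.\.\\A\\B

$relPath =~ s/^$contentsDir//;          # remove the prefix only at the start
print "$relPath\n";                     # /C/D/E
```

Without quotemeta, the pattern ..\A\B would be read as two "any character" dots, a start-of-string anchor (\A) and a non-word-boundary (\B), which is why nothing was removed.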

How to match a question mark?

I am trying to search and replace a list of URLs in a file and I am having problems if the search URL has a question mark in it. The $file below is just a single tag here, but it is usually an entire file.
my $search = 'http://shorturl.com/detail.cfm?color=blue';
my $replace = 'http://shorturl.com/detaila.aspx?color=red';
my $file = 'HI';
$file =~ s/$search/$replace/gis;
print $file;
If the $search variable has a ? in it, the substitution does not work. It would work if I were to take off the ?color=blue from the $search variable.
Does anyone know how to make the above substitution work? Backslashing, i.e. \? did not help. Thanks.
Use quotemeta for the regex pattern.
use warnings;
use strict;
my $search = quotemeta 'http://shorturl.com/detail.cfm?color=blue';
my $replace = 'http://shorturl.com/detaila.aspx?color=red';
my $file = 'HI';
$file =~ s/$search/$replace/gis;
print $file;
__END__
HI
When a string is interpolated as a regex, it isn't matched literally, but interpreted as a regex. This is useful to build complex regexes, e.g.
my @animals = qw/ cat dog goldfish /;
my $animal_re = join "|", @animals;
say "The $thing is an animal" if $thing =~ /$animal_re/i;
In the string $animal_re, the | is treated as a regex metacharacter.
Other metacharacters include ., which matches any non-newline character, and ?, which makes the previous atom optional.
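When an alternation is built from data, you usually want | to stay a metacharacter while each alternative matches literally. A sketch with a hypothetical word list: without the quotemeta in the map, the alternative c++ would be read as "one or more c, possessively", so cc would wrongly match and c++ would not.

```perl
use strict;
use warnings;

my @terms = ('cat', 'dog', 'c++');    # hypothetical data; '+' is a metacharacter

# Escape each alternative, then join them with an unescaped '|'.
my $alt = join '|', map { quotemeta($_) } @terms;   # cat|dog|c\+\+
my $re  = qr/^(?:$alt)\z/i;

for my $thing ('Dog', 'c++', 'cc') {
    printf "%s: %s\n", $thing, ($thing =~ $re ? 'match' : 'no match');
}
```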
If you want to match the contents of a variable literally, you can enclose it in \Q...\E quotes:
s/\Q$search/$replace/gi
(The /s option just changes the meaning of . from “match any non-newline character” to “match any character”, and is therefore irrelevant here.)
The \Q...\E is syntactic sugar for the quotemeta function, therefore this answer and toolic's answer are exactly equivalent.
Please note that you want to escape more than just the ?. The ? is the only character in your example that breaks what you expect, but unescaped . matches can be insidious to find.
The regex /foo.com/ will indeed match the string foo.com, but it will also match foo com and fooXcom and foo!com, because . matches any character. Therefore, the /foo.com/ should be written as /foo\.com/.
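A quick demonstration of the difference, using the three strings from the paragraph above:

```perl
use strict;
use warnings;

my (%loose, %strict);
for my $host ('foo.com', 'fooXcom', 'foo!com') {
    $loose{$host}  = $host =~ /foo.com/  ? 1 : 0;   # '.' matches any character
    $strict{$host} = $host =~ /foo\.com/ ? 1 : 0;   # '\.' matches only a dot
    print "$host: loose=$loose{$host} strict=$strict{$host}\n";
}
```

Only foo.com passes the escaped pattern; all three pass the unescaped one.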

What is the difference between using $1 vs \1 in Perl regex substitutions?

I'm debugging some code and wondered if there is any practical difference between $1 and \1 in Perl regex substitutions.
For example:
my $package_name = "Some::Package::ButNotThis";
$package_name =~ s{^(\w+::\w+)}{$1};
print $package_name; # Some::Package
This following line seems functionally equivalent:
$package_name =~ s{^(\w+::\w+)}{\1};
Are there subtle differences between these two statements? Do they behave differently in different versions of Perl?
First, you should always use warnings when developing:
#!/usr/bin/perl
use strict; use warnings;
my $package_name = "Some::Package::ButNotThis";
$package_name =~ s{^(\w+::\w+)}{\1};
print $package_name, "\n";
Output:
\1 better written as $1 at C:\Temp\x.pl line 7.
When you get a warning you do not understand, add diagnostics:
C:\Temp> perl -Mdiagnostics x.pl
\1 better written as $1 at x.pl line 7 (#1)
(W syntax) Outside of patterns, backreferences live on as variables.
The use of backslashes is grandfathered on the right-hand side of a
substitution, but stylistically it's better to use the variable form
because other Perl programmers will expect it, and it works better if
there are more than 9 backreferences.
Why does it work better when there are more than 9 backreferences? Here is an example:
#!/usr/bin/perl
use strict; use warnings;
my $t = (my $s = '0123456789');
my $r = join '', map { "($_)" } split //, $s;
$s =~ s/^$r\z/\10/;
$t =~ s/^$r\z/$10/;
print "[$s]\n";
print "[$t]\n";
Output:
C:\Temp> x
]
[9]
If that does not clarify it, take a look at:
C:\Temp> x | xxd
0000000: 5b08 5d0d 0a5b 395d 0d0a [.]..[9]..
See also perlop:
The following escape sequences are available in constructs that interpolate and in transliterations …
\10 in octal is 8 in decimal, so the replacement part contained the character code for BACKSPACE.
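A tiny check of the octal reading of \10 in a double-quoted (interpolating) string, which is the same rule that applies on the right-hand side of s///:

```perl
use strict;
use warnings;

# \10 is an octal escape: 010 octal = 8 decimal = BACKSPACE.
my $octal = "\10";
printf "ord = %d, is backspace: %s\n", ord($octal), ($octal eq "\b" ? 'yes' : 'no');
```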
NB
Incidentally, your code does not do what you want: it will not print Some::Package, contrary to what your comment says, because all you are doing is replacing Some::Package with Some::Package without touching ::ButNotThis.
You can either do:
($package_name) = $package_name =~ m{^(\w+::\w+)};
or
$package_name =~ s{^(\w+::\w+)(?:::\w+)*\z}{$1};
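Both suggested fixes side by side, applied to the question's input. The copy-then-substitute idiom (my $second = $name) =~ s{...}{...} is used here only so the original string stays available; $first and $second are illustrative names.

```perl
use strict;
use warnings;

my $name = 'Some::Package::ButNotThis';

# Fix 1: match in list context and keep only the capture.
my ($first) = $name =~ m{^(\w+::\w+)};

# Fix 2: make the pattern consume the rest, so the substitution drops it.
(my $second = $name) =~ s{^(\w+::\w+)(?:::\w+)*\z}{$1};

print "$first\n";    # Some::Package
print "$second\n";   # Some::Package
```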
From perldoc perlre:
The bracketing construct "( ... )" creates capture buffers. To refer to
the current contents of a buffer later on, within the same pattern, use
\1 for the first, \2 for the second, and so on. Outside the match use
"$" instead of "\".
The \<digit> notation works in certain circumstances outside the match, but it can clash with octal escapes. This happens when the backslash is followed by more than one digit.

Text replacement with backslash in a variable in Perl

How can I replace the backslash inside the variable?
$string = 'a\cc\ee';
$re = 'a\\cc';
$rep = "Work";
#doesnt work in variable
$string =~ s/$re/$rep/og;
print $string."\n";
#work with String
$string =~ s/a\\cc/$rep/og;
print $string."\n";
output:
a\cc\ee
Work\ee
Because you're using this inside of a regex -- you probably want quotemeta() or \Q and \E (see perldoc perlre)
perl -E'say quotemeta( q[a/asf$## , d] )'
# prints: a\/asf\$\#\#\ \,\ d
# Or, with `\Q`, and `\E`
$string =~ s/\Q$re\E/$rep/og;
print $string."\n";
Setting $re = 'a\cc'; would not help on its own: in single quotes, 'a\cc' and 'a\\cc' are the same string, a\cc with one backslash. When you include that in the regex as a variable, the backslash is not taken literally; the regex engine treats \c as the start of an escape sequence (\cc is Ctrl-C). Writing $re = 'a\\\\cc'; (which yields a\\cc, two backslashes) does work.
Defining the string with double quotes would not change this either ("a\\cc" is also a\cc). In general it's better to use single quotes in your strings unless you explicitly want to interpolate something in the content: it saves an infinitesimal amount of processing, but more importantly it is a hint to the reader as to what you, the programmer, intended.
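The problem is how many backslashes actually reach the regex engine. Even in single quotes, \\ is an escape for a single backslash, so $re holds a\cc. When that string is interpolated into the pattern, the regex engine reads \c as the start of a control-character escape rather than as a literal backslash, so nothing matches.
Single quotes tell Perl not to interpolate variables, but \\ and \' are still processed as escapes.
Compare:
$re0 = 'a\\cc';
$re1 = "a\\cc";
When you print them out you'll see the same string twice:
print $re0."\n".$re1."\n";
a\cc
a\cc
On the other hand, when you write the pattern directly in the regex, you need one backslash to act as an escape and another to be what you're escaping, which is why s/a\\cc/$rep/ works. To get the same effect from a variable, either double the backslash inside the string ('a\\\\cc') or use \Q...\E.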
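A sketch that checks the backslash counts directly and shows that \Q...\E fixes the substitution. It uses the question's strings; $one, $two, $plain and $quoted are illustrative names.

```perl
use strict;
use warnings;

# In single quotes \\ is still an escape for one backslash,
# so both literals below hold the same 4-character string a\cc.
my $one = 'a\cc';
my $two = 'a\\cc';
print length($one), ' ', length($two), ' ', ($one eq $two ? 'same' : 'different'), "\n";

my $string = 'a\cc\ee';
my $rep    = 'Work';

# Interpolated as-is, \c starts a control-character escape, so nothing matches.
(my $plain = $string) =~ s/$two/$rep/g;

# With \Q...\E the backslash is escaped and matched literally.
(my $quoted = $string) =~ s/\Q$two\E/$rep/g;

print "$plain\n";    # a\cc\ee
print "$quoted\n";   # Work\ee
```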

How do I handle special characters in a Perl regex?

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g.:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to escape the metacharacters in the interpolated values (the variables are still interpolated; their contents are just matched literally). See perlretut for more on \Q and \E.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
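A self-contained sketch of the \Q...\E approach with the two problem delimiters from the question. The source text and the first array element are made up for illustration, and the delimiters are copied into scalars so the pattern does not rely on Perl's guess about $arr[1] inside a regex.

```perl
use strict;
use warnings;

# Hypothetical data modelled on the question.
my @arr = ('unused', '$450', '(Buy now)');
my $src = 'Price: $450 only today! (Buy now)';

my ($start, $end) = @arr[1, 2];

my $found;
if ($src =~ /\Q$start\E(.*?)\Q$end\E/) {
    $found = $1;
}
print defined $found ? "[$found]\n" : "no match\n";   # [ only today! ]
```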
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
    print $1;
}