Perl regex, avoid unnecessary interpolation

Consider this example,
$relPath = '..\A\B/C/D/E';
$contentsDir = '..\A\B';
$relPath =~ s/$contentsDir//;
print "$relPath\n";
#Desired output: '/C/D/E'
#Actual output: '..\A\B/C/D/E'
Please help .. this unwanted interpolation has made it impossible to compute this.

Don't mix slashes and backslashes in paths. Use just slashes.
If you want to ignore any regular expression characters in a string, place it between \Q and \E (see the documentation in perlre) or pass it to quotemeta.
Here's an example:
#!/usr/bin/perl -w
use strict;
my $string = 'abc.*def';
my $sub = '.*';
$string =~ s/c\Q$sub\Ed//;
# or: my $re = 'c' . quotemeta($sub) . 'd'; $string =~ s/$re//;
print $string; # abef

Quote special regex chars with quotemeta before matching,
$contentsDir = quotemeta '..\A\B';
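Applied to the path example from the question, a minimal sketch might look like the following. The helper name strip_prefix and the ^ anchor are assumptions added for illustration; the question itself only shows the unanchored substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a literal prefix from a path.
sub strip_prefix {
    my ($path, $prefix) = @_;
    # \Q...\E quotes every regex metacharacter in $prefix, so '.', '\A'
    # and '\B' are matched as literal text rather than as regex syntax.
    (my $rest = $path) =~ s/^\Q$prefix\E//;
    return $rest;
}

print strip_prefix('..\A\B/C/D/E', '..\A\B'), "\n";   # /C/D/E
```

Without \Q, the pattern ..\A\B reads as two wildcard dots followed by the \A (start of string) and \B (non-word-boundary) assertions, which is why the original substitution left the string unchanged.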

Related

Regex searching and adding characters

I'm trying to use regex to add $ to the start of words in a string such that:
Answer = partOne + partTwo
becomes
$Answer = $partOne + $partTwo
I'm using / [a-z]/ to locate them but not sure what I'm meant to replace it with.
Is there any way to do it with regex, or am I supposed to just split up my string and put in the $?
I'm using perl right now.
You can match word boundary \b, followed by word class \w
my $s = 'Answer = partOne + partTwo';
$s =~ s|\b (?= \w)|\$|xg;
print $s;
output
$Answer = $partOne + $partTwo
You could use a lookahead to match only a space or the start-of-line anchor that is immediately followed by a letter. Replace the matched space character or starting anchor with itself plus a $ symbol.
use strict;
use warnings;
while(my $line = <DATA>) {
$line =~ s/(^|\s)(?=[A-Za-z])/$1\$/g;
print $line;
}
__DATA__
Answer = partOne + partTwo
Output:
$Answer = $partOne + $partTwo
Perl's regexes have a word character class \w that is meant for exactly this sort of thing. It matches upper-case and lower-case letters, decimal digits, and the underscore _.
So if you prefix all occurrences of one or more such characters with a dollar sign, it will achieve what you ask. It would look like this
use strict;
use warnings;
my $str = 'Answer = partOne + partTwo';
$str =~ s/(\w+)/\$$1/g;
print $str, "\n";
output
$Answer = $partOne + $partTwo
But please note that, if the text you're processing is a programming language, this will also process all comments and string literals in a way you probably don't want.
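One further caveat: \w also matches digits, so (\w+) will prefix numeric literals as well. A hedged sketch of a narrower pattern, assuming identifiers start with a letter or underscore (the helper name add_sigils is made up for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: prefix identifier-like words with a dollar sign.
sub add_sigils {
    my ($text) = @_;
    # Match a word that starts with a letter or underscore, so bare
    # numbers such as 2 are left alone.
    $text =~ s/\b([A-Za-z_]\w*)/\$$1/g;
    return $text;
}

print add_sigils('Answer = partOne + 2'), "\n";   # $Answer = $partOne + 2
```

The same warning about comments and string literals still applies to this variant.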
(\w+)
You can use this. Replace it with \$$1.
See demo.
http://regex101.com/r/lS5tT3/40

Perl simple regex uppercase words separated by underscore

Suppose I have a string like print_this_text_in_camel_case and I want to uppercase the first letter of the first word and of every word after an underscore, so the result will be Print_This_Text_In_Camel_Case. The test below does not work on the first word.
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.)/uc($1)/ge;
print $str, "\n";
Just modify the regex to match the first char as well:
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.|^.)/uc($1)/ge;
print $str, "\n";
will print out:
Print_This_Text_In_Camel_Case
You need to add a beginning-of-string anchor as an alternative to the underscore.
For Perl 5.10+, I'd use a \K (keep) escape to emulate variable-width look-behind and only uppercase the letter. I'd also use \U to perform the uppercasing in the replacement text instead of uc and the /e (eval) modifier.
$str =~ s/(?:^|_)\K(.)/\U$1/g;
If you're using an older version of Perl (without \K) you could do it this way:
$str =~ s/(^|_)(.)/$1\U$2/g;
Another alternative is using split and join instead of a regex:
$str = join '_', map { ucfirst } split /_/, $str;
It is tidiest to use a negative look-behind. This code fragment upper-cases all letters that aren't preceded by a letter.
my $str = "print_this_text_in_camel_case";
$str =~ s/ (?<!\p{alpha}) (\p{alpha}) /uc $1/xgei;
print $str, "\n";
output
Print_This_Text_In_Camel_Case
If you prefer, or if you have a very old copy of Perl that doesn't support Unicode properties, you can use [a-z] instead of \p{alpha}, like this
$str =~ s/ (?<![a-z]) ([a-z]) /uc $1/xige;
which produces the same result.
You could also use ucfirst
use feature 'say';
my $str = "print_this_text_in_camel_case";
my @split = map(ucfirst, (split/(_)/, $str));
say @split;
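For comparison, here is a small sketch that runs the \K substitution and the split/join variant side by side. The sub names are hypothetical, and the -1 limit passed to split is an addition not in the answers above: it keeps trailing empty fields, so a trailing underscore survives the round trip.

```perl
use strict;
use warnings;

# Regex variant (needs Perl 5.10+ for \K).
sub cap_regex {
    my ($str) = @_;
    $str =~ s/(?:^|_)\K(.)/\U$1/g;
    return $str;
}

# split/join variant; the -1 limit preserves trailing empty fields.
sub cap_split {
    my ($str) = @_;
    return join '_', map { ucfirst($_) } split /_/, $str, -1;
}

print cap_regex('print_this_text_in_camel_case'), "\n";  # Print_This_Text_In_Camel_Case
print cap_split('print_this_text_in_camel_case'), "\n";  # Print_This_Text_In_Camel_Case
```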

Use variable as RegEx pattern

I'd like to use a variable as a RegEx pattern for matching filenames:
my $file = "test~";
my $regex1 = '^.+\Q~\E$';
my $regex2 = '^.+\\Q~\\E$';
print int($file =~ m/$regex1/)."\n";
print int($file =~ m/$regex2/)."\n";
print int($file =~ m/^.+\Q~\E$/)."\n";
The result (or on ideone.com):
0
0
1
Can anyone explain to me how I can use a variable as a RegEx pattern?
As documentation says:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
So, use the qr quote-like operator.
You cannot use \Q in a single-quoted / non-interpolated string. It must be seen by the lexer.
Anyway, tilde isn’t a meta-character.
Add use re "debug" and you will see what is actually happening.
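Putting these answers together, a minimal sketch for the filename case could look like this. The variable names are illustrative, and \z is used here in place of $ as the end anchor. The key point is that \Q appears in the qr// literal itself, where Perl processes it, rather than inside a single-quoted string.

```perl
use strict;
use warnings;

my $suffix = '~';
# \Q...\E is handled when the qr// literal is compiled, so any
# metacharacters in $suffix are quoted before the pattern is built.
my $backup_re = qr/^.+\Q$suffix\E\z/;

for my $file ('test~', 'test.txt') {
    print $file, ': ', ($file =~ $backup_re ? 1 : 0), "\n";
}
# test~: 1
# test.txt: 0
```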

How to match '(' using a regex?

When I do this
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $s = 'dfgdfg5 )';
my $a = '5 )';
my $b = '567';
$s =~ s/$a/$b/g;
print Dumper $s;
I get
Unmatched ) in regex; marked by <-- HERE in m/5 ) <-- HERE / at ./test.pl line 11.
The problem is that $a contains a ).
How do I prevent the regex from failing?
Update
I get the string in $a from a database query, so I can't change it. Or would it be possible to make an $a2 where "something" searches for ) and replaces them with \)?
You need to escape it. Either manually by adding backslash in front of it, or by using quotemeta or the \Q sequence inside the regex:
$a = quotemeta($a);
Or
$s =~ s/\Q$a/$b/g;
ETA: This is a good option if you want to match literal strings from a database query.
You should also be aware that it is not a good idea to use $a and $b as variables, since they will mask the predefined variables that are used with sort. E.g. sort { $a <=> $b } @foo.
The simple answer is to backslash escape the paren. my $a = '5 \)'; In your case, as your post mentions, you aren't the one creating the strings, so literally escaping them isn't an option.
It may be simpler to just wrap the variable that's being interpolated by the regex inside of a \Q ... \E.
$s =~ s/\Q$a\E/$b/g;
The quotemeta() function may also be helpful to you, depending on how your code is factored. With that option you would pass $a through quotemeta before interpolating it in the regex. \Q...\E is probably easier in this situation, but if your code is simplified by using quotemeta instead, it's there for you.
Use \) instead of just ). A bare ) is special because it normally closes a capture group, so you need to escape it to match it literally.
Escape the parentheses with a backslash:
my $a = '5 \)';
Or use \Q inside the regexp:
$s =~ s/\Q$a/$b/g;
Also when storing regexps in a variable, you should look into the regexp quote operator: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
my $a = qr/5 \)/oi;
In a Perl regular expression you need to escape special characters with a backslash \.
Try
my $a = '5 \)';
my $b = '567';
$s =~ s/$a/$b/g;
For details and a good start see perldoc perlretut
Update: I didn't know the RE came from a database. Well, the code above works nevertheless. The hint for the tutorial still applies.
I think you just need to escape the brackets, i.e. replace ) with \)
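For the database case described in the update, a sketch along these lines should work. The names $needle and $replacement are substitutions for $a and $b (to avoid clashing with sort's variables), and the literal strings stand in for values that would come from the query.

```perl
use strict;
use warnings;

my $s           = 'dfgdfg5 )';
my $needle      = '5 )';   # stands in for the value read from the database
my $replacement = '567';

# \Q...\E makes the ) in $needle a literal character instead of
# the end of a group, so the pattern compiles and matches as text.
$s =~ s/\Q$needle\E/$replacement/g;

print "$s\n";   # dfgdfg567
```

Because the quoting happens at match time, the stored string never has to be modified.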

How do I handle special characters in a Perl regex?

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to quote metacharacters in the interpolated values, so they are matched literally. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
    print $1;
}
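A self-contained sketch of the same idea, using made-up sample data for @arr and $src (the question does not show the real contents):

```perl
use strict;
use warnings;

# Hypothetical sample data; single quotes keep $450 from being interpolated.
my @arr = ('unused', '$450', '(Buy now)');
my $src = 'Price: $450 while stocks last (Buy now) today';

my $found;
# \Q...\E quotes the $ and the parentheses in the delimiters, so only
# the (.*?) in the middle acts as a capture group.
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]\E/ ) {
    $found = $1;
    print "[$found]\n";   # [ while stocks last ]
}
```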