Computed regular expression replacement in Perl - regex

I want to do a global regex replace, where the replacement is conditional upon some logic that can't be put in a regular expression. For example:
while ( $var =~ /<IF OPERATOR="(.+?)" VALUE="(.+?)"\/>/g ) {
my $operator = $1;
my $value = $2;
if ( $operator eq 'true' && $hash{ $value } ) {
# replace the entire <IF/>
}
if ( $operator eq 'false' && ! $hash{ $value } ) {
# replace the entire <IF/>
}
}
How do I do the # replace bit? Thanks for any advice.

Your refusal to give any real data for this question, or to describe what changes you want, makes it very difficult to demonstrate the workability of a solution.
However it looks like you need an executable replacement, which has the /e modifier.
Take a look at this code. I have added another pair of parentheses to the regex so that the entire pattern is captured as well as the two attributes. The actual replacement string is put into $replacement and returned by the block.
$string =~ s{(<IF OPERATOR="([^"]+)" VALUE="([^"]+)"/>)}{
my $replacement = $1;
my $operator = $2;
my $value = $3;
if ( $operator eq 'true' and $hash{$value} ) {
$replacement = qq{<if state1="yes"/>};
}
elsif ( $operator eq 'false' and not $hash{$value} ) {
$replacement = qq{<if state1="no"/>};
}
$replacement;
}eg;

A lot of logic can be placed in regular expressions. For instance, there are conditional regular expressions, and you can execute Perl code in a regexp.
If you don't want to complicate the regexp, you can extract the offsets of the matches first, then splice them out with substr EXPR, OFFSET, LENGTH, ''.
But for the fun of using regexps, with named backreferences and s///e (evaluate the replacement), here is the code:
#!/usr/bin/perl -w
use strict;
my %hash = (
foo => 1,
bar => 0
);
my $var = '
<IF OPERATOR="true" VALUE="foo"/>
<IF OPERATOR="true" VALUE="bar"/>
<IF OPERATOR="false" VALUE="foo"/>
<IF OPERATOR="false" VALUE="bar"/>
';
$var =~
s`(<IF\s+OPERATOR="(?<operator>.+?)"\s+VALUE="(?<value>.+?)"/>)
`$+{operator} eq 'true' && $hash{ $+{value} } || $+{operator} eq 'false' && !$hash{ $+{value} }? 'replacement' : $1
`xeg;
print $var;

$var=$ARGV[0];
my %hash =(
a => 1,
b => 2
);
while ( $var =~ /<IF OPERATOR="(.+?)" VALUE="(.+?)"\/>/ ) {
my $operator = $1;
my $value = $2;
if ( $operator eq 'true' && $hash{ $value } ) {
$var =~ s/<IF OPERATOR="$operator" VALUE="$value"\/>/ReplacemenT/g;
}
if ( $operator eq 'false' && ! $hash{ $value } ) {
$var =~ s/<IF OPERATOR="$operator" VALUE="$value"\/>/RR/g;
}
}
print "$var\n";
on something like 'xxx<IF OPERATOR="true" VALUE="a"/>xxx' returns xxxReplacemenTxxx

Related

What Perl matching regexp nuances can cause headaches?

This code failed to identify any of the keys it appears to identify:
if( $key =~ /upsf|free|ground|sla|pickup|usps/ )
So I changed it to :
if( $key eq 'upsf' || $key eq 'free'
|| $key eq 'ground' || $key eq 'sla'
|| $key eq 'pickup' || $key eq 'usps' )
They look to me like they are functionally equivalent, so I'm trying to figure out why the first one failed. It's Perl under XAMPP on Windows 7, but it's also Perl under Apache2 on a Linux box.
This prints "shelf it" - both on Windows and Linux.
$key = 'upsf';
if( $key =~ /^(upsf|free|ground|sla|pickup|usps)$/ ) {
print 'ship it';
} else {
print 'shelf it';
}
They're not equiv, as the comparison operator in the first is "=~" ("contains"),
where in the second it is "eq" ("explicit match, equals").
How exactly did the first one fail? What was your test value for $key?
$key = 'xxx';
if( $key =~ /upsf|free|ground|sla|pickup|usps/ ) {
print 'ship it';
} else {
print 'shelf it';
}
will print 'shelf it'.
Whereas $key='xusps' , for example, will print 'ship it', match via '=~' operator ("contains"), which may not be your goal.
How about this one:
if ($key =~ /^(?:upsf|free|ground|sla|pickup|usps)$/) {
# ...
} else {
# ...
}
My Bad!
This code is executed by ClickCart Pro, which reads it from a file and preprocesses it like this:
$custom_script_code =~ s/\`/\'/gs;
$custom_script_code =~ s/\|\|/%7C%7C/g;
$custom_script_code =~ s/\|/ /gs;
$custom_script_code =~ s/%7C%7C/\|\|/g;
$custom_script_code =~ s/system/System/gs;
$custom_script_code =~ s/exec/Exec/gs;
$custom_script_code =~ s/die/Die/gs;
So the pipes are removed by the third statement here. Thanks Kryptronics! (sarcasm) perreal's comment has been plussed. I shouldn't get any points for this. Sorry I wasted everyone's time!

Replace only up to N matches on a line

In Perl, how to write a regular expression that replaces only up to N matches per string?
I.e., I'm looking for a middle ground between s/aa/bb/; and s/aa/bb/g;. I want to allow multiple substitutions, but only up to N times.
I can think of three reliable ways. The first is to replace everything after the Nth match with itself.
my $max = 5;
$s =~ s/(aa)/ $max-- > 0 ? 'bb' : $1 /eg;
That's not very efficient if there are far more than N matches. For that, we need to move the loop out of the regex engine. The next two methods are ways of doing that.
my $max = 5;
my $out = '';
$out .= $1 . 'bb' while $max-- && $in =~ /\G(.*?)aa/gcs;
$out .= $1 if $in =~ /\G(.*)/gcs;
And this time, in-place:
my $max = 5;
my $replace = 'bb';
while ($max-- && $s =~ s/\G.*?\Kaa/$replace/s) {
pos($s) = $-[0] + length($replace);
}
You might be tempted to do something like
my $max = 5;
$s =~ s/aa/bb/ for 1..$max;
but that approach will fail for other patterns and/or replacement expressions.
my $max = 5;
$s =~ s/aa/ba/ for 1..$max; # XXX Turns 'aaaaaaaa'
# into 'bbbbbaaa'
# instead of 'babababa'
And of course, starting from the beginning of the string every time could be expensive.
What you want is not posible in regular expressions. But you can put the replacement in a for-loop:
my $i;
my $aa = 'aaaaaaaaaaaaaaaaaaaa';
for ($i=0;$i<4;$i++) {
$aa =~ s/aa/bb/;
}
print "$aa\n";
result:
bbbbbbbbaaaaaaaaaaaa
You can use the /e flag which evaluates the right side as an expression:
my $n = 3;
$string =~ s/(aa)/$n-- > 0 ? "bb" : $1/ge;
Here's a solution using the /e modifier, with which you can use
perl code to generate the replacement string:
my $count = 0;
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
$&; # faking a no-op, replacing with the original match.
}
}xeg;
With perl 5.10 or later you can drop the $& (which has weird
performance complications) and use ${^MATCH} via the /p modifier
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
${^MATCH};
}
}xegp;
It's too bad you can't just do this, but you can't:
last if $count >= $limit;

How to refer regular expression matches evaluated in one statement?

I want to do something like this
if(($Fifo[5]=~/T0int(\S+)/)&&($Fifo[6]=~/T0int(\S+)/)&&($1 ne $2))
{
<Do something>
}
How can I reference matches evaluated in two regexps ?
By $1 I meant match evaluated in the first regexp and $2 in the next.
my($first) = $Fifo[5] =~ /T0int(\S+)/;
my($second) = $Fifo[6] =~ /T0int(\S+)/;
if (defined($first) && defined($second) && $first ne $second)) { ⋯ }
or more cavalierly:
if (($Fifo[5] =~ /T0int(\S+)/)[0] ne ($Fifo[6] =~ /T0int(\S+)/)[0]) { ⋯ }
or even more cavalierly still:
if ( (my($first, $second) = "#Fifo[5,6]" =~ /T0int(\S+)/g )
&& $first && $second
&& $first ne $second)
{
⋯
}
Try something like this:
if( ($Fifo[5] =~ (/T0int(\S+)/)) && ($Fifo[6] =~ (/T0int(\S+)/)) && ($1 ne $2) )
Basically put parenthesis around regex to group them as $1, $2

How do I use perl regex to extract the digit value from '[1]'?

My code...
$option = "[1]";
if ($option =~ m/^\[\d\]$/) {print "Activated!"; $str=$1;}
I need a way to drop off the square brackets from $option. $str = $1 does not work for some reason. Please advise.
To get $1 to work you need to capture the value inside the brackets using parentheses, i.e:
if ($option =~ m/^\[(\d)\]$/) {print "Activated!"; $str=$1;}
if ($option =~ m/^\[(\d)\]$/) { print "Activated!"; $str=$1; }
Or
if (my ($str) = $option =~ m/^\[(\d)\]$/) { print "Activated!" }
Or
if (my ($str) = $option =~ /(\d)/) { print "Activated!" }
..and a bunch of others. You forgot to capture your match with ()'s.
EDIT:
if ($option =~ /(?<=^\[)\d(?=\]$)/p && (my $str = ${^MATCH})) { print "Activated!" }
Or
my $str;
if ($option =~ /^\[(\d)(?{$str = $^N})\]$/) { print "Activated!" }
Or
if ($option =~ /^\[(\d)\]$/ && ($str = $+)) { print "Activated!" }
For ${^MATCH}, $^N, and $+, perlvar.
I love these questions : )

Masking a string in perl using a mask string

I have a string such as 'xxox-x' that I want to mask each line in a file against as such:
x's are ignored (or just set to a known value)
o's remain unchanged
the - is a variable length field that will keep everything else unchanged
therefore mask 'xxox-x' against 'deadbeef' would yield 'xxaxbeex'
the same mask 'xxox-x' against 'deadabbabeef' would yield 'xxaxabbabeex'
How can I do this succinctly preferrably using s operator?
$mask =~ s/-/'o' x (length $str - length $mask)/e;
$str =~ s/(.)/substr($mask, pos $str, 1) eq 'o' ? $1 : 'x'/eg;
$ perl -pe 's/^..(.).(.+).$/xx$1x$2x/;'
deadbeef
xxaxbeex
deadabbabeef
xxaxabbabeex
Compile your pattern into a Perl sub:
sub compile {
use feature 'switch';
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my($search,$replace);
my $i = 0;
for (split //, $pattern) {
given ($_) {
when ("x") {
$search .= "."; $replace .= "x";
}
when ("o") {
$search .= "(?<sub$i>.)";
$replace .= "\$+{sub$i}";
++$i;
}
when ("-") {
$search .= "(?<sub$i>.*)";
$replace .= "\$+{sub$i}";
++$i;
}
}
}
my $code = q{
sub {
local($_) = #_;
s/^SEARCH$/REPLACE/s;
$_;
}
};
$code =~ s/SEARCH/$search/;
$code =~ s/REPLACE/$replace/;
#print $code;
local $#;
my $sub = eval $code;
die $# if $#;
$sub;
}
To be more concise, you could write
sub _patref { '$+{sub' . $_[0]++ . '}' }
sub compile {
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my %gen = (
'x' => sub { $_[1] .= '.'; $_[2] .= 'x' },
'o' => sub { $_[1] .= "(?<sub$_[0]>.)"; $_[2] .= &_patref },
'-' => sub { $_[1] .= "(?<sub$_[0]>.*)"; $_[2] .= &_patref },
);
my($i,$search,$replace) = (0,"","");
$gen{$1}->($i,$search,$replace)
while $pattern =~ /(.)/g;
eval "sub { local(\$_) = \#_; s/\\A$search\\z/$replace/; \$_ }"
or die $#;
}
Testing it:
use v5.10;
my $replace = compile "xxox-x";
my #tests = (
[ deadbeef => "xxaxbeex" ],
[ deadabbabeef => "xxaxabbabeex" ],
);
for (#tests) {
my($input,$expect) = #$_;
my $got = $replace->($input);
print "$input => $got : ", ($got eq $expect ? "PASS" : "FAIL"), "\n";
}
Output:
deadbeef => xxaxbeex : PASS
deadabbabeef => xxaxabbabeex : PASS
Note that you'll need Perl 5.10.x for given ... when.
x can be translated to . and o to (.) whereas - becomes (.+?):
#!/usr/bin/perl
use strict; use warnings;
my %s = qw( deadbeef xxaxbeex deadabbabeef xxaxabbabeex);
for my $k ( keys %s ) {
(my $x = $k) =~ s/^..(.).(.+?).\z/xx$1x$2x/;
print +($x eq $s{$k} ? 'good' : 'bad'), "\n";
}
heres a quick stab at a regex generator.. maybe somebody can refactor something pretty from it?
#!/usr/bin/perl
use strict;
use Test::Most qw( no_plan );
my $mask = 'xxox-x';
is( mask( $mask, 'deadbeef' ), 'xxaxbeex' );
is( mask( $mask, 'deadabbabeef' ), 'xxaxabbabeex' );
sub mask {
my ($mask, $string) = #_;
my $regex = $mask;
my $capture_index = 1;
my $mask_rules = {
'x' => '.',
'o' => '(.)',
'-' => '(.+)',
};
$regex =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/\./x/g;
$mask =~ s/\([^)]+\)/'$' . $capture_index++/eg;
eval " \$string =~ s/^$regex\$/$mask/ ";
$string;
}
Here's a character by character solution using substr rather that split. It should be efficient for long strings since it skips processing the middle part of the string (when there is a dash).
sub apply_mask {
my $mask = shift;
my $string = shift;
my ($head, $tail) = split /-/, $mask;
for( 0 .. length($head) - 1 ) {
my $m = substr $head, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $_, 1) = 'x';
}
return $string unless defined $tail;
$tail = reverse $tail;
my $last_char = length($string) - 1;
for( 0 .. length($tail) - 1 ) {
my $m = substr $tail, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $last_char - $_, 1) = 'x';
}
return $string;
}
sub mask {
local $_ = $_[0];
my $mask = $_[1];
$mask =~ s/-/'o' x (length($_)-(length($mask)-1))/e;
s/(.)/substr($mask, pos, 1) eq 'o' && $1/eg;
return $_;
}
Used tidbits from a couple answers ... this is what I ended up with.
EDIT: update from comments