What Perl matching regexp nuances can cause headaches? - regex

This code failed to identify any of the keys it appears to identify:
if( $key =~ /upsf|free|ground|sla|pickup|usps/ )
So I changed it to :
if( $key eq 'upsf' || $key eq 'free'
|| $key eq 'ground' || $key eq 'sla'
|| $key eq 'pickup' || $key eq 'usps' )
They look to me like they are functionally equivalent, so I'm trying to figure out why the first one failed. It's Perl under XAMPP on Windows 7, but it's also Perl under Apache2 on a Linux box.
This prints "shelf it" - both on Windows and Linux.
$key = 'upsf';
if( $key =~ /^(upsf|free|ground|sla|pickup|usps)$/ ) {
print 'ship it';
} else {
print 'shelf it';
}

They're not equiv, as the comparison operator in the first is "=~" ("contains"),
where in the second it is "eq" ("explicit match, equals").
How exactly did the first one fail? What was your test value for $key?
$key = 'xxx';
if( $key =~ /upsf|free|ground|sla|pickup|usps/ ) {
print 'ship it';
} else {
print 'shelf it';
}
will print 'shelf it'.
Whereas $key='xusps' , for example, will print 'ship it', match via '=~' operator ("contains"), which may not be your goal.

How about this one:
if ($key =~ /^(?:upsf|free|ground|sla|pickup|usps)$/) {
# ...
} else {
# ...
}

My Bad!
This code is executed by ClickCart Pro, which reads it from a file and preprocesses it like this:
$custom_script_code =~ s/\`/\'/gs;
$custom_script_code =~ s/\|\|/%7C%7C/g;
$custom_script_code =~ s/\|/ /gs;
$custom_script_code =~ s/%7C%7C/\|\|/g;
$custom_script_code =~ s/system/System/gs;
$custom_script_code =~ s/exec/Exec/gs;
$custom_script_code =~ s/die/Die/gs;
So the pipes are removed by the third statement here. Thanks Kryptronics! (sarcasm) perreal's comment has been plussed. I shouldn't get any points for this. Sorry I wasted everyone's time!

Related

Computed regular expression replacement in Perl

I want to do a global regex replace, where the replacement is conditional upon some logic that can't be put in a regular expression. For example:
while ( $var =~ /<IF OPERATOR="(.+?)" VALUE="(.+?)"\/>/g ) {
my $operator = $1;
my $value = $2;
if ( $operator eq 'true' && $hash{ $value } ) {
# replace the entire <IF/>
}
if ( $operator eq 'false' && ! $hash{ $value } ) {
# replace the entire <IF/>
}
}
How do I do the # replace bit? Thanks for any advice.
Your refusal to give any real data for this question, or to describe what changes you want, makes it very difficult to demonstrate the workability of a solution.
However it looks like you need an executable replacement, which has the /e modifier.
Take a look at this code. I have added another pair of parentheses to the regex so that the entire pattern is captured as well as the two attributes. The actual replacement string is put into $replacement and returned by the block.
$string =~ s{(<IF OPERATOR="([^"]+)" VALUE="([^"]+)"/>)}{
my $replacement = $1;
my $operator = $2;
my $value = $3;
if ( $operator eq 'true' and $hash{$value} ) {
$replacement = qq{<if state1="yes"/>};
}
elsif ( $operator eq 'false' and not $hash{$value} ) {
$replacement = qq{<if state1="no"/>};
}
$replacement;
}eg;
A lot of logic can be placed in regular expressions. For instance, there are conditional regular expressions, and you can execute Perl code in a regexp.
If you don't want to complicate the regexp, you can extract the offsets of the matches first, then splice them out with substr EXPR, OFFSET, LENGTH, ''.
But for the fun of using regexps, with named backreferences and s///e (evaluate the replacement), here is the code:
#!/usr/bin/perl -w
use strict;
my %hash = (
foo => 1,
bar => 0
);
my $var = '
<IF OPERATOR="true" VALUE="foo"/>
<IF OPERATOR="true" VALUE="bar"/>
<IF OPERATOR="false" VALUE="foo"/>
<IF OPERATOR="false" VALUE="bar"/>
';
$var =~
s`(<IF\s+OPERATOR="(?<operator>.+?)"\s+VALUE="(?<value>.+?)"/>)
`$+{operator} eq 'true' && $hash{ $+{value} } || $+{operator} eq 'false' && !$hash{ $+{value} }? 'replacement' : $1
`xeg;
print $var;
$var=$ARGV[0];
my %hash =(
a => 1,
b => 2
);
while ( $var =~ /<IF OPERATOR="(.+?)" VALUE="(.+?)"\/>/ ) {
my $operator = $1;
my $value = $2;
if ( $operator eq 'true' && $hash{ $value } ) {
$var =~ s/<IF OPERATOR="$operator" VALUE="$value"\/>/ReplacemenT/g;
}
if ( $operator eq 'false' && ! $hash{ $value } ) {
$var =~ s/<IF OPERATOR="$operator" VALUE="$value"\/>/RR/g;
}
}
print "$var\n";
on something like 'xxx<IF OPERATOR="true" VALUE="a"/>xxx' returns xxxReplacemenTxxx

Find text enclosed by # and replace the inside

The problem:
Find pieces of text in a file enclosed by # and replace the inside
Input:
#abc# abc #ABC#
cba #cba CBA#
Deisred output:
абц abc АБЦ
cba цба ЦБА
I have the following:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $output;
open FILE,"<", 'test.txt';
while (<FILE>) {
chomp(my #chars = split(//, $_));
for (#chars) {
my #char;
$_ =~ s/a/chr(0x430)/eg;
$_ =~ s/b/chr(0x431)/eg;
$_ =~ s/c/chr(0x446)/eg;
$_ =~ s/d/chr(0x434)/eg;
$_ =~ s/e/chr(0x435)/eg;
$_ =~ s/A/chr(0x410)/eg;
$_ =~ s/B/chr(0x411)/eg;
$_ =~ s/C/chr(0x426)/eg;
push #char, $_;
$output = join "", #char;
print encode("utf-8",$output);}
print "\n";
}
close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther
Here my solution. (you will fixed it, yes. It is prototype)
for (my $data = <DATA>){
$data=~s/[#]([\s\w]+)[#]/func($1)/ge;
print $data;
# while($data=~m/[#]([\s\w]+)[#]/g){
# print "marked: ",$1,"\n";
# print "position:", pos();
# }
# print "not marked: ";
}
sub func{
#do your magic here ;)
return "<< #_ >>";
}
__DATA__
#abc# abc #ABC# cba #cba CBA#
What happens here?
First, I read data. You can do it yourself.
for (my $data = <DATA>){...}
Next, I need to search your pattern and replace it.
What should I do?
Use substition operator: s/pattern/replace/
But in interesting form:
s/pattern/func($1)/ge
Key g mean Global Search
Key e mean Evaluate
So, I think, that you need to write your own func function ;)
Maybe better to use transliteration operator: tr/listOfSymbolsToBeReplaced/listOfSymbolsThatBePlacedInstead/
With minimal changes to your algorithm you need to keep track of whether you are inside the #marks or not. so add something like this
my $bConvert = 0;
chomp(my #chars = split(//, $_));
for (#chars) {
my $char = $_;
if (/#/) {
$bConvert = ($bConvert + 1) % 2;
next;
}
elsif ($bConvert) {
$char =~ s/a/chr(0x430)/eg;
$char =~ s/b/chr(0x431)/eg;
$char =~ s/c/chr(0x446)/eg;
$char =~ s/d/chr(0x434)/eg;
$char =~ s/e/chr(0x435)/eg;
$char =~ s/A/chr(0x410)/eg;
$char =~ s/B/chr(0x411)/eg;
$char =~ s/C/chr(0x426)/eg;
}
print encode("utf-8",$char);
}
Try this after $output is processed.
$output =~ s/\#//g;
my #split_output = split(//, $output);
$output = "";
my $len = scalar(#split_output) ;
while ($len--) {
$output .= shift(#split_output);
}
print $output;
It can be done with a single regex and no splitting of the string:
use strict;
use warnings;
use Encode;
my %chars = (
a => chr(0x430),
b => chr(0x431),
c => chr(0x446),
d => chr(0x434),
e => chr(0x435),
A => chr(0x410),
B => chr(0x411),
C => chr(0x426),
);
my $regex = '(' . join ('|', keys %chars) . ')';
while (<DATA>) {
1 while ($_ =~ s|\#(?!\s)[^#]*?\K$regex(?=[^#]*(?!\s)\#)|$chars{$1}|eg);
print encode("utf-8",$_);
}
It does require repeated runs of the regex due to the overlapping nature of the matches.

Perl - Parsing Arguments/Options with REGEX

I'm creating a perl script to convert a list of commands in a template file () and output them to another file in a different format in an output file ().
The commands in the template file will look as follows:
command1 --max-size=2M --type="some value"
I'm having some problems extracting the options and values from this string. So far i have:
m/(\s--\w*=)/ig
Which will return:
" --max-size="
" --type="
However I have no idea how to return both the option and value as a separate variable or how to accommodate for the use of quotes.
Could anyone steer me in the right direction?
side note: I'm aware that Getops does an awesome job at doing this from the command-line but unfortunately these commands are passed as strings :(
Getopt::Std or Getopt::Long?
Have you looked at this option or this one?
Seems like there's no reason to reinvent the wheel.
The code below produces
#args = ('command1', '--max-size=2M', '--type=some value');
That is suitable to pass to GetOptions as follows:
local #ARGV = #args;
GetOptions(...) or die;
Finally, the code:
for ($cmd) {
my #args;
while (1) {
last if /\G \s* \z /xgc;
/\G \s* /xgc;
my $arg;
while (1) {
if (/\G ([^\\"'\s]) /xgc) {
$arg .= $1;
}
elsif (/\G \\ /xgc) {
/\G (.) /sxgc
or die "Incomplete escape";
$arg .= $1;
}
elsif (/\G (?=") /xgc) {
/\G " ( (?:[^"\\]|\\.)* ) " /sxgc
or die "Incomplete double-quoted arging";
my $quoted = $1;
$quoted =~ s/\\(.)/$1/sg;
$arg .= $quoted;
}
elsif (/\G (?=') /xgc) {
/\G ' ( [^']* ) ' /xgc
or die "Incomplete single-quoted arging";
$arg .= $1;
}
else {
last;
}
}
push #args, $arg;
}
#args
or die "Blank command";
...
}
use Data::Dumper;
$_ = 'command1 --max-size=2M a=ignore =ignore --switch --type="some value" --x= --z=1';
my %args;
while (/((?<=\s--)[a-z\d-]+)(?:="?|(?=\s))((?<![="])|(?<=")[^"]*(?=")|(?<==)(?!")\S*(?!"))"?(?=\s|$)/ig) {
$args->{$1} = $2;
}
print Dumper($args);
---
$VAR1 = {
'switch' => '',
'x' => '',
'type' => 'some value',
'z' => '1',
'max-size' => '2M'
};
(test this demo here)

How do I use perl regex to extract the digit value from '[1]'?

My code...
$option = "[1]";
if ($option =~ m/^\[\d\]$/) {print "Activated!"; $str=$1;}
I need a way to drop off the square brackets from $option. $str = $1 does not work for some reason. Please advise.
To get $1 to work you need to capture the value inside the brackets using parentheses, i.e:
if ($option =~ m/^\[(\d)\]$/) {print "Activated!"; $str=$1;}
if ($option =~ m/^\[(\d)\]$/) { print "Activated!"; $str=$1; }
Or
if (my ($str) = $option =~ m/^\[(\d)\]$/) { print "Activated!" }
Or
if (my ($str) = $option =~ /(\d)/) { print "Activated!" }
..and a bunch of others. You forgot to capture your match with ()'s.
EDIT:
if ($option =~ /(?<=^\[)\d(?=\]$)/p && (my $str = ${^MATCH})) { print "Activated!" }
Or
my $str;
if ($option =~ /^\[(\d)(?{$str = $^N})\]$/) { print "Activated!" }
Or
if ($option =~ /^\[(\d)\]$/ && ($str = $+)) { print "Activated!" }
For ${^MATCH}, $^N, and $+, perlvar.
I love these questions : )

Masking a string in perl using a mask string

I have a string such as 'xxox-x' that I want to mask each line in a file against as such:
x's are ignored (or just set to a known value)
o's remain unchanged
the - is a variable length field that will keep everything else unchanged
therefore mask 'xxox-x' against 'deadbeef' would yield 'xxaxbeex'
the same mask 'xxox-x' against 'deadabbabeef' would yield 'xxaxabbabeex'
How can I do this succinctly preferrably using s operator?
$mask =~ s/-/'o' x (length $str - length $mask)/e;
$str =~ s/(.)/substr($mask, pos $str, 1) eq 'o' ? $1 : 'x'/eg;
$ perl -pe 's/^..(.).(.+).$/xx$1x$2x/;'
deadbeef
xxaxbeex
deadabbabeef
xxaxabbabeex
Compile your pattern into a Perl sub:
sub compile {
use feature 'switch';
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my($search,$replace);
my $i = 0;
for (split //, $pattern) {
given ($_) {
when ("x") {
$search .= "."; $replace .= "x";
}
when ("o") {
$search .= "(?<sub$i>.)";
$replace .= "\$+{sub$i}";
++$i;
}
when ("-") {
$search .= "(?<sub$i>.*)";
$replace .= "\$+{sub$i}";
++$i;
}
}
}
my $code = q{
sub {
local($_) = #_;
s/^SEARCH$/REPLACE/s;
$_;
}
};
$code =~ s/SEARCH/$search/;
$code =~ s/REPLACE/$replace/;
#print $code;
local $#;
my $sub = eval $code;
die $# if $#;
$sub;
}
To be more concise, you could write
sub _patref { '$+{sub' . $_[0]++ . '}' }
sub compile {
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my %gen = (
'x' => sub { $_[1] .= '.'; $_[2] .= 'x' },
'o' => sub { $_[1] .= "(?<sub$_[0]>.)"; $_[2] .= &_patref },
'-' => sub { $_[1] .= "(?<sub$_[0]>.*)"; $_[2] .= &_patref },
);
my($i,$search,$replace) = (0,"","");
$gen{$1}->($i,$search,$replace)
while $pattern =~ /(.)/g;
eval "sub { local(\$_) = \#_; s/\\A$search\\z/$replace/; \$_ }"
or die $#;
}
Testing it:
use v5.10;
my $replace = compile "xxox-x";
my #tests = (
[ deadbeef => "xxaxbeex" ],
[ deadabbabeef => "xxaxabbabeex" ],
);
for (#tests) {
my($input,$expect) = #$_;
my $got = $replace->($input);
print "$input => $got : ", ($got eq $expect ? "PASS" : "FAIL"), "\n";
}
Output:
deadbeef => xxaxbeex : PASS
deadabbabeef => xxaxabbabeex : PASS
Note that you'll need Perl 5.10.x for given ... when.
x can be translated to . and o to (.) whereas - becomes (.+?):
#!/usr/bin/perl
use strict; use warnings;
my %s = qw( deadbeef xxaxbeex deadabbabeef xxaxabbabeex);
for my $k ( keys %s ) {
(my $x = $k) =~ s/^..(.).(.+?).\z/xx$1x$2x/;
print +($x eq $s{$k} ? 'good' : 'bad'), "\n";
}
heres a quick stab at a regex generator.. maybe somebody can refactor something pretty from it?
#!/usr/bin/perl
use strict;
use Test::Most qw( no_plan );
my $mask = 'xxox-x';
is( mask( $mask, 'deadbeef' ), 'xxaxbeex' );
is( mask( $mask, 'deadabbabeef' ), 'xxaxabbabeex' );
sub mask {
my ($mask, $string) = #_;
my $regex = $mask;
my $capture_index = 1;
my $mask_rules = {
'x' => '.',
'o' => '(.)',
'-' => '(.+)',
};
$regex =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/\./x/g;
$mask =~ s/\([^)]+\)/'$' . $capture_index++/eg;
eval " \$string =~ s/^$regex\$/$mask/ ";
$string;
}
Here's a character by character solution using substr rather that split. It should be efficient for long strings since it skips processing the middle part of the string (when there is a dash).
sub apply_mask {
my $mask = shift;
my $string = shift;
my ($head, $tail) = split /-/, $mask;
for( 0 .. length($head) - 1 ) {
my $m = substr $head, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $_, 1) = 'x';
}
return $string unless defined $tail;
$tail = reverse $tail;
my $last_char = length($string) - 1;
for( 0 .. length($tail) - 1 ) {
my $m = substr $tail, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $last_char - $_, 1) = 'x';
}
return $string;
}
sub mask {
local $_ = $_[0];
my $mask = $_[1];
$mask =~ s/-/'o' x (length($_)-(length($mask)-1))/e;
s/(.)/substr($mask, pos, 1) eq 'o' && $1/eg;
return $_;
}
Used tidbits from a couple answers ... this is what I ended up with.
EDIT: update from comments