perl: capturing the replaced-with string - regex

I have code in a loop similar to
for( my $i=0; $a =~ s/<tag>(.*?)<\/tag>/sprintf("&CITE%03d;",$i)/e ; $i++ ){
%cite{ $i } = $1;
}
but instead of just the integer index, I want to make the keys of the hash the actual replaced-with text (placeholder "&CITE001;", etc.) without having to redo the sprintf().
I was almost sure there was a way to do it (variable similar to $& and such, but maybe I was thinking of vim's substitutions and not perl. :)
Thanks!

my $i = 0;
s{<tag>(.*?)</tag>}{
my $entity = sprintf("&CITE%03d;", $i++);
$cite{$entity} = $1;
$entity
}eg;

I did a something of a hacque, but really wanted something a bit more elegant. What I ended up doing (for now) is
my $t;
for( my $i=0; $t = sprintf("&CITE%04d;",$i), $all =~ s/($oct.*?$cct)/$t/s; $i++ ){
$cites{$t} = $1;
}
but I really wanted something even more "self-contained".
Just being able to grab the replacement string would've made things much simpler, though. This is a simple read-modify-write op.
True, adding the 'g' modifier should help shave some microseconds off it. :D

I think any method other than re-starting the search from the start of the target
is always the better choice.
In that vein and, as an alternative, you can move the logic inside the regex
via a Code Construct (?{ code }) and leverage the fact that $^N contains
the last capture content.
Perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my $target = "<tag>zero</tag>\n<tag>one</tag>\n<tag>two</tag>\n<tag>three</tag>";
my %cite;
my ($cnt,$key) = (0,'');
$target =~ s/
<tag> (.*?) <\/tag>
(?{
$key = sprintf("&CITE%03d;", $cnt++);
$cite{$key} = $^N;
})
/$key/xg;
print $target, "\n";
print Dumper(\%cite);
Output
&CITE000;
&CITE001;
&CITE002;
&CITE003;
$VAR1 = {
'&CITE000;' => 'zero',
'&CITE001;' => 'one',
'&CITE002;' => 'two',
'&CITE003;' => 'three'
};
Edited/code by #Ikegami
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
sub f {
my $target = "<tag>zero</tag>\n<tag>one</tag>\n<tag>two</tag>\n<tag>three</tag>";
my %cite;
my ($cnt,$key) = (0,'');
$target =~ s/
<tag> (.*?) <\/tag>
(?{
$key = sprintf("&CITE%03d;", $cnt++);
$cite{$key} = $^N;
})
/$key/xg;
print $target, "\n";
print Dumper(\%cite);
}
f() for 1..2;
Output
Variable "$key" will not stay shared at (re_eval 1) line 2.
Variable "$cnt" will not stay shared at (re_eval 1) line 2.
Variable "%cite" will not stay shared at (re_eval 1) line 3.
&CITE000;
&CITE001;
&CITE002;
&CITE003;
$VAR1 = {
'&CITE000;' => 'zero',
'&CITE001;' => 'one',
'&CITE002;' => 'two',
'&CITE003;' => 'three'
};
$VAR1 = {};
This issue has been addressed in 5.18.
Perl by #sln
See, now I don't get that issue in version 5.20.
And, I don't believe I got it in 5.12 either.
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
sub wrapper {
my ($targ, $href) = #_;
my ($cnt, $key) = (0,'');
$$targ =~ s/<tag>(.*?)<\/tag>(?{ $key = sprintf("&CITE%03d;", $cnt++); $href->{$key} = $^N; })/$key/g;
}
my ($target,%cite) = ("<tag>zero</tag>\n<tag>one</tag>\n<tag>two</tag>\n<tag>three</tag>", ());
wrapper( \$target, \%cite );
print $target, "\n";
print Dumper(\%cite);
($target,%cite) = ("<tag>zero</tag>\n<tag>one</tag>\n<tag>two</tag>\n<tag>three</tag>", ());
wrapper( \$target, \%cite );
print $target, "\n";
print Dumper(\%cite);
Output
&CITE000;
&CITE001;
&CITE002;
&CITE003;
$VAR1 = {
'&CITE000;' => 'zero',
'&CITE001;' => 'one',
'&CITE002;' => 'two',
'&CITE003;' => 'three'
};
&CITE000;
&CITE001;
&CITE002;
&CITE003;
$VAR1 = {
'&CITE000;' => 'zero',
'&CITE001;' => 'one',
'&CITE002;' => 'two',
'&CITE003;' => 'three'
};

Related

Get all occurances of matches pattern perl

if we are in the following case:
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
I have a table "T1" in the database with columns: ("Name", "Address") and I want to put on the column "Name" values "value1,Value2,Value3" and on the column "Adress" values "Value4,Value5,Value6"
in this case we have :
my #matches = $str =~ /Name=(.*?)\nAdress=(.*?)\n/g;
how can we use $1 and $2 with #matches in order to get separately all occurence of Name and Adresse in order to insert them on the Table T1?
All captures of all matches are returned, so you'd have to group them up.
use List::Util 1.29 qw( pairs );
for ( pairs( $str =~ /Name=(.*) Address=(.*)/g ) ) {
my #matches = #$_;
...
}
That said, it's far more common to grab the matches iteratively.
while ($str =~ /Name=(.*) Address=(.*)/g) {
my #matches = ( $1, $2 );
...
}
Regex is not always the right tool for the job. Your data looks a lot like it's just key/value pairs. Use split to break it up. No need for a pattern match here.
Your code and data doesn't match, so I've gone with what the code said.
use strict;
use warnings;
my $str = <<EO_STR;
Name=Value1
Adress=Value4
Name=Value2
Adress=Value5
Name=Value3
Adress=Value6
EO_STR
my $fields;
foreach my $pair (split /\n/, $str) {
my ($key, $value) = split /=/, $pair;
$key =~ s/^\s+//;
push #{ $fields->{$key} }, $value;
}
use Data::Dumper;
print Dumper $fields;
The code will create this data structure:
$VAR1 = {
'Name' => [
'Value1',
'Value2',
'Value3'
],
'Adress' => [
'Value4',
'Value5',
'Value6'
]
};
You can now access these two array references and use them to insert data into your table.
I have done the following:
#!/usr/bin/env perl
use v5.28;
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
my #array;
for my $a (split(/\n/, $str)) {
my %res = $a =~ m/(\w+)=(\w+)/g;
push #array, \%res;
}
for my $a (#array) {
for my $b (sort keys %{$a}) {
"\n", <INPUT_FILE> ); say $b.'->'.$a->{$b};
}
}
It creates this structure:
#array = [
{
Name->Value1,
Adress->Value4
},
...
];

perl: refer to groups in foreach loop

Can I refer to groups in a foreach loop? my code below works:
#SEP_line=grep(/,\s(SEP[A-F0-9]*),\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}),\s(\+\d+),\s.*,\s(\w+)\n$/, #lines);
foreach (#SEP_line)
{
/,\s(SEP[A-F0-9]*),\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}),\s(\+\d+),\s.*,\s(\w+)\n$/;
print $1.",".$2.",".$3.",".$4."\n";
}
Since I already specified the match regex in #SEP_line definition, I'd do something like this:
#SEP_line=grep(/,\s(SEP[A-F0-9]*),\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}),\s(\+\d+),\s.*,\s(\w+)\n$/, #lines);
foreach (#SEP_line)
{
print #SEP_line[$1].",".#SEP_line[$2].",".#SEP_line[$3]."\n";
}
This doesnt work. Thanks in advance
Why loop twice (grep, foreach)? You can do it all in one loop:
foreach my $line (#lines)
{
if ($line =~ /,\s(SEP[A-F0-9]*),\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}),\s(\+\d+),\s.*,\s(\w+)\n$/)
{
print "$1,$2,$3,$4\n";
}
}
Don't hand-parse and hand-generate CSV! It's simpler and cleaner to use a CSV parser.
use Text::CSV_XS qw( );
my $fh_in = ...;
my $fh_out = \*STDOUT;
my $csv_in = Text::CSV_XS->new({ binary => 1, auto_diag => 2, sep => ', ' });
my $csv_out = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });
while ( my $row = $csv_in->getline($fh_in) ) {
$csv_out->say($fh_out, [ #$row[-4,-3,-2,-1] ]
if $row->[-4] =~ /^SEP[A-F0-9]*\z/
&& $row->[-3] =~ /^\d{1,3}(?:\.\d{1,3}){3}\z/
&& $row->[-2] =~ /^\+\d+\z/
&& $row->[-1] =~ /^\w+\z/;
}

Dynamically capture regular expression match in Perl

I'm trying to dynamically catch regex matching in Perl. I've known that eval will help me do this but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", #_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are useing eval with a BLOCK of code. That's like a try block. If it dies inside of that eval block, it will catch that error. It will not run your string like it was code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the pattern says _take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match. The =~ operator does that on other variables for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my #captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (#captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, #captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in #captured by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an # sigil) in #+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
Apologies! I realized both my question and sample code were both vague. But after reading your suggestions I came of with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my #matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < #matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluing the regexp in list context returns the matches. so in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my #a = ($str =~/$pattern/);
print Dumper(\#a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my #patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( #patterns ) {
my #captured = ($str =~ $pattern)
or next;
print "'$_'\n" for #captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my #tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern})
or next;
unless (#captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my #tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern});
ok #captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

How to change match modifier within a program in perl? [duplicate]

Is there a way to use a variable as modifier in a substitution?
my $search = 'looking';
my $replace = '"find: $1 ="';
my $modifier = 'ee';
s/$search/$replace/$modifier;
I need to use an array of hashes to make bulk search-replace with different modifiers.
While the method using eval to compile a new substitution is probably the most straightforward, you can create a substitution that is more modular:
use warnings;
use strict;
sub subst {
my ($search, $replace, $mod) = #_;
if (my $eval = $mod =~ s/e//g) {
$replace = qq{'$replace'};
$replace = "eval($replace)" for 1 .. $eval;
} else {
$replace = qq{"$replace"};
}
sub {s/(?$mod)$search/$replace/ee}
}
my $sub = subst '(abc)', 'uc $1', 'ise';
local $_ = "my Abc string";
$sub->();
print "$_\n"; # prints "my ABC string"
This is only lightly tested, and it is left as an exercise for the reader to implement other flags like g
You could use eval, if you put on your safety goggles and your divide-by-zero suit.
E.g.:
use strict;
use warnings;
sub mk_re {
my ($search, $replace, $modifier) = #_;
$modifier ||= '';
die "Bad modifier $modifier" unless $modifier =~ /^[msixge]*$/;
my $sub = eval "sub { s/($search)/$replace/$modifier; }";
die "Error making regex for [$search][$replace][$modifier]: $#" unless $sub;
return $sub;
}
my $search = 'looking';
my $replace = '"find: $1 ="';
my $modifier = 'e';
# Sub can be stored in an array or hash
my $sub = mk_re($search, $replace, $modifier);
$_ = "abc-looking-def";
print "$_\n";
$sub->();
print "$_\n";
Hm, if I had to do it I would do like this:
use warnings;
use strict;
my #stuff = (
{
search => "this",
replace => "that",
modifier => "g",
},
{
search => "ono",
replace => "wendy",
modifier => "i",
}
);
$_ = "this ono boo this\n";
for my $h (#stuff) {
if ($h->{modifier} eq 'g') {
s/$h->{search}/$h->{replace}/g;
} elsif ($h->{modifier} eq 'i') {
s/$h->{search}/$h->{replace}/i;
}
# etc.
}
print;
There are only so many different modifiers you might want to use so I think this is easy enough.
You can use eval for this, but it's awfully messy.
Of course s/$search/$replace/ work as you expect. It is the dynamic modifiers that are not straightforward.
For the regular match modifiers of pimsx you can use Perl's Extended Patterns to modify the modifier flags on the fly as part of your pattern. These are of the form (?pimsx-imsx) to turn on / off those modifiers.
For the s// e and ee forms, you can use (?{ perl code}) documented in the same perlre section. For all of eval e or ee forms, consider the security of the resulting code!
There is no form to modify global to first match that I am aware of, so global vs first match would need to be separate statements.
Here's a combination of Kinopiko's answer and eval.
eval is used here to generate the lookup table in a controlled and maintainable fashion, and a lookup table is used to save all the if.. elsif.. elsif which are not too fun to look at.
(very lightly tested)
my #stuff = (
{
search => "this",
replace => "that",
modifier => "g",
},
{
search => "ono",
replace => "wendy",
modifier => "i",
}
);
$_ = "this ono boo this\n";
my #modifiers = qw{m s i x g e};
my $s_lookup = {};
foreach my $modifier (#modifiers) {
$s_lookup->{$modifier} = eval " sub { s/\$_[0]/\$_[1]/$modifier } ";
}
for my $h (#stuff) {
$s_lookup->{$h->{modifier}}->($h->{search},$h->{replace});
}
print;
To be fully useful this needs:
combinations of possible modifiers
sort function on the lookup table so 'msi' combination and 'mis' combination will go to the same key.

Perl: Using split but ignore quotes

I'm trying to create a Perl hash from an input string, but I'm having problems with the original 'split', as values may contain quotes. Below is an example input string, and my (desired) resulting hash:
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %hash =
(
CREATE => '',
USER => '',
TEL => '12345678',
MOB => '444001122',
Type => 'Whatever',
ATTRIBUTES => 'ID,0,MOB,123,KEY,VALUE',
TIME => '08:01:59',
FIN => '0',
);
The input string is of arbitrary length, and the number of keys is not set.
Thanks!
-hq
Use Text::CSV. It handles comma separated value files correctly.
Update
It seems the format of your input is not parsable by the standard module, even with sep_char and allow_loose_quotes. So, you have to do the heavy lifting yourself, but you can still use Text::CSV to parse each key-value pair:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Data::Dumper;
use Text::CSV;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my #fields = split /:/, $command;
my %hash;
my $csv = Text::CSV->new();
my $i = 0;
while ($i <= $#fields) {
if (1 == $fields[$i] =~ y/"//) {
my $j = $i;
$fields[$i] .= ':' . $fields[$j] until 1 == $fields[++$j] =~ y/"//;
$fields[$i] .= ':' . $fields[$j];
splice #fields, $i + 1, $j - $i, ();
}
$csv->parse($fields[$i]);
my ($key, $value) = $csv->fields;
$hash{$key} = "$value"; # quotes turn undef to q()
$i++;
}
print Dumper \%hash;
As far as I can see the most obvious candidate - Text::CSV - won't handle this format properly, so a home-grown regular expression solution is the only one.
use strict;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config;
for my $field ($command =~ /(?:"[^"]*"|[^:])+/g) {
my ($key, $val) = split /,/, $field, 2;
($config{$key} = $val // '') =~ s/"([^"]*)"/$1/;
}
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
output
%config = (
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
);
If you have Perl v5.10 or later then you have the convenient (?| ... ) regular expression group, which allows you to write this
use 5.010;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config = $command =~ /(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/gx;
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
which produces identical results to the code above.
This looks like something Text::ParseWords could handle. The quotewords subroutine will split the input on the delimiter :, ignoring delimiters inside quotes. This will give us the basic list of items, seen first in the output as $VAR1. After that, it is a simple matter of parsing the comma separated items with a regex which will handle optional second capture to accommodate empty tags such as those for CREATE and USER.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
while (<DATA>) {
chomp;
my #list = quotewords(':', 0, $_);
my %hash = map { my ($k, $v) = /([^,]+),?(.*)/; $k => $v; } #list;
print Dumper \#list, \%hash;
}
__DATA__
CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0
Output:
$VAR1 = [
'CREATE',
'USER',
'TEL,12345678',
'MOB,444001122',
'Type,Whatever',
'ATTRIBUTES,ID,0,KEY,VALUE',
'TIME,08:01:59',
'FIN,0'
];
$VAR2 = {
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
};
my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
s/"([^"]*)"/$1/g
for grep defined, values %hash;