Perl, global match and append something after the matched string - regex

I have a problem performing a global match. How can I replace each matched string with a new string made up of the original match plus some appended text? The string looks like this:
$string = "t123:apple;t456:pear;t789:banana";
Then I have a hash like this:
my %hash = (
t123 => 'fruit1',
t456 => 'fruit2',
t789 => 'fruit3',
);
How can I then obtain a new string such as:
$newstring = "t123 fruit1:apple;t456 fruit2:pear;t789 fruit3:banana";
Now, my perl code is:
while($string =~ /t\d{3}/g){
if (exists $hash{"$&"}) {
my $match = $&;
$string =~ s/$&/$match.$hash{"$&"}/;
}
}
It doesn't work though, because the match always starts from the first character. I think I should use pos(string) or something to make it have an offset, but I don't know how to do this.

The simple way is a single global substitution:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $string = "t123:apple;t456:pear;t789:banana";
my %hash = (
t123 => 'fruit1',
t456 => 'fruit2',
t789 => 'fruit3',
);
$string =~ s/(t\d+)/$1 $hash{$1}/g;
say $string;
But this doesn't ensure that everything matching t\d+ is a valid key in your hash. So let's explicitly search for those keys.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $string = "t123:apple;t456:pear;t789:banana";
my %hash = (
t123 => 'fruit1',
t456 => 'fruit2',
t789 => 'fruit3',
);
my $match = join '|', map quotemeta, keys %hash;
$string =~ s/($match)/$1 $hash{$1}/g;
say $string;
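If the string may contain tags that are not in the hash, one possible variation is to keep the broad t\d+ pattern and decide per match with the /e modifier. This is only a minimal sketch, not part of the answer above; the annotate helper and the t999 sample tag are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: append the label after each known tag and
# leave unknown tags untouched.
sub annotate {
    my ($string, $labels) = @_;
    # /e evaluates the replacement as Perl code for every match.
    $string =~ s/(t\d+)/exists $labels->{$1} ? "$1 $labels->{$1}" : $1/ge;
    return $string;
}

my %hash = (t123 => 'fruit1', t456 => 'fruit2');
print annotate("t123:apple;t456:pear;t999:kiwi", \%hash), "\n";
```

Here t999 has no entry in the hash, so it is copied through unchanged instead of getting an empty label.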

Related

How can I use a dynamic variable in a JSON value?

I need to do a find and replace through a regular expression, like the following:
use strict;
no strict 'refs';
use warnings;
use JSON;
use Encode qw( encode decode encode_utf8 decode_utf8);
my $data =
{
"find_replace" => [
{ "find" => "(.+?)&",
"replace"=> "$1"
}
]
};
my $find_replace_arr = $data->{'find_replace'};
my $string = "http://www.website.com/test.html&code=236523";
my $find = $find_replace_arr->[0]->{find};
my $replace = $find_replace_arr->[0]->{replace};
$string =~ s/$find/$replace/isge;
print $string;
exit();
In this code, I only want to get "http://www.website.com/test.html" from the string.
I am not able to use the value of the replace key, which is $1, dynamically.
You can run the above code.
It throws the error: Use of uninitialized value $1 in string
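Some things to consider. First, the regex ([^&]+) may not give the desired result, because it captures text and then replaces it with that same capture, which leaves the string unchanged (confusing, I bet).
Next, the replacement string "$1" has to be quoted twice, so that the stored value is itself the Perl expression "$1", and the e modifier has to be doubled (/ee) so that this expression is evaluated after the variable is interpolated.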
So try this:
my $data =
{
"find_replace" => [
{ "find" => "^(.+?)&.*",
"replace"=> '"$1"'
}
]
};
my $find_replace_arr = $data->{'find_replace'};
my $string = "http://www.website.com/test.html&code=236523";
my $find = $find_replace_arr->[0]->{find};
my $replace = $find_replace_arr->[0]->{replace};
$string =~ s/$find/$replace/isgee;
print $string;
exit();
Notice the new regex, ^(.+?)&.* will match the entire string, but the capture (...) will be the result to replace.
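Note that /ee compiles and runs the replacement as Perl code, so it should only be used when the find/replace data is trusted. As a hedged alternative, the $1-style placeholders can be filled in by plain text substitution instead. The sketch below is an assumption about what is wanted, not the answer's method: the apply_rule helper is made up, it replaces only the first match, and the template is the bare string $1 rather than the double-quoted form.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first match of $find in $string with
# $template, where $1, $2, ... in the template stand for capture groups.
sub apply_rule {
    my ($string, $find, $template) = @_;
    return $string unless $string =~ /$find/is;

    # Save the match offsets before running any other regex.
    my ($start, $end) = ($-[0], $+[0]);
    my @captures = map {
        defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : ''
    } 1 .. $#+;

    # Fill in the placeholders by text substitution, without eval.
    (my $replacement = $template) =~ s{\$([1-9]\d*)}{ $captures[$1 - 1] // '' }ge;

    substr($string, $start, $end - $start) = $replacement;
    return $string;
}

print apply_rule("http://www.website.com/test.html&code=236523", '^(.+?)&.*', '$1'), "\n";
```

Because the template is never evaluated, a value such as system(...) in the data would be inserted as literal text rather than executed.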

Dynamically capture regular expression match in Perl

I'm trying to dynamically capture regex matches in Perl. I know that eval can help me do this, but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", @_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are using eval with a BLOCK of code. That's like a try block. If the code dies inside of that eval block, it will catch that error. It will not run your string as if it were code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the format string says "take the first argument (1$) and format it as a string (%s)". You don't need to localize or assign to $_ to do a match. The =~ operator lets you match against other variables directly. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my @captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (@captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, @captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in @captures by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an @ sigil) in @+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys of a hash without sorting them, the order is effectively random.
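If the order of the rules matters, for example when the first matching pattern should win, one option is to keep them in an array of pairs instead of a hash. The following is a minimal sketch under that assumption; format_first_match is a made-up helper name. It also keeps the qr{} objects as real regex objects, whereas hash keys are always stored as strings.

```perl
use strict;
use warnings;

# Rules are tried in exactly this order.
my @rules = (
    [ qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d' ],
    [ qr{(\d+)/(\d+)/(\d+)}      => '%1$s.%2$s.%3$s' ],
);

# Hypothetical helper: format the captures of the first rule that matches.
sub format_first_match {
    my ($str, @rules) = @_;
    for my $rule (@rules) {
        my ($pattern, $format) = @$rule;
        my @captures = ($str =~ $pattern) or next;
        return sprintf $format, @captures;
    }
    return;    # no rule matched
}

print format_first_match('1/12/2016', @rules), "\n";
```

With this input the four-digit-year rule comes first and matches, so the second rule is never tried.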
Apologies! I realized that both my question and my sample code were vague. But after reading your suggestions I came up with the following code.
I haven't optimized this code yet, and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my @matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < @matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluating the regexp in list context returns the matches. So in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my @a = ($str =~ /$pattern/);
print Dumper(\@a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my @patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( @patterns ) {
my @captured = ($str =~ $pattern)
or next;
print "'$_'\n" for @captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my @tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern})
or next;
unless (@captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my @tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern});
ok @captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

Parsing a string to a hash

I have a string:
<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"
So the format is
(<val>; rel="key")*
And I want to parse that to a hash with the following format:
next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5
In Java I would use a regex pattern to extract each key => value pair and put them into a map. The pattern would be something like:
<([^>]++)>;\s*rel="([^"]++)"
That would give me the key in the second match group and the value in the first. Would the same approach be the best way to achieve this in Perl, or is there something snazzier I could do?
P.S. the reason I'm using Perl rather than Java is that the server doesn't have Java.
My first inclination was to split the string on commas and work with the three substrings, but it is probably better to use a global match in a while loop.
This should do what you want. (Perl is by far the better tool for text processing like this!)
Update: I've just realised that your choice of markdown discarded the angle brackets and newlines. Is this more appropriate? I assume it's a multi-line string?
use strict;
use warnings;
my $str = <<'END';
<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"
END
my %data;
while ($str =~ / < ([^<>]+) >; \s* rel="([^"]+)" (?:,\s*)? /xg) {
$data{$2} = $1;
}
use Data::Dump;
dd \%data;
output
{
first => "https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5",
last => "https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5",
next => "https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5",
}
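As a possibly "snazzier" variant of the global match above, the hash can also be built in one statement. This is a sketch rather than part of the answer: in list context a /g match returns all captures as one flat list, and reverse turns the (url, rel) order into the (rel, url) order a hash assignment expects. The sample string is rebuilt with join here purely for readability.

```perl
use strict;
use warnings;

my $str = join ",\n",
    '<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next"',
    '<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first"',
    '<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last"';

# In list context the /g match returns (url, rel, url, rel, ...).
# Reversing that list yields (rel, url, ...) pairs for the hash.
my %data = reverse $str =~ /<([^<>]+)>;\s*rel="([^"]+)"/g;

print "$_ => $data{$_}\n" for sort keys %data;
```

The while loop remains the clearer choice if each pair needs validation before it is stored.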
You can split the string on a "," and then use a map to create the hash:
#!/usr/bin/env perl
use strict;
use warnings;
my $str = 'https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';
my %hash = map {
my ($v, $k) = $_ =~ /\s*([^;]+);\s*rel="([^"]+)".*/;
$k => $v;
} split ',', $str;
foreach my $key (keys %hash) {
print "$key => $hash{$key}\n"
}
output:
first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5
update
With the new string you could do:
$str = q(<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next", <https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first", <https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last");
my %hash = map {
my ($v, $k) = $_ =~ /<([^>]+)>;\s*rel="([^"]+)".*/;
$k => $v;
} split ',', $str;
to get the same result.
use strict;
use warnings;
my $string='https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';
my @array=split /,/, $string;
my %hash;
foreach(@array)
{
if($_=~/(.*?);\s*rel\=\s*"([^"]+)"/)
{
$hash{$2}=$1;
}
}
print "$_ => $hash{$_}\n" foreach(keys%hash);

Combination of while, array, and regex

The loop fails. What is wrong with the array?
I would like the regex to return B when it parses the first string, and M when it parses the second string.
How is such a regex constructed?
#!/usr/bin/perl
use warnings;
use strict;
my $a = "0.0 B/s";
my $b = "12.0 MiB/s";
while (defined (my $s = shift ("$a", "$b"))) {
my $unit = $1 if ($a =~ m/.*([KMGT])i?B\/s$/);
print "$unit\n";
}
shift is meant to be used with arrays, not lists. If you want to use a while loop, you need to pre-declare an array containing $a and $b (which, by the way, are a bad choice for variable names).
Having said that, a for loop construct is the more natural choice here:
for my $s ( $var1, $var2 ) { ... }
And given that you're trying to extract the measurement unit, why not do things a slightly different way:
say for map { my ( $s ) = /$regex/; $s } $var1, $var2;
You need another substitution:
for ($a, $b) {
if (m!((?:[KMGT]i)?B)/s$!) {
my $unit = $1;
$unit =~ s/(.).*/$1/;
print "$unit\n" if $unit;
}
}
Your while loop has issues.
You are using the variable $a inside the loop when you want to use $s.
I'd write it this way:
#!/usr/bin/perl
use warnings;
use strict;
my $a = "0.0 B/s";
my $b = "12.0 MiB/s";
foreach my $s($a, $b) {
print $1 if ($s =~ m/.*([KMGT])i?B\/s$/);
}
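The regex in the loop above requires one of K, M, G or T, so it prints nothing for "0.0 B/s", while the question asks for B in that case. One way to cover both inputs is to make the prefix optional and fall back to B. This is a minimal sketch; unit_prefix is a made-up helper name, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return K, M, G or T for prefixed units,
# or B when the rate is in plain bytes per second.
sub unit_prefix {
    my ($rate) = @_;
    return unless $rate =~ m{(?:([KMGT])i?)?B/s$};
    return $1 // 'B';    # $1 is undef when no prefix was present
}

print unit_prefix($_), "\n" for "0.0 B/s", "12.0 MiB/s";
```

This prints B for the first string and M for the second.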

Perl: alter a string by regex match

I'm using this code to alter a string by a regex match.
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
$string = "_{someone} and a thing"
while($string =~ /(_\{(.*?)\}/g){
$search = metaquote($1);
$replace = $a->{$2};
$string =~ s/$search/$replace/;
}
The result is a _{person} and a thing but I'm expecting: a gremlin and a thing.
What to do to get this working?
The function is called quotemeta, not metaquote. Also, a right parenthesis is missing in your regex:
#!/usr/bin/perl
use warnings;
use strict;
my $a;
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
my $string = "_{someone} and a thing";
while($string =~ /(_\{(.*?)\})/g){
my $search = quotemeta($1);
my $replace = $a->{$2};
$string =~ s/$search/$replace/;
}
print "$string\n";
I also added strict and warnings to help myself avoid common pitfalls.
I think this should be a more efficient variant:
use strict;
my $a;
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
my $string = "_{someone} and a thing";
while( $string =~ s/(_\{(.*?)\})/ $a->{$2} /ges ) {}
print $string."\n";
This variant repeatedly substitutes all of the placeholders in the string with their corresponding hash values until there are none left.
In addition, $a is a bad identifier for any variable (it is one of the variables that sort uses), so I have used a hash named %tokens instead.
use strict;
use warnings;
my %tokens = (
someone => 'a _{person}',
person => 'gremlin',
);
my $string = '_{someone} and a thing';
1 while $string =~ s/_\{([^}]+)\}/$tokens{$1}/g;
print $string, "\n";
output
a gremlin and a thing
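One caveat with the repeat-until-done approach: a token whose value refers back to itself would loop forever, and an unknown token is replaced with an empty string (with a warning). The following is a hedged sketch of a guarded version; expand_tokens and its max_passes parameter are made up for illustration, and the default of 10 passes is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: expand _{name} placeholders repeatedly, but stop
# after $max_passes rounds and leave unknown placeholders as they are.
sub expand_tokens {
    my ($string, $tokens, $max_passes) = @_;
    $max_passes //= 10;
    for (1 .. $max_passes) {
        my $before = $string;
        $string =~ s/_\{([^}]+)\}/exists $tokens->{$1} ? $tokens->{$1} : '_{' . $1 . '}'/ge;
        last if $string eq $before;    # nothing changed, so we are done
    }
    return $string;
}

my %tokens = (
    someone => 'a _{person}',
    person  => 'gremlin',
);
print expand_tokens('_{someone} and a thing', \%tokens), "\n";
```

For the sample data this gives the same result as the one-liner above, a gremlin and a thing, after two passes.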