Perl substitute strings in hash - regex

Hi I have an perl hash of arbitrary depth. I want to substitute the a string in entire structure with something else.
What is the right approach to do it?
I did something like this
#convert the hash to string for manipulation
my $data = YAML::Dump(%hash);
# do manipulation
---
---
# time to get back the hash
%hash = YAML::Load($data);

Your idea seems very risky to me, since it can be hard to be sure that the substitution won't destroy something in the output of YAML::Dump that will prevent the result from being read back in again, or worse, something that will alter the structure of the hash as it is represented in the dump string. What if the manipulation you are trying to perform is to replace : with , or ’ with ', or something of that sort?
I would probably do something more like this:
use Scalar::Util 'reftype';
# replace $this with $that in key names and string values of $hash
# recursively apply replacement in hash and all its subhashes
sub hash_replace {
my ($hash, $this, $that) = #_;
for my $k (keys %$hash) {
# substitution in value
my $v = $hash->{$k};
if (ref $v && reftype($v) eq "HASH") {
hash_replace($v, $this, $that);
} elsif (! ref $v) {
$v =~ s/$this/$that/og;
}
my $new_hash = {};
for my $k (keys %$hash) {
# substitution in key
(my $new_key = $k) =~ s/$this/$that/og;
$new_hash->{$new_key} = $hash->{$k};
}
%$hash = %$new_hash; # replace old keys with new keys
}
The s/…/…/ replacement I used here may not be appropriate for your task; you should feel free to use something else. For example, instead of strings $this and $that you might pass two functions, $key_change and $val_change which are applied to keys and to values, respectively, returning the modified versions. See the ###### lines below:
use Scalar::Util 'reftype';
# replace $this with $that in key names and string values of $hash
# recursively apply replacement in hash and all its subhashes
sub hash_replace {
my ($hash, $key_change, $val_change) = #_;
for my $k (keys %$hash) {
# substitution in value
my $v = $hash->{$k};
if (ref $v && reftype($v) eq "HASH") {
hash_replace($v, $key_change, $val_change);
} elsif (! ref $v) {
$v = $val_change->($v); #######
}
}
my $new_hash = {};
for my $k (keys %$hash) {
# substitution in key
my $new_key = $key_change->($k); #######
$new_hash->{$new_key} = $hash->{$k};
}
%$hash = %$new_hash;
}

Here's one way to attack it, by recursing through the hash. In this code, you pass in a sub that does whatever you like to each value in the nested hash. This code only modifies the values, not the keys, and it ignores other reference types (ie. scalar refs, array refs) in the nested structure.
#!/usr/bin/perl -w
use Modern::Perl;
## Visit all nodes in a nested hash. Bare-bones.
sub visit_hash
{
my ($start, $sub) = #_;
my #q = ( $start );
while (#q) {
my $hash = pop #q;
foreach my $key ( keys %{$hash} ) {
my $ref = ref($hash->{$key});
if ( $ref eq "" ) { # not a reference
&$sub( $hash->{$key} );
next;
}
if ( $ref eq "HASH" ) { # reference to a nested hash
push #q, $hash->{$key};
next;
}
# ignore other reference types.
}
}
}
The following gives an example of how to use it, replacing e with E in a nested hash:
# Example of replacing a string in all values:
my %hash =
(
a => "fred",
b => "barney",
c => "wilma",
d => "betty",
nest1 =>
{
1 => "red",
2 => "orange",
3 => "green"
},
nest2 =>
{
x => "alpha",
y => "beta",
z => "gamma"
},
);
use YAML::XS;
print "Before:\n";
print Dump( \%hash );
# now replace 'e' with 'E' in all values.
visit_hash( \%hash, sub { $_[0] =~ s/e/E/g; } );
print "After:\n";
print Dump( \%hash );

Related

Match key elements in same hash with regex and store it in HoA

I have list of elements in a hash (%Hash).
I need to compare key elements each other and if one key matches with another key (with certain condition), then it would become a pair and should be stored it in HOA.
Here is my script:
use strict; use warnings;
use Data::Dumper;
my (%A, %Final);
my %Hash = (
'Network=ROOT,Network=R16,Me=4462,Element=1,Node=1,Sec=1,Car=3' => 1,
'Network=ROOT,Network=R16,Me=4462,Element=1,Equipment=1,Rack=1,Slot=1,Unit=1,DeviceSet=1,Device=1' => 1
);
foreach my $car (keys %Hash){
if($car =~ m/Car=\d+/){
if($car =~ /(.*),Node/){
$A{$1} = $car;
}
}
}
print Dumper(\%A);
foreach (keys %Hash){
if($_ =~ m/$A{$_}(.*)(Device=\d+)$/){
push #{$Final{$_}}, $A{$_};
}
}
print "Final:\n".Dumper(\%Final);
Let's say in example, if any element contains Network=ROOT,Network=R16,Me=4462,Element=1 then if any other element contains above data as well as key ends with Device=\d+ then the both should become a key and array of values.
Current result:
$VAR1 = {
'Network=ROOT,Network=R16,Me=4462,Element=1,Equipment=1,Rack=1,Slot=1,Unit=1,DeviceSet=1,Device=1' => [
undef
]
};
Expected Result:
$VAR1 = {
'Network=ROOT,Network=R16,Me=4462,Element=1,Equipment=1,Rack=1,Slot=1,Unit=1,DeviceSet=1,Device=1' => [
Network=ROOT,Network=R16,Me=4462,Element=1,Node=1,Sec=1,Car=3
]
};
Why I am getting undef value in the HOA.
Update:
I have updated the code, in the second hash iteration the code looks like this:
foreach my $ele (keys %Hash){
foreach my $s_c(keys %A){
if($ele =~ m/$s_c(.*)(Device=\d+)$/){
push #{$Final{$ele}}, $A{$s_c};
}
}
}
I am able to get the desired result now. But is there any compact way to achieve this (since I am iterating hash within a hash).
You can use maps to shorten your code. Here is the solution.
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
my (%A, %Final);
my %Hash = (
'Network=ROOT,Network=R16,Me=4462,Element=1,Node=1,Sec=1,Car=3' => 1,
'Network=ROOT,Network=R16,Me=4462,Element=1,Equipment=1,Rack=1,Slot=1,Unit=1,DeviceSet=1,Device=1' => 1
);
map {
$A{$1} = $_ if $_ =~ m/Car=\d+/ && $_ =~ /(.*),Node/
} keys %Hash;
print Dumper(\%A);
map {
my $ele = $_;
map {
my $s_c = $_;
push #{$Final{$ele}}, $A{$s_c} if $ele =~ m/$s_c.*Device=\d+$/;
} keys %A;
} keys %Hash;
print "Final:\n".Dumper(\%Final);

How to change match modifier within a program in perl? [duplicate]

Is there a way to use a variable as modifier in a substitution?
my $search = 'looking';
my $replace = '"find: $1 ="';
my $modifier = 'ee';
s/$search/$replace/$modifier;
I need to use an array of hashes to make bulk search-replace with different modifiers.
While the method using eval to compile a new substitution is probably the most straightforward, you can create a substitution that is more modular:
use warnings;
use strict;
sub subst {
my ($search, $replace, $mod) = #_;
if (my $eval = $mod =~ s/e//g) {
$replace = qq{'$replace'};
$replace = "eval($replace)" for 1 .. $eval;
} else {
$replace = qq{"$replace"};
}
sub {s/(?$mod)$search/$replace/ee}
}
my $sub = subst '(abc)', 'uc $1', 'ise';
local $_ = "my Abc string";
$sub->();
print "$_\n"; # prints "my ABC string"
This is only lightly tested, and it is left as an exercise for the reader to implement other flags like g
You could use eval, if you put on your safety goggles and your divide-by-zero suit.
E.g.:
use strict;
use warnings;
sub mk_re {
my ($search, $replace, $modifier) = #_;
$modifier ||= '';
die "Bad modifier $modifier" unless $modifier =~ /^[msixge]*$/;
my $sub = eval "sub { s/($search)/$replace/$modifier; }";
die "Error making regex for [$search][$replace][$modifier]: $#" unless $sub;
return $sub;
}
my $search = 'looking';
my $replace = '"find: $1 ="';
my $modifier = 'e';
# Sub can be stored in an array or hash
my $sub = mk_re($search, $replace, $modifier);
$_ = "abc-looking-def";
print "$_\n";
$sub->();
print "$_\n";
Hm, if I had to do it I would do like this:
use warnings;
use strict;
my #stuff = (
{
search => "this",
replace => "that",
modifier => "g",
},
{
search => "ono",
replace => "wendy",
modifier => "i",
}
);
$_ = "this ono boo this\n";
for my $h (#stuff) {
if ($h->{modifier} eq 'g') {
s/$h->{search}/$h->{replace}/g;
} elsif ($h->{modifier} eq 'i') {
s/$h->{search}/$h->{replace}/i;
}
# etc.
}
print;
There are only so many different modifiers you might want to use so I think this is easy enough.
You can use eval for this, but it's awfully messy.
Of course s/$search/$replace/ work as you expect. It is the dynamic modifiers that are not straightforward.
For the regular match modifiers of pimsx you can use Perl's Extended Patterns to modify the modifier flags on the fly as part of your pattern. These are of the form (?pimsx-imsx) to turn on / off those modifiers.
For the s// e and ee forms, you can use (?{ perl code}) documented in the same perlre section. For all of eval e or ee forms, consider the security of the resulting code!
There is no form to modify global to first match that I am aware of, so global vs first match would need to be separate statements.
Here's a combination of Kinopiko's answer and eval.
eval is used here to generate the lookup table in a controlled and maintainable fashion, and a lookup table is used to save all the if.. elsif.. elsif which are not too fun to look at.
(very lightly tested)
my #stuff = (
{
search => "this",
replace => "that",
modifier => "g",
},
{
search => "ono",
replace => "wendy",
modifier => "i",
}
);
$_ = "this ono boo this\n";
my #modifiers = qw{m s i x g e};
my $s_lookup = {};
foreach my $modifier (#modifiers) {
$s_lookup->{$modifier} = eval " sub { s/\$_[0]/\$_[1]/$modifier } ";
}
for my $h (#stuff) {
$s_lookup->{$h->{modifier}}->($h->{search},$h->{replace});
}
print;
To be fully useful this needs:
combinations of possible modifiers
sort function on the lookup table so 'msi' combination and 'mis' combination will go to the same key.

Perl remove characters from string matrix according to pattern

I have multiple strings with the same length stored in a hash data structure. Example:
$VAR1 = {
'first' => 'abcXY',
'second' => 'XYXYa',
'third' => '*abXZ'
};
From this 'matrix' of characters, I would like to remove the 'columns' which exclusively contain the characters X or Y. In the above example, this would be the fourth character of each string (4th 'column').
The desired result would be:
$VAR1 = {
'first' => 'abcY',
'second' => 'XYXa',
'third' => '*abZ'
};
Following code does this by creating a transpose of the values of my hash structure and then determines which indices to keep:
# data structure
my %h = ('first'=>'abcXY', 'second'=>'XYXYa', 'third'=>'*abXZ' );
# get length of all values in hash
my $nchar = length $h{(keys(%h))[0]};
# transpose values of hash
my #transposed = map { my $idx=$_; [map {substr ($_, $idx, 1) } values(%h)] } 0..$nchar-1;
# determine indices which I want to keep
my #indices;
for my $i (0..$#transposed){
my #a = #{$transposed[$i]};
# do not keep index if column consists of X and Y
if ( scalar(grep {/X|Y/} #a) < scalar(#a) ) {
push #indices, $i;
}
}
# only keep letters with indices
for my $k (keys %h){
my $str = $h{$k};
my $reduced = join "", map{ substr ($str, $_, 1) } #indices;
$h{$k} = $reduced;
}
This is a terrible amount of code for such a simple operation. How could I do this more elegantly (preferably not with some matrix library, but with standard perl)?
Edit
Here another example: From the following strings, the first and last characters should be removed, because in both strings, the first and last position is either X or Y:
$VAR1 = {
'1' => 'Xsome_strX',
'2' => 'YsomeXstrY'
};
Desired result:
$VAR1 = {
'1' => 'some_str',
'2' => 'someXstr'
};
my $total = values %hash;
my %ind;
for my $v (values %hash) {
$ind{ pos($v) -1 }++ while $v =~ /[XY]/g;
}
my #to_remove = sort {$b <=> $a} grep { $ind{$_} == $total } keys %ind;
for my $v (values %hash) {
substr($v, $_, 1, "") for #to_remove;
}

Matching hash key in hash with multiple keys

I want get the values from a hash that uses two keys using a regex just on the second key. This is what I have:
use List::Util qw<first>;
$key1 = "key";
my $value = $hash{$key1}{ ( first { m/teen/ } keys %hash ) || '' };
use warnings;
use strict;
my %hash = ( 'key1' => 'result',
'key2' => 'wrong');
foreach my $key (keys %hash) {
print "$key, $hash{$key}\n" if $hash{$key} =~ /result/;
}
prints:
key1, result
Edit - at a second glance, although I'm still baffled by your code and question, it appears that you want to find a value associated with a particular key, in this case 'key1':
print "key1 = $hash{'key1'}\n";
which prints:
key1 = result

How to automagically create pattern based on real data?

I have many vendors in database, they all differ in some aspect of their data. I'd like to make data validation rule which is based on previous data.
Example:
A: XZ-4, XZ-23, XZ-217
B: 1276, 1899, 22711
C: 12-4, 12-75, 12
Goal: if user inputs string 'XZ-217' for vendor B, algorithm should compare previous data and say: this string is not similar to vendor B previous data.
Is there some good way/tools to achieve such comparison? Answer could be some generic algoritm or Perl module.
Edit:
The "similarity" is hard to define, i agree. But i'd like to catch to algorithm, which could analyze previous ca 100 samples and then compare the outcome of analyze with new data. Similarity may based on length, on use of characters/numbers, string creation patterns, similar beginning/end/middle, having some separators in.
I feel it is not easy task, but on other hand, i think it has very wide use. So i hoped, there is already some hints.
You may want to peruse:
http://en.wikipedia.org/wiki/String_metric and http://search.cpan.org/dist/Text-Levenshtein/Levenshtein.pm (for instance)
Joel and I came up with similar ideas. The code below differentiates 3 types of zones.
one or more non-word characters
alphanumeric cluster
a cluster of digits
It creates a profile of the string and a regex to match input. In addition, it also contains logic to expand existing profiles. At the end, in the task sub, it contains some pseudo logic which indicates how this might be integrated into a larger application.
use strict;
use warnings;
use List::Util qw<max min>;
sub compile_search_expr {
shift;
#_ = #{ shift() } if #_ == 1;
my $str
= join( '|'
, map { join( ''
, grep { defined; }
map {
$_ eq 'P' ? quotemeta;
: $_ eq 'W' ? "\\w{$_->[1],$_->[2]}"
: $_ eq 'D' ? "\\d{$_->[1],$_->[2]}"
: undef
;
} #$_
)
} #_ == 1 ? #{ shift } : #_
);
return qr/^(?:$str)$/;
}
sub merge_profiles {
shift;
my ( $profile_list, $new_profile ) = #_;
my $found = 0;
PROFILE:
for my $profile ( #$profile_list ) {
my $profile_length = #$profile;
# it's not the same profile.
next PROFILE unless $profile_length == #$new_profile;
my #merged;
for ( my $i = 0; $i < $profile_length; $i++ ) {
my $old = $profile->[$i];
my $new = $new_profile->[$i];
next PROFILE unless $old->[0] eq $new->[0];
push( #merged
, [ $old->[0]
, min( $old->[1], $new->[1] )
, max( $old->[2], $new->[2] )
]);
}
#$profile = #merged;
$found = 1;
last PROFILE;
}
push #$profile_list, $new_profile unless $found;
return;
}
sub compute_info_profile {
shift;
my #profile_chunks
= map {
/\W/ ? [ P => $_ ]
: /\D/ ? [ W => length, length ]
: [ D => length, length ]
}
grep { length; } split /(\W+)/, shift
;
}
# Psuedo-Perl
sub process_input_task {
my ( $application, $input ) = #_;
my $patterns = $application->get_patterns_for_current_customer;
my $regex = $application->compile_search_expr( $patterns );
if ( $input =~ /$regex/ ) {}
elsif ( $application->approve_divergeance( $input )) {
$application->merge_profiles( $patterns, compute_info_profile( $input ));
}
else {
$application->escalate(
Incident->new( issue => INVALID_FORMAT
, input => $input
, customer => $customer
));
}
return $application->process_approved_input( $input );
}
Here is my implementation and a loop over your test cases. Basically you give a list of good values to the function and it tries to build a regex for it.
output:
A: (?^:\w{2,2}(?:\-){1}\d{1,3})
B: (?^:\d{4,5})
C: (?^:\d{2,2}(?:\-)?\d{0,2})
code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw'uniq each_arrayref';
my %examples = (
A => [qw/ XZ-4 XZ-23 XZ-217 /],
B => [qw/ 1276 1899 22711 /],
C => [qw/ 12-4 12-75 12 /],
);
foreach my $example (sort keys %examples) {
print "$example: ", gen_regex(#{ $examples{$example} }) || "Generate failed!", "\n";
}
sub gen_regex {
my #cases = #_;
my %exploded;
# ex. $case may be XZ-217
foreach my $case (#cases) {
my #parts =
grep { defined and length }
split( /(\d+|\w+)/, $case );
# #parts are ( XZ, -, 217 )
foreach (#parts) {
if (/\d/) {
# 217 becomes ['\d' => 3]
push #{ $exploded{$case} }, ['\d' => length];
} elsif (/\w/) {
#XZ becomes ['\w' => 2]
push #{ $exploded{$case} }, ['\w' => length];
} else {
# - becomes ['lit' => '-']
push #{ $exploded{$case} }, ['lit' => $_ ];
}
}
}
my $pattern = '';
# iterate over nth element (part) of each case
my $ea = each_arrayref(values %exploded);
while (my #parts = $ea->()) {
# remove undefined (i.e. optional) parts
my #def_parts = grep { defined } #parts;
# check that all (defined) parts are the same type
my #part_types = uniq map {$_->[0]} #def_parts;
if (#part_types > 1) {
warn "Parts not aligned\n";
return;
}
my $type = $part_types[0]; #same so make scalar
# were there optional parts?
my $required = (#parts == #def_parts);
# keep the values of each part
# these are either a repitition or lit strings
my #values = sort uniq map { $_->[1] } #def_parts;
# these are for non-literal quantifiers
my $min = $required ? $values[0] : 0;
my $max = $values[-1];
# write the specific pattern for each type
if ($type eq '\d') {
$pattern .= '\d' . "{$min,$max}";
} elsif ($type eq '\w') {
$pattern .= '\w' . "{$min,$max}";
} elsif ($type eq 'lit') {
# quote special characters, - becomes \-
my #uniq = map { quotemeta } uniq #values;
# join with alternations, surround by non-capture grouup, add quantifier
$pattern .= '(?:' . join('|', #uniq) . ')' . ($required ? '{1}' : '?');
}
}
# build the qr regex from pattern
my $regex = qr/$pattern/;
# test that all original patterns match (#fail should be empty)
my #fail = grep { $_ !~ $regex } #cases;
if (#fail) {
warn "Some cases fail for generated pattern $regex: (#fail)\n";
return '';
} else {
return $regex;
}
}
To simplify the work of finding the pattern, optional parts may come at the end, but no required parts may come after optional ones. This could probably be overcome but it might be hard.
If there was a Tie::StringApproxHash module, it would fit the bill here.
I think you're looking for something that combines the fuzzy-logic functionality of String::Approx and the hash interface of Tie::RegexpHash.
The former is more important; the latter would make light work of coding.