perl function to parse the command output - regex

I have written a perl function that executes a command and parses the command output. The command will provide the output mentioned below. In the function call i pass a number between 1 to 5 as arguments . '1' correspond to lan, 2 corresponds to wan, '3' corresponds to name and so on. ( see the output below) . for example if 1 is passed as an argument in the function call the expected output is '0' ( 1 = lan and value of lan = 0 ) . When i execute the script im not getting the expected output. null is returned.Any suggestions?
command output 1 :
[
{
"lan" : 0, #1
"wan" : 0, #2
"name" : "XYZ", #3
"packets" : 0, #4
"bytes" : 0 #5
}
]
Function call
$self->{'stats_obj'} = Statistics->new( ip => "ip addr")
my $result = $self->{'stats_obj'}->statistics_get('1');
INFO('statistics:' . $result );
Function:
sub statistics_get{
my ($self, $option)= #_;
my $result = $self->_get_hyd_tc();
return $result->{$option};
}
sub _get_hyd_tc {
my ($self) = #_;
my $opt;
my %result;
my $line;
my $cmd = 'cmd goes here';
$self->execute($cmd);
my $count =0;
foreach my $line ( $self->output() ) {
chomp $line;
if ( $line =~ /(Requested table doesn't.*)/i ){
ERROR('table doesnt exist' . $line)
}
if ($line =~ /(.*)/) {
$opt = $1;
$count = 0;
}
elsif ( $line =~ /(.*)/) {
my $key = $1;
my $value = $2;
$result{$opt}{++$count} = $value;
}
}
return \%result;
}

You are approaching this the wrong way.
Your snippet you give is JSON. So really - your best bet by far is to process it as JSON and not try and parse it yourself.
Something like this:
use strict;
use warnings;
use JSON;
use Data::Dumper;
my $command_output = '[
{
"lan" : 0,
"wan" : 0,
"name" : "XYZ",
"packets" : 0,
"bytes" : 0
}
]';
my $json_ob = decode_json( $command_output );
print Dumper \$json_ob;
print $json_ob -> [0] -> {'name'};
Those [] in your text denotes an array. usually that means multiple elements. You could iterate those elements, but as you've just got one, accessing it via [0] does the trick.
Now, you could if you really want map the attributes from 'words' to numeric, but ... there isn't really any need.
But to answer your question - why is null returned - it's because:
if ($line =~ /(.*)/) {
Will always evaluate as true - zero or more of anything.
And therefore you will never run the second elsif loop and so this never happens:
$result{$opt}{++$count} = $value;
And so you never have anything other than an empty array to return.

Related

Searching Perl array with regex and return single capturing group only

I have a Perl script in which I perform web service calls in a loop. The server returns a multivalued HTTP header that I need to parse after each call with information that I will need to make the next call (if it doesn't return the header, I want to exit the loop).
I only care about one of the values in the header, and I need to get the information out of it with a regular expression. Let's say the header is like this, and I only care about the "foo" value:
X-Header: test-abc12345; blah=foo
X-Header: test-fgasjhgakg; blah=bar
I can get the header values like this: #values = $response->header( 'X-Header' );. But how do I quickly check if
There is a foo value, and
Parse and save the foo value for the next iteration?
Ideally, I'd like to do something like this:
my $value = 'default';
do {
# (do HTTP request; use $value)
#values = $response->header( 'X-Header' );
} while( $value = first { /(?:test-)([^;]+)(?:; blah=foo)/ } #values );
But grep, first (from List::Util), etc. all return the entire match and not just the single capturing group I want. I want to avoid cluttering up my code by looping over the array and matching/parsing inside the loop body.
Is what I want possible? What would be the most compact way to write it? So far, all I can come up with is using lookarounds and \K to discard the stuff I don't care about, but this isn't super readable and makes the regex engine perform a lot of unnecessary steps.
So it seems that you want to catch the first element with a certain pattern, but acquire only the pattern. And you want it done nicely. Indeed, first and grep only pass the element itself.
However, List::MoreUtils::first_result does support processing of its match
use List::MoreUtils 0.406 qw(first_result);
my #w = qw(a bit c dIT); # get first "it" case-insensitive
my $res = first_result { ( /(it)/i )[0] } #w;
say $res // 'undef'; #--> it
That ( ... )[0] is needed to put the regex in the list context so that it returns the actual capture. Another way would be firstres { my ($r) = /(it)/i; $r }. Pick your choice
For the data in the question
use warnings;
use strict;
use feature 'say';
use List::MoreUtils 0.406 qw(firstres);
my #data = (
'X-Header: test-abc12345; blah=foo',
'X-Header: test-fgasjhgakg; blah=bar'
);
if (my $r = firstres { ( /test-([^;]+);\s+blah=foo/ )[0] } #data) {
say $r
}
Prints abc12345, clarified in a comment to be the sought result.
Module versions prior to 0.406 (of 2015-03-03) didn't have firstres (alias first_result)
first { ... } #values returns one the values (or undef).
You could use either of these:
my ($value) = map { /...(...).../ } #values;
my $value = ( map { /...(...).../ } #values ) ? $1 : undef;
my $value = ( map { /...(...).../ } #values )[0];
Using first, it would look like the following, which is rather silly:
my $value = first { 1 } map { /...(...).../ } #values;
However, assuming the capture can't be an empty string or the string 0, List::MoreUtils's first_result could be used to avoid the unnecessary matches:
my $value = first_result { /...(...).../ ? $1 : undef } #values;
my $value = first_result { ( /...(...).../ )[0] } #values;
If the returned value can be false (e.g. an empty string or a 0) you can use something like
my $value = first_result { /...(...).../ ? \$1 : undef } #values;
$value = $$value if $value;
The first_result approach isn't necessarily faster in practice.
Following code snippet is looking for foo stored in a variable $find, the found values is stored in variable $found.
my $find = 'foo';
my $found;
while( $response->header( 'X-Header' ) ) {
if( /X-Header: .*?blah=($find)/ ) {
$found = $1;
last;
}
}
say $found if $found;
Sample demo code
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $find = 'foo';
my $found;
my #header = <DATA>;
chomp(#header);
for ( #header ) {
$found = $1 if /X-Header: .*?blah=($find)/;
last if $found;
}
say Dumper(\#header);
say "Found: $found" if $found;
__DATA__
X-Header: test-abc12345; blah=foo
X-Header: test-fgasjhgakg; blah=bar
Output
$VAR1 = [
'X-Header: test-abc12345; blah=foo',
'X-Header: test-fgasjhgakg; blah=bar'
];
Found: foo

In Perl, how can I store arrays as values in a hash?

I don't know if you're allowed to ask questions about a school assignment. I just want to understand what I am supposed to do, not do this for me. Maybe I am missing something so simple it is right in front of me but anyways it was based off of an older assignment, but I had missed this class and I am running into a wall now, and well the problem is that I'm trying to push an array into a hash table. using something like this;
push #{$hash_table{$hash_key}}, $port
and to count the ports while storing them and then to print the contents of the hash
while ( ($key , $value) = each ( %hash ) ) {
print “$key scanned #{$value}”
}
and if I wanted to sort the results I would use
< foreach $key ( keys ( %hash ) ) {
}
The current code I have is this, searching for the string of /iNext-DROP/
using a log file provided. I cant for the life of me find the correct place to add the code above
use warnings;
my $LogRecord;
my $LogRecordCount;
open LOGFILE, "sample.log.txt" or die "couldn't open sample.log.txt";
while ($LogRecord = <LOGFILE>) {
if ($LogRecord =~ /INext-DROP/) {
$LogRecordCount ++;
$LogRecord =~ /(SRC=[0-9\.]* ).*(SPT=[0-9\.]* )/;
$source=$1;
$sport=$2;
print "$source$sport";
print substr( $LogRecord , 0 , $ARGV[1] ) , "\n" if $ARGV[1];
}
}
print "The file contained $LogRecordCount records" if $ARGV[1];
close LOGFILE;
Here's a picture of the old code with comments ;
Old Code -- not much has changed since I keep going back after it doesn't work
The thing you're seemingly having problems with, is that you're not sure where the port is being captured and the hash update.
What's happening is your while loop is iterating the file one line at a time, and capturing values - that $LogRecord =~ line is capturing a pattern - into $1 and $2.
And then that $2 is the thing you can add to your hash, with push.
However, there's a few things I have changed style wise like using lexical file handles because it's better style.
#!/usr/bin/env perl
use warnings;
use strict;
#because it makes debugging easier.
use Data::Dumper;
my $LogRecordCount;
#declare some hashes;
my %ports_from;
my %ips_that_used;
open my $logfile, "sample.log.txt" or die "couldn't open sample.log.txt";
while (my $line = <$logfile>) {
#matches 'current line' - skips stuff that doesn't match.
next unless $line =~ /INext-DROP/;
#increment count.
$LogRecordCount++;
my ( $source, $src_port ) = $line =~ m/SRC=([0-9\.]+).*SPT=([0-9]+)/;
print "$source$sport";
#not sure what this is doing, so I have left it in.
print substr( $line , 0 , $ARGV[1] ) , "\n" if $ARGV[1];
push #{$ports_from{$source}}, $src_port;
push #{$ips_that_used{$src_port}}, $source;
}
print "The file contained $LogRecordCount records" if $ARGV[1];
close $logfile;
print Dumper \%ports_from;
print Dumper \%ips_that_used;
That's built up your hashes.
But when it comes to outputing:
foreach my $ip ( keys %ports_from ) {
print "$ip: ", join ( " ", #{$ports_from{$ip}}) ,"\n"
}
If you wanted to sort them, you would have to do this using sort.
Now sort is a pretty clever function, but by default is sorts alphanumerically. That's ... actually not all that useful when it comes to IP addresses or port numbers, because you probably want to sort those more numerically. The easy answer is Sort::Naturally and use nsort.
However - sort takes a function (it defaults to cmp) that returns -1, 0, 1 depending on relative position.
So sorting 'by IP' might look like this:
sub by_ip {
my #a = split /\./, $a;
my #b = split /\./, $b;
foreach my $octet ( #a ) {
my $comparison = $octet <=> shift ( #b );
return $comparison if $comparison;
}
return 0;
}
And then you could:
foreach my $ip ( sort by_ip keys %ports_from ) {
print "$ip: ", join ( " ", sort { $a <=> $b } #{$ports_from{$ip}}),"\n";
}
Giving you:
24.64.208.134 : 24128 24128 24128
71.228.199.109 : 37091
72.197.8.56 : 9258
75.117.31.43 : 3122
99.248.20.48 : 48725
207.68.178.56 : 80
Given an IP-to-port mapping where duplicates exist though, it might be better still to merely count port-frequency using a hash-of-hashes instead of a hash-of-arrays.
$count_ports_from{$source}{$src_port}++;
And then:
foreach my $ip ( sort by_ip keys %count_ports_from ) {
print "$ip: ";
foreach my $port_num ( sort { $count_ports_from{$a} <=> $count_ports_from{$b} }
keys %{ $count_ports_from{$ip} } )
{
print "\t $port_num : $count_ports_from{$ip}{$port_num}\n";
}
}
Giving you something like:
24.64.208.134 : 24128 : 3
71.228.199.109 : 37091 : 1
72.197.8.56 : 9258 : 1
75.117.31.43 : 3122 : 1
99.248.20.48 : 48725 : 1
207.68.178.56 : 80 : 1

How to print and get index of two regex match in perl?

How can I print and get index from regex here:
my $search1 = "aaaNAMEaaaa";
my $search2 = "bbbbCHECKbbb";
if ( $search1 =~ /na\we/i and $search2 =~ /che\wk/i ) {
print "String found\n";
# This works with one search
# my $matched = $&;
# my $pos = index( $search1, $matched );
}
If both expressions match, $& will only have the last match. i.e. for the example above $& will always have the value CHECK and never have NAME because it was overwritten by the second pattern match.
You can wrap this logic in a function, then call that function as many times as you'd like with different string, pattern combinations:
use strict;
use warnings;
my $search1 = "aaaNAMEaaaa";
my $search2 = "bbbbCHECKbbb";
print index_from_match($search1, qr/na\we/i), "\n"; # 3
print index_from_match($search2, qr/che\wk/i), "\n"; # 4
print index_from_match($search1, qr/che\wk/i), "\n"; # -1
sub index_from_match {
my ($s, $pattern) = #_;
# uses a capture group instead of $&
if ( my ($match) = $s =~ m/($pattern)/ ) {
return index($s, $match);
}
return -1;
}
The core problem is that you're doing two regex comparisons in a single expression, so the values for the first one are lost before they can be processed
It's really hard to see how to help you without understanding the program flow within the conditional statement and how you actually use those values
Other languages use the idea of a match object, and it's easy to simulate that here by writing a subroutine that returns either a [ string, offset ] pair if the pattern matched, or undef if not. It's also less wasteful to use the built-in #- and #+ arrays to provide the values needed instead of repeating the search with index
It would look like this
use strict;
use warnings 'all';
use Carp 'croak';
my $search1 = 'aaaNAMEaaaa';
my $search2 = 'bbbbCHECKbbb';
my $match1 = match($search1, /na\we/i);
my $match2 = match($search2, qr/che\wk/i);
if ( $match1 and $match2 ) {
print "String found\n";
printf qq{"%s" found at offset %d\n}, #$match1;
printf qq{"%s" found at offset %d\n}, #$match2;
}
sub match {
my ($s, $re) = #_;
croak "Compiled regex required" unless ref $re eq 'Regexp';
return unless $s =~ $re;
[ substr($s, $-[0], $+[0]-$-[0]), $-[0] ];
}
output
String found
"NAME" found at offset 3
"CHECK" found at offset 4
I think it would also be neater to write this as
my $match1 = match($search1, qr/na\we/i);
my $match2 = match($search2, qr/che\wk/i);
if ( $match1 and $match2 ) {
print "String found\n";
printf qq{"%s" found at offset %d\n}, #$match1;
printf qq{"%s" found at offset %d\n}, #$match2;
}
Break your compound if into multiple if statements
my $search1 = "aaaNAMEaaaa";
my $search2 = "bbbbCHECKbbb";
my ($matched1, $matched2, $pos1, $pos2);
if ( $search1 =~ /na\we/i) {
$matched1 = $&;
$pos1 = index( $search1, $matched1);
if ($search2 =~ /che\wk/i ) {
print "String found\n";
$matched2 = $&;
$pos2 = index( $search2, $matched2);
#do whatever you need with $pos1 & $pos2
} else {
#reset previously set vars
undef $matched1;
undef $pos1;
}
}

How to automagically create pattern based on real data?

I have many vendors in database, they all differ in some aspect of their data. I'd like to make data validation rule which is based on previous data.
Example:
A: XZ-4, XZ-23, XZ-217
B: 1276, 1899, 22711
C: 12-4, 12-75, 12
Goal: if user inputs string 'XZ-217' for vendor B, algorithm should compare previous data and say: this string is not similar to vendor B previous data.
Is there some good way/tools to achieve such comparison? Answer could be some generic algoritm or Perl module.
Edit:
The "similarity" is hard to define, i agree. But i'd like to catch to algorithm, which could analyze previous ca 100 samples and then compare the outcome of analyze with new data. Similarity may based on length, on use of characters/numbers, string creation patterns, similar beginning/end/middle, having some separators in.
I feel it is not easy task, but on other hand, i think it has very wide use. So i hoped, there is already some hints.
You may want to peruse:
http://en.wikipedia.org/wiki/String_metric and http://search.cpan.org/dist/Text-Levenshtein/Levenshtein.pm (for instance)
Joel and I came up with similar ideas. The code below differentiates 3 types of zones.
one or more non-word characters
alphanumeric cluster
a cluster of digits
It creates a profile of the string and a regex to match input. In addition, it also contains logic to expand existing profiles. At the end, in the task sub, it contains some pseudo logic which indicates how this might be integrated into a larger application.
use strict;
use warnings;
use List::Util qw<max min>;
sub compile_search_expr {
shift;
#_ = #{ shift() } if #_ == 1;
my $str
= join( '|'
, map { join( ''
, grep { defined; }
map {
$_ eq 'P' ? quotemeta;
: $_ eq 'W' ? "\\w{$_->[1],$_->[2]}"
: $_ eq 'D' ? "\\d{$_->[1],$_->[2]}"
: undef
;
} #$_
)
} #_ == 1 ? #{ shift } : #_
);
return qr/^(?:$str)$/;
}
sub merge_profiles {
shift;
my ( $profile_list, $new_profile ) = #_;
my $found = 0;
PROFILE:
for my $profile ( #$profile_list ) {
my $profile_length = #$profile;
# it's not the same profile.
next PROFILE unless $profile_length == #$new_profile;
my #merged;
for ( my $i = 0; $i < $profile_length; $i++ ) {
my $old = $profile->[$i];
my $new = $new_profile->[$i];
next PROFILE unless $old->[0] eq $new->[0];
push( #merged
, [ $old->[0]
, min( $old->[1], $new->[1] )
, max( $old->[2], $new->[2] )
]);
}
#$profile = #merged;
$found = 1;
last PROFILE;
}
push #$profile_list, $new_profile unless $found;
return;
}
sub compute_info_profile {
shift;
my #profile_chunks
= map {
/\W/ ? [ P => $_ ]
: /\D/ ? [ W => length, length ]
: [ D => length, length ]
}
grep { length; } split /(\W+)/, shift
;
}
# Psuedo-Perl
sub process_input_task {
my ( $application, $input ) = #_;
my $patterns = $application->get_patterns_for_current_customer;
my $regex = $application->compile_search_expr( $patterns );
if ( $input =~ /$regex/ ) {}
elsif ( $application->approve_divergeance( $input )) {
$application->merge_profiles( $patterns, compute_info_profile( $input ));
}
else {
$application->escalate(
Incident->new( issue => INVALID_FORMAT
, input => $input
, customer => $customer
));
}
return $application->process_approved_input( $input );
}
Here is my implementation and a loop over your test cases. Basically you give a list of good values to the function and it tries to build a regex for it.
output:
A: (?^:\w{2,2}(?:\-){1}\d{1,3})
B: (?^:\d{4,5})
C: (?^:\d{2,2}(?:\-)?\d{0,2})
code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw'uniq each_arrayref';
my %examples = (
A => [qw/ XZ-4 XZ-23 XZ-217 /],
B => [qw/ 1276 1899 22711 /],
C => [qw/ 12-4 12-75 12 /],
);
foreach my $example (sort keys %examples) {
print "$example: ", gen_regex(#{ $examples{$example} }) || "Generate failed!", "\n";
}
sub gen_regex {
my #cases = #_;
my %exploded;
# ex. $case may be XZ-217
foreach my $case (#cases) {
my #parts =
grep { defined and length }
split( /(\d+|\w+)/, $case );
# #parts are ( XZ, -, 217 )
foreach (#parts) {
if (/\d/) {
# 217 becomes ['\d' => 3]
push #{ $exploded{$case} }, ['\d' => length];
} elsif (/\w/) {
#XZ becomes ['\w' => 2]
push #{ $exploded{$case} }, ['\w' => length];
} else {
# - becomes ['lit' => '-']
push #{ $exploded{$case} }, ['lit' => $_ ];
}
}
}
my $pattern = '';
# iterate over nth element (part) of each case
my $ea = each_arrayref(values %exploded);
while (my #parts = $ea->()) {
# remove undefined (i.e. optional) parts
my #def_parts = grep { defined } #parts;
# check that all (defined) parts are the same type
my #part_types = uniq map {$_->[0]} #def_parts;
if (#part_types > 1) {
warn "Parts not aligned\n";
return;
}
my $type = $part_types[0]; #same so make scalar
# were there optional parts?
my $required = (#parts == #def_parts);
# keep the values of each part
# these are either a repitition or lit strings
my #values = sort uniq map { $_->[1] } #def_parts;
# these are for non-literal quantifiers
my $min = $required ? $values[0] : 0;
my $max = $values[-1];
# write the specific pattern for each type
if ($type eq '\d') {
$pattern .= '\d' . "{$min,$max}";
} elsif ($type eq '\w') {
$pattern .= '\w' . "{$min,$max}";
} elsif ($type eq 'lit') {
# quote special characters, - becomes \-
my #uniq = map { quotemeta } uniq #values;
# join with alternations, surround by non-capture grouup, add quantifier
$pattern .= '(?:' . join('|', #uniq) . ')' . ($required ? '{1}' : '?');
}
}
# build the qr regex from pattern
my $regex = qr/$pattern/;
# test that all original patterns match (#fail should be empty)
my #fail = grep { $_ !~ $regex } #cases;
if (#fail) {
warn "Some cases fail for generated pattern $regex: (#fail)\n";
return '';
} else {
return $regex;
}
}
To simplify the work of finding the pattern, optional parts may come at the end, but no required parts may come after optional ones. This could probably be overcome but it might be hard.
If there was a Tie::StringApproxHash module, it would fit the bill here.
I think you're looking for something that combines the fuzzy-logic functionality of String::Approx and the hash interface of Tie::RegexpHash.
The former is more important; the latter would make light work of coding.

Perl regex syntax generation

This is a follow up to the question posted here: Perl Regex syntax
The results from that discussion yielded this script:
#!/usr/bin/env perl
use strict;
use warnings;
my #lines = <DATA>;
my $current_label = '';
my #ordered_labels;
my %data;
for my $line (#lines) {
if ( $line =~ /^\/(.*)$/ ) { # starts with slash
$current_label = $1;
push #ordered_labels, $current_label;
next;
}
if ( length $current_label ) {
if ( $line =~ /^(\d) "(.*)"$/ ) {
$data{$current_label}{$1} = $2;
next;
}
}
}
for my $label ( #ordered_labels ) {
print "$label <- as.factor($label\n";
print " , levels= c(";
print join(',',map { $_ } sort keys %{$data{$label}} );
print ")\n";
print " , labels= c(";
print join(',',
map { '"' . $data{$label}{$_} . '"' }
sort keys %{$data{$label}} );
print ")\n";
print " )\n";
}
__DATA__
...A bunch of nonsense I do not care about...
...
Value Labels
/gender
1 "M"
2 "F"
/purpose
1 "business"
2 "vacation"
3 "tiddlywinks"
execute .
Essentially, I need to build the Perl to accommodate a syntax shorthand found in the SPSS file. For adjacent columns, SPSS allows one to type something like:
VALUE LABELS
/agree1 to agree5
1 "Strongly disagree"
2 "Disagree"
3 "Neutral"
4 "Agree"
5 "Strongly agree"
As the script currently exists, it will generate this:
agree1 to agree5 <- factor(agree1 to agree5
, levels= c(1,2,3,4,5,6)
, labels= c("Strongly disagree","Disagree","Neutral","Agree","Strongly agree","N/A")
)
and I need it to produce something like this:
agree1 <- factor(agree1
, levels= c(1,2,3,4,5,6)
, labels= c("Strongly disagree","Disagree","Neutral","Agree","Strongly agree","N/A")
)
agree2 <- factor(agree2
, levels= c(1,2,3,4,5,6)
, labels= c("Strongly disagree","Disagree","Neutral","Agree","Strongly agree","N/A")
)
…
use strict;
use warnings;
main();
sub main {
my #lines = <DATA>;
my $vlabels = get_value_labels(#lines);
write_output_delim($vlabels);
}
# Extract the value label information from SPSS syntax.
sub get_value_labels {
my (#vlabels, $i, $j);
for my $line (#_){
if ( $line =~ /^\/(.+)/ ){
my #vars = parse_var_range($1);
$i = #vlabels;
$j = $i + #vars - 1;
push #vlabels, { var => $_, codes => [] } for #vars;
}
elsif ( $line =~ /^\s* (\d) \s+ "(.*)"$/x ){
push #{$vlabels[$_]{codes}}, [$1, $2] for $i .. $j;
}
}
return \#vlabels;
}
# A helper function to handle variable ranges: "agree1 to agree3".
sub parse_var_range {
my $vr = shift;
my #vars = split /\s+ to \s+/x, $vr;
return $vr unless #vars > 1;
my ($stem) = $vars[0] =~ /(.+?)\d+$/;
my #n = map { /(\d+)$/ } #vars;
return map { "$stem" . $_ } $n[0] .. $n[1];
}
sub write_output_delim {
my $vlabels = shift;
for my $vlab (#$vlabels){
print $vlab->{var}, "\n";
print join("\t", '', #$_), "\n" for #{$vlab->{codes}}
}
}
sub write_output_factors {
# You get the idea...
}
__DATA__
/gender
1 "M"
2 "F"
/purpose
1 "business"
2 "vacation"
3 "tiddlywinks"
/agree1 to agree3
1 "Disagree"
2 "Neutral"
3 "Agree"