Perl Regex for zero or one match

Below are the two strings:
12/31/2011 05:34:27;U;11.comp;host=win workgroup=home username=bob cmemory=1325133456 qmemory=1325133456 smemory=1325133456 uptime=1325289867
12/31/2011 01:09:20;D;12.comp;host=win workgroup=home username=sam cmemory=1325151687 qmemory=1325151687 smemory=1325151687 uptime=1325228636 session=4677 downtime=1325270175 Exit_status=0
From the above strings I want to pick the host, workgroup, username, uptime and downtime values using a regex in Perl.
Below is my Perl script:
foreach $line (<FILE>) {
if($line =~ m<\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2};[U|D].*host=(\w+)\s+workgroup=(\w+)\s+hostname=(\w+)\s+.*uptime=(\d+)\s+.*(downtime=)?(\d*)>){
my $host = $1;
my $workgroup = $2;
my $hostname = $3;
my $uptime = $4;
my $downtime = $5;
print "host=$host workgroup=$workgroup hostname=$hostname uptime=$uptime downtime=$downtime\n";
}
}
The only problem I am facing is with downtime: this attribute may not be present in the line, and I am not able to capture this field properly.

Why not use split instead? Then you could add the various categories to a hash, like so:
use strict;
use warnings;
use Data::Dumper;
while (<DATA>) {
my ($date, $foo, $bar, $data) = split /;/, $_, 4;
my %data = map { split /=/ } split ' ', $data;
print Dumper \%data;
}
__DATA__
12/31/2011 05:34:27;U;11.comp;host=win workgroup=home username=bob cmemory=1325133456 qmemory=1325133456 smemory=1325133456 uptime=1325289867
12/31/2011 01:09:20;D;12.comp;host=win workgroup=home username=sam cmemory=1325151687 qmemory=1325151687 smemory=1325151687 uptime=1325228636 session=4677 downtime=1325270175 Exit_status=0
Output:
$VAR1 = {
'workgroup' => 'home',
'cmemory' => '1325133456',
'qmemory' => '1325133456',
'uptime' => '1325289867',
'smemory' => '1325133456',
'username' => 'bob',
'host' => 'win'
};
$VAR1 = {
'qmemory' => '1325151687',
'Exit_status' => '0',
'smemory' => '1325151687',
'username' => 'sam',
'host' => 'win',
'workgroup' => 'home',
'cmemory' => '1325151687',
'session' => '4677',
'downtime' => '1325270175',
'uptime' => '1325228636'
};
If you now want to refer to the "downtime" value, you can do something such as:
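my $downtime = $data{downtime} // "N/A";
Here // is the defined-or operator (available since Perl 5.10). It is preferable to the logical or || because it falls back only when the value is undefined, so a legitimate value such as 0 (see Exit_status above) is kept.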
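If you would rather keep a single regex, as in the question, the usual fix is to wrap the whole optional part, including the number, in a non-capturing group followed by ?, so the number gets its own capture that is simply undef when the field is absent. In the question's pattern, (downtime=)? is capture 5 and the number is capture 6, and the greedy .* in front of them can skip past the field entirely. Two other details also keep it from matching: [U|D] is a character class that also accepts |, and the pattern looks for hostname= while the data contains username=. A minimal sketch, assuming the field order in the sample lines is fixed; the @results array only collects the output:

```perl
use strict;
use warnings;

# Sample input copied from the question; in practice these lines come from FILE.
my @lines = (
    '12/31/2011 05:34:27;U;11.comp;host=win workgroup=home username=bob cmemory=1325133456 qmemory=1325133456 smemory=1325133456 uptime=1325289867',
    '12/31/2011 01:09:20;D;12.comp;host=win workgroup=home username=sam cmemory=1325151687 qmemory=1325151687 smemory=1325151687 uptime=1325228636 session=4677 downtime=1325270175 Exit_status=0',
);

my @results;
for my $line (@lines) {
    next unless $line =~ m{
        ^\d{2}/\d{2}/\d{4} \s+ \d{2}:\d{2}:\d{2} ; [UD] ;   # timestamp and U or D flag
        .*? host=(\w+) \s+ workgroup=(\w+) \s+ username=(\w+)
        .*? uptime=(\d+)
        (?: .*? downtime=(\d+) )?                            # optional group
    }x;
    my ($host, $workgroup, $username, $uptime, $downtime) = ($1, $2, $3, $4, $5);
    $downtime //= 'N/A';    # the fifth capture is undef when there is no downtime
    push @results, "host=$host workgroup=$workgroup username=$username uptime=$uptime downtime=$downtime";
}
print "$_\n" for @results;
```

The ? after the group is greedy, so the engine first tries to find downtime= and only falls back to matching nothing when the field is missing.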

Related

Perl: regex to cut the words enclosed in parentheses

I have an array @WIALOG_lines with the following entries:
(0552) -*--# "<No comment>" 27-Oct-2020 10:40 AM
(0553) M---$ user1 100900 "Random job
(0554) ----# 1119996 "patch content"
(0562) -*--# "<No comment>" 24-Oct-2020 10:40 AM
I need to have 0552, 0553, 0554 and 0562 in this array. I am trying the command below, but it is not working. Can you help with a regex that keeps only the values enclosed in parentheses?
s/(^[^\(]+")|("[^\)]+)//g for @WIALOG_lines;
We want to match the number between the first set of parentheses. That pattern is an opening paren, one or more characters that are not a closing paren, and the closing paren, and we want to capture the characters between the parentheses. Parentheses are special in regexes, so the literal ones need to be escaped. The pattern is m/\(([^)]+)\)/, where the escaped parentheses match literally and the other pair forms a capture group.
Then we want to apply that pattern to each line of the original array, which suggests a map block: for each element, do the match and return the captured part. $1 holds the string matched by the first capture group. Returning $1 only when the match succeeds (and the empty list otherwise) keeps a stale $1 from an earlier line out of the result.
my @key = map { m/\(([^)]+)\)/ ? $1 : () } @WIALOG_lines;
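The question attempted an in-place s/// on the array. A sketch of that variant replaces each whole line with the digits captured from its leading parentheses. It assumes, as in the sample data, that every line starts with (digits); any other line is left unchanged:

```perl
use strict;
use warnings;

my @WIALOG_lines = (
    '(0552) -*--# "<No comment>" 27-Oct-2020 10:40 AM',
    '(0553) M---$ user1 100900 "Random job',
    '(0554) ----# 1119996 "patch content"',
    '(0562) -*--# "<No comment>" 24-Oct-2020 10:40 AM',
);

# Replace each whole line by the digits inside its leading parentheses.
# /s lets .* run to the end even if a line still carries its newline.
s/^\((\d+)\).*\z/$1/s for @WIALOG_lines;

print "$_\n" for @WIALOG_lines;
```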
As indicated earlier, the posted question is easy to solve by capturing the digits inside the parentheses and storing them in an array for further processing.
use strict;
use warnings;
use feature 'say';
my @keys = map { /^\((\d+)\)/ } <DATA>; # a match in list context returns its captures, or nothing
say for @keys;
__DATA__
(0552) -*--# "<No comment>" 27-Oct-2020 10:40 AM
(0553) M---$ user1 100900 "Random job
(0554) ----# 1119996 "patch content"
(0562) -*--# "<No comment>" 24-Oct-2020 10:40 AM
A more interesting case is input data that requires some additional parsing. The following example demonstrates this approach.
NOTE: the input data for the record labelled 0553 is missing its closing ".
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %log;
while( <DATA> ) {
chomp;
my @col = /\((\d+)\)\s+(\S+)\s+(.*)\z/;
my $data;
if( $col[2] =~ /(.*) (\d{2}-\w{3}-\d{4}) (\d{2}:\d{2}) (\w{2})\z/ ) {
$data = { comment => $1, date => $2, time => $3, AM => $4 };
} else {
$data = { user => $1 , label => $2, desc => $3 } if $col[2] =~ /(\S+) (\d+) (".*?"?)\z/;
$data = { label => $1, desc => $2 } if $col[2] =~ /(\d+) (".*?")\z/;
}
$log{$col[0]} = { perm => $col[1], data => $data };
}
say Dumper(\%log);
__DATA__
(0552) -*--# "<No comment>" 27-Oct-2020 10:40 AM
(0553) M---$ user1 100900 "Random job
(0554) ----# 1119996 "patch content"
(0562) -*--# "<No comment>" 24-Oct-2020 10:40 AM
Output
$VAR1 = {
'0562' => {
'perm' => '-*--#',
'data' => {
'date' => '24-Oct-2020',
'AM' => 'AM',
'comment' => '"<No comment>"',
'time' => '10:40'
}
},
'0553' => {
'perm' => 'M---$',
'data' => {
'desc' => '"Random job',
'user' => 'user1',
'label' => '100900'
}
},
'0552' => {
'perm' => '-*--#',
'data' => {
'date' => '27-Oct-2020',
'time' => '10:40',
'AM' => 'AM',
'comment' => '"<No comment>"'
}
},
'0554' => {
'data' => {
'label' => '1119996',
'desc' => '"patch content"'
},
'perm' => '----#'
}
};

Perl: How get multiple regex captures in a structured way?

I am trying to get all occurrences of a group of patterns in an arbitrary string, much like this:
my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";
foreach (my @STOPS = $STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {
print Dumper \%+, \@STOPS;
}
But the outcome is not what I expected, and I don't fully understand why:
$VAR1 = {
'FINAL_WORD' => 'old',
'FIRST_WORD' => 'She'
};
$VAR2 = [
'cat',
'That',
'old',
'She'
];
$VAR1 = {
'FINAL_WORD' => 'old',
'FIRST_WORD' => 'She'
};
$VAR2 = [
'cat',
'That',
'old',
'She'
];
$VAR1 = {
'FINAL_WORD' => 'old',
'FIRST_WORD' => 'She'
};
$VAR2 = [
'cat',
'That',
'old',
'She'
];
$VAR1 = {
'FINAL_WORD' => 'old',
'FIRST_WORD' => 'She'
};
$VAR2 = [
'cat',
'That',
'old',
'She'
];
If there is no better solution, I could live with what ends up in @STOPS and omit the loop. But I would prefer to get every pair of matches separately, and I don't see a way.
But why is the loop executed multiple times anyway?
Thank you in advance, and Regards,
Mazze
You need to use a while loop, not a for loop:
while ($STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {
print Dumper \%+;
}
Output:
$VAR1 = {
'FIRST_WORD' => 'That',
'FINAL_WORD' => 'cat'
};
$VAR1 = {
'FIRST_WORD' => 'She',
'FINAL_WORD' => 'old'
};
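The for loop evaluates the match in list context, so it collects all the matches at once into @STOPS, and %+ holds only the named captures of the last successful match. The loop then runs once per element of @STOPS (four times here), which is why the same dump appears four times. The while loop evaluates the match in scalar context, so each iteration advances to the next global match and %+ reflects that match.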
According to perldoc perlretut:
The modifier /g stands for global matching and allows the matching
operator to match within a string as many times as possible. In scalar
context, successive invocations against a string will have /g jump
from match to match, keeping track of position in the string as it
goes along. You can get or set the position with the pos() function.
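The while loop above only prints each match. To keep every pair separately, as the question wanted, copy %+ into a new hash on each iteration, because %+ is overwritten by the next match. A minimal sketch; the @pairs array is a name chosen for illustration:

```perl
use strict;
use warnings;

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";
my @pairs;

# Scalar-context //g: each iteration resumes where the previous match ended.
while ($STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g) {
    push @pairs, { %+ };    # copy the named captures before the next match
}

printf "%s -> %s\n", $_->{FINAL_WORD}, $_->{FIRST_WORD} for @pairs;
```

The final "bored." produces no pair because no word follows its period.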

Test if a string contains the key of a hash in Perl

I have a string and want to tell whether it contains a key of a hash, and if it does, I would like to print the corresponding value, like so:
#!/usr/bin/perl -w
my %h = ( 'key1' => 'v1', 'key2' => 'v2', 'key3' => 'v3' );
my $str = "this is a string containing key1\n";
if ($str contains a key of %h){
print the value of that key; #i.e v1
}
What's the best way to do this? (Preferably concise enough to fit in an if statement.)
#!/usr/bin/perl
use strict;
use warnings;
my %h = ( 'key1' => 'v1', 'key2' => 'v2', 'key3' => 'v3' );
my $str = "this is a string containing key1\n";
# index() returns -1 when $key does not occur in $str
while (my ($key, $value) = each %h) {
if (index($str, $key) != -1) {
print "$value\n";
}
}
If you have to search through multiple strings but have just the one unchanging hash, it might be faster to compile the hash keys into a regexp up front and then apply that regexp to each string.
my %h = ( 'key1' => 'v1', 'key2' => 'v2', 'key3' => 'v3' );
my $hash_keys = qr/${\ join('|', map quotemeta, keys %h) }/;
my @strings = (
"this is a string containing key1\n",
"another string containing key1\n",
"this is a string containing key2\n",
"but this does not\n",
);
foreach my $str (@strings) {
print "$str\n" if $str =~ $hash_keys;
}
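The regexp above only tells you whether some key occurs. To print the value, as the question asks, a sketch can capture the matched key and look it up. The extra key key10 is hypothetical, added to show why sorting the alternatives longest-first matters: Perl's alternation takes the first alternative that matches at a position, not the longest one.

```perl
use strict;
use warnings;

# Hypothetical hash: 'key10' is added only to show the prefix problem.
my %h = ( 'key1' => 'v1', 'key2' => 'v2', 'key3' => 'v3', 'key10' => 'v10' );

# Longest keys first, so "key10" is tried before its prefix "key1".
my $alt     = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %h;
my $hash_re = qr/($alt)/;

my @strings = (
    "this is a string containing key1\n",
    "this one mentions key10\n",
    "but this does not\n",
);

my @found;
for my $str (@strings) {
    push @found, $h{$1} if $str =~ $hash_re;    # the capture is the key that matched
}
print "$_\n" for @found;
```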
In some cases (big hash, keys are words and you don't want them to match sub-words) this could be the right approach:
my %h = ( 'key1' => 'v1', 'key2' => 'v2', 'key3' => 'v3' );
my $str = "this is a string containing key1 and key3 but notkey2, at least not alone\n";
while ($str =~ /(\w+)/g) {
my $v = $h{$1};
print "$v\n" if defined $v;
}

Perl Regex Extract first two section of windows path

I want to write a method to extract the first two sections of a Windows path in Perl.
For example,
'D:\git_root_tfs\WorkStation\Projects\InterACT\Tools\server-rule-checker'
should be extracted as:
'D:\git_root_tfs\WorkStation'
sub Split_Location_as_VMPath {
my $location = shift;
# ^([d-z]:\\.+?\\.+?)\\
# ^(?:\\.*\\.*)\\
if($location ~~ m/^(?:\\.*\\.*)\\/){ # the path drive start from D to E;
# print "VMPath=$1\n";
# push @$vmPathList, $1;
return Convert_to_Lowercase($1);
}
return "Invalid Path $location";
}
How should I write the regex?
Test cases:
{
my $item = Split_Location_as_VMPath('D:\VM\ia7-BGCDev8.1\test.vhd');
my $expected = Convert_to_Lowercase('D:\VM\ia7-BGCDev8.1');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
$item = Split_Location_as_VMPath('E:\Hyper-V-2\ia-int-7.1Beta\test.vhd');
$expected = Convert_to_Lowercase('E:\Hyper-V-2\ia-int-7.1Beta');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
$item = Split_Location_as_VMPath('D:\VM\ia7-int-7.1\test.vhd');
$expected = Convert_to_Lowercase('D:\VM\ia7-int-7.1');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
$item = Split_Location_as_VMPath('D:\VM\ia7-int-8.1B153\test.vhd');
$expected = Convert_to_Lowercase('D:\VM\ia7-int-8.1B153');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
$item = Split_Location_as_VMPath('D:\Hyper-v\IA5-SDE-WIN2K3(Feng Tong)\test.vhd');
$expected = Convert_to_Lowercase('D:\Hyper-v\IA5-SDE-WIN2K3(Feng Tong)');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
$item = Split_Location_as_VMPath('D:\git_root_tfs\WorkStation\Projects\InterACT\Tools\server-rule-checker');
$expected = Convert_to_Lowercase('D:\git_root_tfs\WorkStation');
ok( $item eq $expected, "Test Split_Location_as_VMPath=$item");
}
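Don't use a regex to parse file paths.
Instead, use a module like File::Spec or Path::Tiny. File::Spec applies the path rules of the platform it runs on, so the output below assumes Windows; on other systems, File::Spec::Win32 offers the same methods with Windows rules.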
use strict;
use warnings;
use File::Spec;
while (<DATA>) {
my ($vol, $dir, $file) = File::Spec->splitpath($_);
my @dirs = File::Spec->splitdir($dir);
@dirs = @dirs[0..2] if @dirs > 3;
$dir = File::Spec->catdir(#dirs);
my $path = File::Spec->catpath($vol, $dir);
print "$path\n";
}
__DATA__
D:\VM\ia7-BGCDev8.1\test.vhd
E:\Hyper-V-2\ia-int-7.1Beta\test.vhd
D:\VM\ia7-int-7.1\test.vhd
D:\VM\ia7-int-8.1B153\test.vhd
D:\Hyper-v\IA5-SDE-WIN2K3(Feng Tong)\test.vhd
D:\git_root_tfs\WorkStation\Projects\InterACT\Tools\server-rule-checker
Outputs:
D:\VM\ia7-BGCDev8.1
E:\Hyper-V-2\ia-int-7.1Beta
D:\VM\ia7-int-7.1
D:\VM\ia7-int-8.1B153
D:\Hyper-v\IA5-SDE-WIN2K3(Feng Tong)
D:\git_root_tfs\WorkStation
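The correct regex is ^([D-Z]:\\.+?\\.+?)\\ as written inside m//, where each \\ matches one literal backslash. It captures a drive letter from D to Z, a colon, and the two shortest backslash-separated segments, and it requires another backslash after them. Use =~ for the match rather than the experimental smartmatch operator ~~:
sub Split_Location_as_VMPath {
my $location = shift;
# the drive letter must be D to Z
if ($location =~ m/^([D-Z]:\\.+?\\.+?)\\/) {
return Convert_to_Lowercase($1);
}
return "Invalid Path $location";
}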
Using a regex in this context is an interesting exercise for students. Outside school, you should use the standard modules dedicated to this task:
use File::Spec;
sub Split_Location_as_VMPath {
my $location = shift;
my ($volume, $directories, $file) = File::Spec->splitpath($location);
my @dirs = File::Spec->splitdir($directories);
return "Invalid Path $location" unless @dirs > 2;
return lc File::Spec->catpath($volume, File::Spec->catdir(@dirs[0..2]));
}
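To run the module-based version on a non-Windows machine (for example in a test suite on Linux), a sketch can call File::Spec::Win32 directly, since File::Spec otherwise follows the rules of the current OS. The function name first_two_dirs is hypothetical; it drops the empty leading and trailing entries that splitdir returns, keeps the first two directories, and uses lc in place of the question's Convert_to_Lowercase:

```perl
use strict;
use warnings;
use File::Spec::Win32;    # Windows path rules regardless of the host OS

# Hypothetical variant of Split_Location_as_VMPath: volume plus the first
# two directories, lowercased, or an "Invalid Path" message.
sub first_two_dirs {
    my ($location) = @_;
    my ($volume, $directories) = File::Spec::Win32->splitpath($location);
    # splitdir on '\VM\x\' yields ('', 'VM', 'x', ''); drop the empty ends.
    my @dirs = grep { length } File::Spec::Win32->splitdir($directories);
    return "Invalid Path $location" unless @dirs >= 2;
    my $dir = File::Spec::Win32->catdir('', @dirs[0, 1]);    # leading '' makes it rooted
    return lc(File::Spec::Win32->catpath($volume, $dir, ''));
}

print first_two_dirs($_), "\n" for (
    'D:\VM\ia7-BGCDev8.1\test.vhd',
    'D:\git_root_tfs\WorkStation\Projects\InterACT\Tools\server-rule-checker',
);
```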

Perl: Splitting a hash into several based on its keys?

Let's say that I have a hashref whose Data::Dumper output looks like this:
$VAR1 = {
foo_0 => 'foo_zero',
foo_1 => 'foo_one',
bar_0 => 'bar_zero',
bar_1 => 'bar_one'
}
I would like to split this hash into two based on its keys as shown below but I don't know how to do this:
$VAR1 = {
foo_0 => 'foo_zero',
foo_1 => 'foo_one'
},
$VAR2 = {
bar_0 => 'bar_zero',
bar_1 => 'bar_one'
}
The keys of the first hash match /foo_[\d]/ while those of the second hash match /bar_[\d]/.
If you could kindly tell me how to do this (or give me some search keywords), I would appreciate it.
Regards,
Christopher Smith
The other solutions posted so far work, but are quick and dirty. They need to be changed when the input patterns change, and assume only two patterns. This generalised solution does not suffer from that: it needs no change, and it takes any number of patterns.
sub classify_hashref {
my ($href, %p) = @_;
my $r;
for my $hkey (keys %{ $href }) {
for my $pkey (keys %p) {
$r->{$pkey}{$hkey} = $href->{$hkey}
if $hkey =~ $p{$pkey};
}
}
return $r;
}
my $h = {
foo_0 => 'foo_zero',
foo_1 => 'foo_one',
bar_0 => 'bar_zero',
bar_1 => 'bar_one'
};
classify_hashref($h, foo_like => qr/^foo_/, looks_like_bar => qr/^bar_/);
# {
# looks_like_bar => {
# bar_0 => 'bar_zero',
# bar_1 => 'bar_one'
# },
# foo_like => {
# foo_0 => 'foo_zero',
# foo_1 => 'foo_one'
# }
# }
I'm assuming that your hash reference is $foo_ref. You didn't state what should happen if a hash key is neither a foo nor a bar key. You could do one of three things:
Have two hash references: one of foo keys and one of all other keys.
Throw out keys that are neither foo keys nor bar keys. (This is what I did.)
Have a third hash that stores all non-foo and non-bar keys.
The program below takes the second approach:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Data::Dumper;
my $foo_re = qr/^foo_/;
my $bar_re = qr/^bar_/;
my $foo_ref = {
foo_0 => "foo_zero",
foo_1 => "foo_one",
bar_0 => "bar_zero",
bar_1 => "bar_one",
};
my $bar_ref = {};
foreach my $key (keys %{$foo_ref}) {
if (not $key =~ $foo_re) {
# Remove the if clause to store all non-foo keys in $bar_ref
$bar_ref->{$key} = $foo_ref->{$key} if $key =~ $bar_re;
delete $foo_ref->{$key}
}
}
say Dumper $foo_ref;
say Dumper $bar_ref;
I'm assuming that all your hash keys match one of the two provided patterns. If not, you should specify more exactly what you have and what you expect.
If you want to process the dumped output itself, I'm also assuming it is in a correct format suitable for eval. Just put your output inside q( and ):
# ...
my $VAR1;
eval q(
$VAR1 = {
foo_0 => 'foo_zero',
foo_1 => 'foo_one',
bar_0 => 'bar_zero',
bar_1 => 'bar_one'
}
);
my $h1 = {};
my $h2 = {};
for my $k ( keys %{$VAR1} ) {
if ( $k =~ /foo_\d/ ) {
$h1->{$k} = $VAR1->{$k};
next;
}
$h2->{$k} = $VAR1->{$k}; # the remaining /bar_\d/ case
}
# use your new $h1 and $h2 hashrefs
# ...
You will get two new hashrefs, $h1 and $h2.
If you have other cases besides these two, you should put each one inside an if, not only the first.
This is not a complete script, just a snippet.
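On Perl 5.20 or later, key/value hash slices turn the two-pattern case into a pair of one-liners. A sketch using the sample data from the question, assuming both patterns are known in advance:

```perl
use strict;
use warnings;

my $h = {
    foo_0 => 'foo_zero',
    foo_1 => 'foo_one',
    bar_0 => 'bar_zero',
    bar_1 => 'bar_one',
};

# Key/value hash slices on a hashref require Perl 5.20 or later.
my %foo = %{$h}{ grep { /^foo_\d/ } keys %$h };
my %bar = %{$h}{ grep { /^bar_\d/ } keys %$h };

print join(', ', map { "$_ => $foo{$_}" } sort keys %foo), "\n";
print join(', ', map { "$_ => $bar{$_}" } sort keys %bar), "\n";
```

Keys matching neither pattern are simply left out, which corresponds to the second option listed earlier.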