Perl Regular Expression Pattern - regex

I have some data like this:
TYPE: Travel
ADDRESS
  Barcelona
  Paris
So ADDRESS can be followed by one or many cities (I need to discard ADDRESS and keep only the cities). For some reason my parsing fails to produce the correct result: only "ADDRESS" is printed. Am I missing something?
elsif (/^ADDRESS/) {
    my @address_t = split /[no matter what i put,only ADDRESS is printed]+/, $_;
    shift @address_t; # is this how I discard ADDRESS?
    foreach my $address (@address_t) {
        @address_names = ($address);
    }
I think the regex is supposed to split on a newline or a space?
This is how I processed TYPE:
elsif (/^TYPE/) {
    my @type_t = split '\s', $_;
    $type = $type_t[1];
    print "$type"; # to test, but I have a hash which I load them into and print at the end of the file.

use warnings;
use strict;
while (<DATA>) {
    if (/^ADDRESS/) {          # if line contains ADDRESS then read addresses
        while (<DATA>) {       # ... in a loop
            last if !/^ +/;    # until we find a non-indented line
            print $_;          # here you can push $_ to a list
        }
    }
    if ($_ && /^TYPE/) {       # a TYPE after address can be processed now
        # stuff
    }
}
__DATA__
TYPE: Travel
ADDRESS
  Barcelona
  Paris
TYPE: Travel
ADDRESS
  Barcelona
  Paris
Produces:
Barcelona
Paris
Barcelona
Paris
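The comment "here you can push $_ to a list" can be fleshed out into a data structure. Below is a minimal sketch (not taken from either answer) that collects each record's TYPE and cities into a list of hashes. The sub name parse_records and the hash layout are illustrative assumptions. It also assumes every record starts with a "TYPE:" line and treats every non-blank line after ADDRESS as a city, so it works whether or not the cities are indented.

```perl
use strict;
use warnings;

# Collect TYPE/ADDRESS records into a list of hashes. Assumptions: every
# record starts with a "TYPE:" line, every non-blank line after "ADDRESS"
# is a city (indented or not), and a blank line or the next TYPE line
# ends the address block.
sub parse_records {
    my ($fh) = @_;
    my @records;
    my $in_address = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^TYPE:\s*(.*)/) {
            push @records, { type => $1, addresses => [] };
            $in_address = 0;
        }
        elsif ($line =~ /^ADDRESS/) {
            $in_address = 1;    # the following lines are cities
        }
        elsif ($in_address && @records && $line =~ /^\s*(\S.*?)\s*$/) {
            push @{ $records[-1]{addresses} }, $1;   # city without surrounding spaces
        }
        else {
            $in_address = 0;    # a blank or unexpected line ends the block
        }
    }
    return @records;
}

my $input = "TYPE: Travel\nADDRESS\n  Barcelona\n  Paris\n";
open my $in_fh, '<', \$input or die "Cannot open in-memory string: $!";
for my $rec (parse_records($in_fh)) {
    print "$rec->{type}: @{ $rec->{addresses} }\n";    # Travel: Barcelona Paris
}
```

Reading from an in-memory filehandle only keeps the example self-contained; in practice you would pass it the filehandle you are already reading from.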

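Try something like this:
It prints every line that follows a line matching /^ADDRESS/. Because $line_count stays above zero after the first match, it keeps printing until the end of the input. Let me know if there's a point at which you want to stop, and I can adjust it.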
use warnings;
use strict;
my $current_line = "";
my $line_count = 0;
while (<>) {    # reads the files named on the command line, or STDIN
    chomp;
    my $previous_line = $current_line;
    $current_line = $_;
    if ($previous_line =~ /^ADDRESS/ or $line_count > 0) {
        $line_count++;
        print "$current_line\n";
    }
}


Perl Regex - Getting Text Before and After Match

I am parsing a tab delimited file line by line:
Root rootrank 1 Bacteria domain .72 Firmicutes phylum 1 Clostridia class 1 etc.
while (my $line = <$fh>) {
    chomp($line);
}
On every line, I want to capture the first entry before and after a particular match. For example, for the match phylum, I want to capture the entries Firmicutes and 1; for the match domain, I want to capture the entries Bacteria and .72. How would I write the regex to do this?
Sidenote: I can't simply split the line by tab into an array and use the index because sometimes a category is missing or there are extra categories, and that causes the entries to be shifted by one or two indices. And I want to avoid writing blocks of if statements.
You can still split the input, map the words to their indices, and then use the indices of the matches to extract the neighbouring cells:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @matches = qw( phylum domain );
while (<>) {
    chomp;
    my @cells = split /\t/;
    my %indices;
    @indices{ @cells } = 0 .. $#cells;
    for my $match (@matches) {
        if (defined( my $index = $indices{$match} )) {
            say join "\t", @cells[ $index - 1 .. $index + 1 ];
        }
    }
}
What's missing:
You should handle the case when $index == 0 or $index == $#cells.
You should handle the case where some words are repeated in one line.
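One way to cover both gaps is to scan the cells directly instead of building a word-to-index hash, since @indices{ @cells } = 0 .. $#cells keeps only the last index of a repeated word. This is a sketch: neighbours is a hypothetical helper, and it returns undef for a neighbour that would fall outside the row.

```perl
use strict;
use warnings;

# Return one [before, after] pair for every cell equal to $word, so
# repeated words are all reported. A neighbour that would fall outside
# the row (index 0 or the last index) is returned as undef.
sub neighbours {
    my ($cells, $word) = @_;
    my @pairs;
    for my $i (0 .. $#$cells) {
        next unless $cells->[$i] eq $word;
        my $before = $i > 0        ? $cells->[$i - 1] : undef;
        my $after  = $i < $#$cells ? $cells->[$i + 1] : undef;
        push @pairs, [ $before, $after ];
    }
    return @pairs;
}

# Cells as they would come from split /\t/ on the question's sample line
my @cells = qw(Root rootrank 1 Bacteria domain .72 Firmicutes phylum 1);
for my $match (qw(phylum domain)) {
    for my $pair (neighbours(\@cells, $match)) {
        my ($before, $after) = map { defined $_ ? $_ : '(none)' } @$pair;
        print "$match\t$before\t$after\n";
    }
}
```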
my $file = "file2.txt";
open my $fh, '<', $file or die "Unable to Open the file $file for reading: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    while ($line =~ /(\w+)\s+(\w+)\s+(\.?\d+)/g) {
        my ($before, $match, $after) = ($1, $2, $3);
        print "Before: $before Match: $match After: $after\n";
    }
}
You can simply use the following regex to capture the words before and after a matched word:
(?<LHS>[\w.]+)\s(?<MATCH>.*?)\s(?<RHS>[\w.]+)
You could do:
#!/usr/bin/perl
use Modern::Perl;
my @words = qw(phylum domain);
while (<DATA>) {
    chomp;
    for my $word (@words) {
        my ($before, $after) = $_ =~ /(\S+)(?:\t\Q$word\E\t)(\S+)/i;
        say "word: $word\tbefore: $before\tafter: $after";
    }
}
__DATA__
Root rootrank 1 Bacteria domain .72 Firmicutes phylum 1 Clostridia class 1 etc.
Output:
word: phylum before: Firmicutes after: 1
word: domain before: Bacteria after: .72

Perl create variables from substring of another variable

I am sure this can be done with split(), but I am more interested in doing it with s// if possible. I want to compare a supplied IP address with an array of IP addresses and find a match if existing. I also want to consider a partial match successful only if the entire element (not a substring of the array element) is a match.
For example: Supplied IP: 10.12.13.14
If the current array element is 10.12. or 10. or 10.12.13., we can consider that a match, but not 10.12.11.
This is to find whether a given IP exists in the hosts.allow TCP wrappers file on a Linux host. I will add functionality to append the address if it is not covered in the file. Since partial subnet matches like 10.120. or 192.168. work, I need to test for those as well. That is the code I am missing below, where the placeholder "OR SUBSTRING MATCHES" appears. Given my $IP = "1.2.3.4";, how do I make substring variables so I can perform a string comparison on "1.2.3." and "1.2."?
#PSEUDO CODE EXAMPLE
my @IPS = (10.12.13.14, 191.168.1.2, 10.8., 172.16. );
my $IP = "10.8.3.44";
foreach (@IPS) { if( $IP eq $_ || split(/\d+\./, 1-3, $IP) eq $_ ) { print $IP matches current IP: $_\n}
# That split is supposed to represent "10." "10.8." and "10.8.3." That is the logic I am trying to accomplish, but I would like to use s// if it fits the job, otherwise I am open to split() or other suggestions
#REAL CODE EXAMPLE
#!/usr/bin/perl
my $IP = $ARGV[0];
my $FILE = '/etc/hosts.allow';
# Make sure it is an IP with either 157. or 140. as first octet
unless ( $IP =~ qr/(140|157)\.(\d{1,3}\.){2}\d{1,3}/ ) {
    die "Usage: $0 IP Address";
} else {
    open (FH, "<", "$FILE");
    foreach $LINE (<FH>) {
        if ( $LINE =~ qr/^sshd: (.*)/i ) {
            @LIST = split(", ", $1);
            foreach (@LIST) {
                chomp $_;
                if($IP eq $_) || (OR SUBSTRING MATCHES ) <-need code here {
                    print "IP ADDRESS: $IP found! \n";
                } else { print "$_ is not a match\n"};
            }
        }
    }
}
Why reinvent the wheel?
use strict;
use warnings;
use feature qw/say/;
use Net::Subnet;
my $allowed_hosts = subnet_matcher qw(
    10.8.0.0/16
    10.12.13.14/32
    191.168.1.2/32
    172.16.0.0/16
);
for my $ip (qw/10.8.3.44/) {
    if ($allowed_hosts->($ip)) {
        say "$ip is allowed!";
    }
    else {
        say "$ip is disallowed!";
    }
}
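You can build a regular expression to match against your accepted list. As M42 already demonstrated, you need to use quotemeta so that your periods aren't treated as the match-any-character metacharacter. You also need to be careful about your boundary conditions: an entry that ends in a digit is a complete address and must be anchored at the end, while an entry that ends in a period is a prefix.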
my @ips = qw(10.12.13.14 191.168.1.2 10.8. 172.16.);
# Entries ending in a digit are full addresses, so anchor them with $
my $ips_list = join '|', map {/\d$/ ? "$_\$" : $_} map quotemeta, @ips;
my $ips_re = qr{^(?:$ips_list)};
while (<DATA>) {
    chomp;
    if ($_ =~ $ips_re) {
        print "(pass) $_\n";
    } else {
        print "(fail) $_\n";
    }
}
__DATA__
10.8.3.44
999.10.8.999
10.12.13.14999
10.12.13.14
172.16.99.99
191.168.1.2
191.168.1.29
Outputs:
(pass) 10.8.3.44
(fail) 999.10.8.999
(fail) 10.12.13.14999
(pass) 10.12.13.14
(pass) 172.16.99.99
(pass) 191.168.1.2
(fail) 191.168.1.29
How about:
if ( ($IP eq $_) || ($IP =~ /^\Q$_/) ) {
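For the prefix comparison the question actually asks about (comparing against "10.", "10.8." and "10.8.3."), a regex match with nested captures can produce the prefixes directly. This uses m// rather than s///, since nothing needs substituting. A minimal sketch, with ip_allowed as a hypothetical helper name and assuming dotted-quad input. Unlike a bare /^\Q$_/ match, comparing whole prefixes with eq does not accept 10.12.13.149 for a 10.12.13.14 entry.

```perl
use strict;
use warnings;

# True if $ip equals an entry, or an entry equals one of the dotted
# prefixes of $ip ("10.", "10.8.", "10.8.3." for "10.8.3.44").
# Comparing whole strings with eq means "10.12." will not match
# 10.120.1.1, and "10.12.13.14" will not match 10.12.13.149.
sub ip_allowed {
    my ($ip, $entries) = @_;
    # Nested captures: $1 = "10.8.3.", $2 = "10.8.", $3 = "10."
    my @candidates = ($ip, $ip =~ /^(((\d+\.)\d+\.)\d+\.)\d+$/);
    for my $entry (@$entries) {
        return 1 if grep { $_ eq $entry } @candidates;
    }
    return 0;
}

my @ips = qw(10.12.13.14 191.168.1.2 10.8. 172.16.);
for my $ip (qw(10.8.3.44 10.120.1.1)) {
    printf "%s %s\n", $ip, ip_allowed($ip, \@ips) ? 'matches' : 'does not match';
}
```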

Perl Text Extraction

I just need to extract the numbers in each of these items and store them separately. What's the best way to do this?
Suppose the data is something like:
p °c 4'i
App data usage stats
E
iii
! 12:12PM
Received data
Sent data
Current usage
598KB
28KB
626KB :18%
Get Current Stat Browser App
J
Battery Level
I tried this, but I get only 18 as an output in this case.
foreach my $line (@lines) {
    if ($line =~ / :[ ]*(\d+)[ ]*(KB|%)/) {
        $value = $1;
        print "the value is $value\n";
        push (@array, $1);
    }
}
Loop over every line and capture the first number on each with a regular expression:
foreach my $line (@lines) {
    if ($line =~ /(\d+)/) {
        push (@array, $1);
    }
}
And you'll have the first number from each line in your @array array.
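Note that /(\d+)/ in boolean context captures only the first number on each line, so "626KB :18%" yields just 626. If you want every number, use the /g modifier in list context. A small sketch, with all_numbers as a hypothetical helper:

```perl
use strict;
use warnings;

# A match with /g in list context returns every capture, not just the
# first, so "626KB :18%" contributes both 626 and 18.
sub all_numbers {
    my @numbers;
    for my $line (@_) {
        push @numbers, $line =~ /(\d+)/g;
    }
    return @numbers;
}

my @lines = ('Received data', '598KB', '28KB', '626KB :18%');
print join(', ', all_numbers(@lines)), "\n";    # 598, 28, 626, 18
```

Be aware that this also picks up digits from lines such as "! 12:12PM" (giving 12 and 12), so you may prefer a stricter pattern such as /(\d+)\s*(?:KB|%)/g.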
Here's one way to do it. Note that it does not care about which kind of numbers it extracts, as per your request.
It splits each line on the colon into at most two fields, key and value. Then it extracts a number from the value and stores it in the hash. This effectively skips all lines whose values do not contain numbers. This is also where you would insert stricter checks; e.g. if ($value =~ /(\d+)\s*KB/i) would only capture numbers followed by KB (I opted to add case insensitivity).
use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<DATA>) {
    my ($key, $value) = split /\s*:\s*/, $_, 2;
    next unless defined $value;    # skip lines that contain no colon
    if ($value =~ /(\d+)/) {
        $hash{$key} = $1;
    }
}
print Dumper \%hash;
__DATA__
Received data : 598 KB
Sent data : 28 KB
Current usage : 626 KB
Battery Level : 35 %
Output:
$VAR1 = {
'Sent data' => '28',
'Current usage' => '626',
'Battery Level' => '35',
'Received data' => '598'
};

Not able to extract regex matches, return "1" instead of string

I am seeing a strange problem. Can someone please help?
I have a log template that looks like this:
CPU load: 0
Memory load: 7
User load: 0
Interface Information:
eth0: Up
eth1: Up
Processes Information:
Now I log in to my device and get the logs like this:
my @output = $ssh->exec("show details");
The output looks similar, as shown below, but with different values for the parameters:
CPU load: 21
Memory load: 27
User load: 21
Interface Information:
eth0: Down
eth1: Up
Processes Information:
First I open the template file and split it line by line. When I compare it with the "show details" output, I get the value 1 for the matches instead of the matched string. Can someone please help?
Code:
my @output = $ssh->exec("show details");
open (FH, "templates/SHOW.txt") || die "Could not open File: $!\n";
@file_array = <FH>;
@TemplateArray = split(/\n/,@file_array);
@matches = split(/\n/,@output);
foreach $keys (@matches) {
    foreach (@TemplateArray) {
        $keys =~ m/($_)/;
        unshift (@result_array, $1);
    }
}
print "\n @result_array\n";
}
I get "1" as result but no string.
When you use split on an array, the array will be in scalar context, and will only return the number of elements in it. In other words:
@TemplateArray = split(/\n/,@file_array);
@matches = split(/\n/,@output);
is equal to:
@TemplateArray = scalar(@file_array);
@matches = scalar(@output);
Which is why you get "1" as a result.
Also, if you are not already doing it:
use strict;
use warnings;
Adding to TLP's answer, the solution is to change
@matches = split(/\n/,@output);
to
@matches = map { split(/\n/, $_) } @output;
so split() operates on strings from @output.
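A small self-contained sketch of the difference between the two forms. It assumes $ssh->exec returns one multi-line string; the question doesn't say, so @output below is made-up stand-in data.

```perl
use strict;
use warnings;

# Stand-in for the $ssh->exec output, assuming it returns a single
# multi-line string (the question does not say what it returns).
my @output = ("CPU load: 21\nMemory load: 27\nUser load: 21\n");

# split's second argument is evaluated in scalar context, so this
# splits the element count ("1"), not the text.
my @wrong = split(/\n/, @output);

# Splitting each element and flattening the results gives real lines.
my @lines = map { split(/\n/, $_) } @output;

print "wrong: @wrong\n";                  # wrong: 1
print "right: ", join('|', @lines), "\n"; # right: CPU load: 21|Memory load: 27|User load: 21
```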
split expects a string as its second argument, so
@TemplateArray = split(/\n/, @file_array);
ends up being the same as
@TemplateArray = split(/\n/, scalar(@file_array));
Keep in mind that scalar(@file_array) returns the number of elements in the array.
@file_array = <FH>;
will populate @file_array as follows:
@file_array = (
    "line1\n",
    "line2\n",
    "line3\n",
);
In other words, it's already split into lines. If you're trying to remove the trailing newlines, you want to replace
@TemplateArray = split(/\n/,@file_array);
with
chomp( my @TemplateArray = @file_array );
I can't help you fix
@matches = split(/\n/,@output);
because I don't know what $ssh contains, and thus I don't know what @output contains.
Please use
use strict;
use warnings;

Parsing input to get specific values

I have input like this:
"[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,D=14}|phone";
It appears as one continuous line; there are no line breaks. I need the largest value out of the values between [ and the first occurrence of |. In this case, for example, the largest value is 204. Once that is obtained, I want to print the contents of that element between []. In this case, it would be "204|0|{A=9,B=201,C=61,D=11}|Calculator".
I've tried something like this, but it is not going anywhere:
my @array1;
my $data = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,D=14}|phone";
my $high = 0;
my @values = split(/\[([^\]]+)\]/,$data) ;
print "Values is @values \n";
foreach (@values) {
    # I want the value that precedes the first occurrence of | in each array
    # element, i.e. 0,0,196,204, etc.
    my ($conf,$rest)= split(/\|/,$_);
    print "Conf is $conf \n";
    print "Rest is $rest \n";
    push(@array1, $conf);
    push (@array2, $rest);
    print "Array 1 is @array1 \n";
    print "Array 2 is @array2 \n";
}
$conf = highest(@array1);
my $i=0;
# I want the index value of the element that contains the highest conf value,
# in this case 204.
for (@myarray1) { last if $conf eq $_; $i++; };
print "$conf=$i\n";
# I want to print the rest of the string that was split in the same index
# position.
$rest = @array2[$i];
print "Rest is $rest \n";
# To get the highest conf value
sub highest {
    my @data = @_;
    my $high = 0;
    for(@data) {
        $high = $_ if $_ > $high;
    }
    $high;
}
Maybe I should be using a different approach. Could someone help me, please?
One way of doing it:
#!/usr/bin/perl
use strict;
my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]";
my @parts = split(/\]/, $s);
my $max = 0;
my $data = "";
foreach my $part (@parts) {
    if ($part =~ /\[(\d+)/) {
        if ($1 > $max) {
            $max = $1;
            $data = substr($part, 1);
        }
    }
}
print $data."\n";
A couple of notes:
you can split your original string on \], so you get parts like [0|0|{A=145,B=2,C=12,D=18}|!
then you parse each part to get the integer after the initial [
the rest is easy: keep track of the biggest integer and of the corresponding part, and output it at the end.
In shell script:
#!/bin/bash
MAXVAL=$(tr '[' '\n' < /tmp/data | cut -d'|' -f1 | sort -n | tail -1)
tr '[]' '\n\n' < /tmp/data | grep "^$MAXVAL|"
The first line cuts your big mass of data into lines, extracts just the first field, sorts it and takes the max. The second line cuts the data into lines again and greps for that max val.
If you have a LOT of data, this could be slow, so you could put the "lined" data into a temp file or something.
split() is the Right Tool when you know what you want to throw away. Capturing or m//g is the Right Tool when you know what you want to keep. (paraphrased from a Randal Schwartz quote).
You want to specify what to keep (between square brackets) rather than what to throw away (nothing!).
Luckily, your data is "hash shaped" (i.e. alternating keys and values), so you can load it into a hash, sort the keys, and output the value for the highest key:
my %data = $data =~ /\[
    (\d+)       # digits are the keys
    ([^]]+)     # rest are the values
\]/gx;
my ($highest) = sort { $b <=> $a } keys %data;   # inefficient if $data is big
print $highest, $data{$highest}, "\n";
Another way of doing this:
#!/usr/bin/perl
use strict;
my $str = '[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,D=14}|phone';
my $maxval = 0;
my $pattern;
while ( $str =~ /(\[(\d+)\|.+?\])/g ) {
    if ( $maxval < $2 ) {
        $maxval = $2;
        $pattern = $1;
    }
}
print "Maximum value = $maxval and the associated pattern = $pattern \n";
# In this example $maxval = 204
# and $pattern = [204|0|{A=9,B=201,C=61,D=11}|Calculator]