Perl parsing through a file based on conditions - regex

I have a very large log file which is updated periodically. It is as follows:
commands: (List of files to be copied)
Exit time: Fri May 10 05:33:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 20 05:34:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 30 05:50:00 2013
Exit Status: 1
I have the following code, which creates a hash based on the exit status:
while ($line = <FH>) {
    if ($line =~ /Exit [Tt]ime/) {    # the log mixes "Exit time" and "Exit Time"
        ($exittime, $exittimeval) = split(': ', $line);
        $stat{$qbsid} = {
            time => $exittimeval,
        };
    }
}
I now need to create a timestamp based on localtime, so that the script does not compare log entries dated after that timestamp (localtime). I have the following code to compare the times:
$date1 = "$hr1:$min1:$sec1, $moy1/$dt1/$yr1";
$date2 = "$hr2:$min2:$sec2, $moy2/$dt2/$yr2";
sub to_comparable {
    my ($date) = @_;
    my ($H, $M, $S, $d, $m, $Y) = $date =~ m{^(\d+):(\d+):(\d+), (\d+)/(\d+)/(\d+)\z}
        or die;
    return "$Y$m$d$H$M$S";
}
if (to_comparable($date2) > to_comparable($date1)) {
    print "right\n";
} else {
    print "wrong \n";
}
Here $hr1, $min1, $sec1, $moy1, $dt1 and $yr1 are localtime values, while $hr2, $min2, $sec2, $moy2, $dt2 and $yr2 are values obtained from the hash.
Preferably, on the first run the script should compare the whole file and create a timestamp; after that, the comparison above takes over.
Please correct me if anything is wrong. Thank you.

You might want to consider using Time::Piece, which was first released with perl v5.9.5.
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
{
    my $end_date = '2013-05-30';
    local $/ = '';    # paragraph mode: read one blank-line-separated record at a time
    while (<DATA>) {
        if (/^Exit Time: (.+)/m) {
            my $date = Time::Piece->strptime($1, "%c");
            print $date->ymd, "\n" if $date->ymd lt $end_date;
        }
    }
}
__DATA__
commands: (List of files to be copied)
Exit Time: Fri May 10 05:33:00 2013
Exit status: 2

commands: (List of files to be copied)
Exit Time: Fri May 20 05:34:00 2013
Exit status: 2

commands: (List of files to be copied)
Exit Time: Fri May 30 05:50:00 2013
Exit Status: 1
Output:
2013-05-10
2013-05-20
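If your real log has no blank lines between records (as the sample in the question suggests), a variant that slurps the whole file and walks every match may be safer. This is only a sketch reusing the Time::Piece setup above; the file name logfile.txt is assumed, and $end_date is the same cutoff:
open my $fh, '<', 'logfile.txt' or die "Can't open logfile.txt: $!";
my $log = do { local $/; <$fh> };    # slurp the whole file into one string
while ($log =~ /^Exit [Tt]ime: (.+)$/mg) {    # iterate over every match
    my $date = Time::Piece->strptime($1, "%c");
    print $date->ymd, "\n" if $date->ymd lt $end_date;
}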

You'll be forming a 14-digit number (assuming years are 4-digit and the rest are always exactly 2-digit). That's a big number, but it seems fine on my 64-bit UNIX OS; I don't know about yours. Anyway, with a fixed-length string, you could do a string comparison ("gt" instead of ">") if a number that big is an issue.
If any of the inputs (e.g. $moy1) could be a single digit, then your comparison function will not work, since October 1st ("2013101...") would sort before September 30th ("2013930..."). You could require a fixed number of digits using:
my ($H,$M,$S,$d,$m,$Y) = $date =~ m{^(\d\d):(\d\d):(\d\d), (\d\d)/(\d\d)/(\d\d\d\d)\z}
    or die;
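Alternatively, you could zero-pad the pieces when you build the strings in the first place, so single digits can never occur. A minimal sketch, using the same variables as above:
# Zero-pad each component so the comparable string is always fixed width.
my $date1 = sprintf '%02d:%02d:%02d, %02d/%02d/%04d',
            $hr1, $min1, $sec1, $moy1, $dt1, $yr1;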
I'm not sure how $qbsid is set (from Exit Status or something else), but since your code isn't complete I assume you have something else to do that.
I'm also not sure how your original time strings (e.g. "Fri May 30 05:50:00 2013") get transformed into the "$hr1:$min1:$sec1, $moy1/$dt1/$yr1" format, but I assume you do that elsewhere as well.
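If you still need that conversion, here is a minimal sketch using Time::Piece (mentioned above), assuming the raw timestamps look exactly like the log sample; it skips the intermediate "$hr:$min:$sec, $moy/$dt/$yr" format and produces the fixed-width comparable key directly:
use Time::Piece;
my $raw = 'Fri May 30 05:50:00 2013';
my $t   = Time::Piece->strptime($raw, '%a %b %d %H:%M:%S %Y');
my $key = $t->strftime('%Y%m%d%H%M%S');    # e.g. "20130530055000"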


Trouble getting the right output from a file

Okay, so here is my code: Pastebin
What I want to do is read from the file /etc/passwd and extract all the users with a UID over 1000 but less than 65000. For those users I also want to print out how many times they have logged in. With this current code the output is like this:
user:15
User:4
User:4
The problem with this is that they haven't logged in 15 times or 4 times, because the program is counting every line that is output from the "last" command. So if I run the command "last -l user" it will look something like this:
user pts/0 :0 Mon Feb 15 19:49 - 19:49 (00:00)
user :0 :0 Mon Feb 15 19:49 - 19:49 (00:00)
wtmp begins Tue Jan 26 13:52:13 2016
The part that I'm interested in is the "user :0" line, not the others. That is why the program outputs the number 4 instead of 1, as it should be. So I came up with a regular expression to only get the part that I need, and it looks like this:
\n(\w{1,9})\s+:0
However, I cannot get it to work; I only get errors all the time.
I'm hoping someone here might be able to help me.
I think this regexp will do what you want: m/^\w+\s+\:0\s+/
Here's some code that works for me, based on the code you posted... let me know if you have any questions! :)
#!/usr/bin/perl
use Modern::Perl '2009'; # strict, warnings, 'say'
# Get a (read only) filehandle for /etc/passwd
open my $passwd, '<', '/etc/passwd'
or die "Failed to open /etc/passwd for reading: $!";
# Create a hash to store the results in
my %results;
# Loop through the passwd file
while ( my $lines = <$passwd> ) {
    my @user_details = split ':', $lines;
    my $user_id = $user_details[2];
    if ( $user_id >= 1000 && $user_id < 65000 ) {
        my $username = $user_details[0];
        # Run the 'last' command, store the output in an array
        my @last_lines = `last $username`;
        # Loop through the output from 'last'
        foreach my $line ( @last_lines ) {
            if ( $line =~ m/^\w+\s+\:0\s+/ ) {
                # Looks like a direct login - increment the login count
                $results{ $username }++;
            }
        }
    }
}
# Close the filehandle
close $passwd or die "Failed to close /etc/passwd after reading: $!";
# Loop through the hash keys outputting the direct login count for each username
foreach my $username ( keys %results ) {
    say $username, "\t", $results{ $username };
}
The shortest fix for your problem is to run the "last" output through "grep".
my @lastbash = qx(last $_ | grep ' :.* :');
So the answer is to use
my #lastbash = qx(last $_ | grep ":0 *:");
in your code.
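A pure-Perl alternative (a sketch, assuming $username holds the account name as in the loop above) avoids spawning an external grep entirely:
# Filter the `last` output with Perl's own grep instead of grep(1).
my @lastbash = grep { /^\w+\s+:0\s+/ } qx(last $username);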

Perl manipulation of the cisco switch commands

I have a script which helps me to log in to a Cisco switch, run the mac-address-table command and save the output to an array @ver. The script is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use Net::Telnet::Cisco;
my $host = '192.168.168.10';
my $session = Net::Telnet::Cisco->new(Host => $host, -Prompt=>'/(?m:^[\w.&-]+\s?(?:\(config[^\)]*\))?\s?[\$#>]\s?(?:\(enable\))?\s*$)/');
$session->login(Name => 'admin',Password => 'password');
my @ver = $session->cmd('show mac-address-table dynamic');
for my $line (@ver) {
    print "$line";
    if ($line =~ m/^\*\s+\d+\s+(([0-9a-f]{4}[.]){2}[0-9a-f]{4})\s+/) {
        my $mac_addr = $1;
        print("$mac_addr \n");
    }
}
$session->close();
I get the following results:
Legend: * - primary entry
age - seconds since last seen
n/a - not available
vlan mac address type learn age ports
------+----------------+--------+-----+----------+--------------------------
* 14 782b.cb87.b085 dynamic Yes 5 Gi4/39
* 400 c0ea.e402.e711 dynamic Yes 5 Gi6/17
* 400 c0ea.e45c.0ecf dynamic Yes 0 Gi11/43
* 400 0050.5677.c0ba dynamic Yes 0 Gi1/27
* 400 c0ea.e400.9f91 dynamic Yes 0 Gi6/3
Now, with the above script I am trying to get the mac address and store it in $mac_addr, but I am not getting the desired results. Please can someone guide me? Thank you.
I'm not clear on what you mean by "not getting the desired results". I did notice that you are first printing your $line and then printing $mac_addr afterwards; besides that, your expression seems to match.
Your regular expression matches your desired data.
If you simply want the matches, you could do:
for my $line (@ver) {
    if (my ($mac_addr) = $line =~ /((?:[0-9a-f]{4}\.){2}[0-9a-f]{4})/) {
        print $mac_addr, "\n";
    }
}
Output
782b.cb87.b085
c0ea.e402.e711
c0ea.e45c.0ecf
0050.5677.c0ba
c0ea.e400.9f91
If you want to print out the mac addresses, you can do the following:
/^\*/ and print +(split)[2], "\n" for @ver;
Note that this splits the line (implicitly on whitespace) if it begins with *; the mac address is the third field, index 2, of the resulting list (in case you still need to set $mac_addr).
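For instance, a sketch that keeps a $mac_addr variable around while using the same split approach (assuming @ver holds the command output as above):
for my $line (@ver) {
    next unless $line =~ /^\*/;              # primary entries only
    my $mac_addr = (split ' ', $line)[2];    # third whitespace-separated field
    print "$mac_addr\n";
}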
Hope this helps!

Print remaining lines in file after regular expression that includes variable

I have the following data:
====> START LOG for Background Process: HRBkg Hello on 2013/09/27 23:20:20 Log Level 3 09/27 23:20:20 I Background process is using
processing model #: 3 09/27 23:20:23 I 09/27 23:20:23 I --
Started Import for External Key
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3 09/30 07:31:07 I Background process is using
processing model #: 3 09/30 07:31:09 I 09/30 07:31:09 I --
Started Import for External Key
I need to extract the remaining file contents after the LAST match of ====> START LOG.....
I have tried numerous times to use sed/awk; however, I cannot seem to get awk to use a variable in my regular expression. The variable I was trying to include was for the date (2013/09/30), since that is what makes the line unique.
I am on an HP-UX machine and cannot use grep -A.
Any advice?
There's no need to test for a specific time just to find the last entry in the file:
awk '
    # The BEGIN block appends the file name to ARGV a second time,
    # so awk reads the same file twice.
    BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
    # Pass 1 (NR == FNR): remember the line number of the last match.
    NR == FNR { if (/START LOG/) lastMatch = NR; next }
    # Pass 2: start printing once that line is reached again.
    FNR == lastMatch { found = 1 }
    found
' file
This might work for you (GNU sed):
a=2013/09/30
sed '\|START LOG.*'"$a"'|{h;d};H;$!d;x' file
This will return your desired output.
sed -n '/START LOG/h;/START LOG/!H;$!b;x;p' file
If you have tac available, you could easily do..
tac <file> | sed '/START LOG/q' | tac
Here is one in Python:
#!/usr/bin/python
import sys, re
for fn in sys.argv[1:]:
    with open(fn) as f:
        m = re.search(r'.*(^====> START LOG.*)', f.read(), re.S | re.M)
        if m:
            print m.group(1)
Then run:
$ ./re.py /tmp/log.txt
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
If you want to exclude the ====> START LOG... bit, change the regex to:
r'.*(?:^====> START LOG.*?$\n)(.*)'
For the record, you can easily match a variable against a regular expression in Awk, or vice versa.
awk -v date='2013/09/30' '$0 ~ date {p=1} p' file
This sets p to 1 if the input line matches the date, and prints if p is non-zero.
(Recall that the general form in Awk is condition { actions } where the block of actions is optional; if omitted, the default action is to print the current input line.)
This prints from the last START LOG onward: it sets a flag at the last block and prints from there.
awk 'FNR==NR { if ($0~/^====> START LOG/) f=NR;next} FNR>=f' file file
You can use a variable, but if you have another file with another date, you need to know the date in advance.
var="2013/09/30"
awk '$0~v && /^====> START LOG/ {f=1}f' v="$var" file
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
With GNU awk (gawk) or Mike's awk (mawk) you can set the record separator (RS) so that each record will contain a whole log message. So all you need to do is print the last one in the END block:
awk 'END { printf "%s", RS $0 }' RS='====> START LOG' infile
Output:
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
Answer in perl:
Assume your logs are in filelog.txt.
my @line;
open(LOG, "<", "filelog.txt") or die "could not open filelog.txt: $!";
while (<LOG>) {
    push @line, $_;    # collect every line
}
close LOG;
my $lengthline = $#line;
my @newarray;
my $j = 0;
for (my $i = $lengthline; $i >= 0; $i--) {    # walk backwards from the last line
    $newarray[$j] = $line[$i];
    if ($line[$i] =~ m/^====> START LOG.*/) {
        last;                                  # stop at the last START LOG marker
    }
    $j++;
}
print reverse(@newarray);    # restore original order before printing

What is the best way to populate a load file for a date lookup dimension table?

Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects a "1","2" or a "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
........................................................................................
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_date's from 01-28-2012 to 12-31-2020, and plus1, plus2 & plus3_months NULL. Loaded this into date_lookup table, then executed the update statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
.........................................................................................
The rules for this dimension table are:
If pk_date has 31 days in its month and plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and plus1, plus2 or plus3 has 28 or 29 days in its month, let them equal the last valid date of those months, and so on.
All other dates fall on the same day of the following month.
My question is: What is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, "sed", C program?
EDIT: I mentioned sed because I already have more than two years worth of data and
could perhaps model the rest after this data, or perhaps a tool like awk is better?
The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5 too.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.
Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of perl seems to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my @dates;
# parse arguments
while (my $datep = shift) {
    my ($m, $d, $y) = split('-', $datep);
    push(@dates, DateTime->new(year => $y, month => $m, day => $d))
        || die "Cannot parse date $!\n";
}
open(STDOUT, ">", "output.unl") || die "Unable to create output file.";
my ($date, $end) = @dates;
while ( $date < $end ) {
    my @row = ($date->mdy('-'));    # start with pk_date
    for my $mth ( qw[ 1 2 3 ] ) {
        my $fut_d = $date->clone->add(months => $mth);
        until (
            ($fut_d->month == $date->month + $mth
                && $fut_d->year == $date->year) ||
            ($fut_d->month == $date->month + $mth - 12
                && $fut_d->year > $date->year)
        ){
            $fut_d->subtract(days => 1);    # step back until criteria met
        }
        push(@row, $fut_d->mdy('-'));
    }
    print STDOUT join("|", @row, "\n");
    $date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it and execute like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.
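As a side note, DateTime's month arithmetic can do the clamping itself: the end_of_month => 'limit' mode snaps an overflowing day back to the last valid day of the target month, which may let you drop the until loop entirely. A minimal sketch:
use DateTime;
# Jan 31 + 1 month would overflow February; 'limit' clamps it.
my $d     = DateTime->new(year => 2012, month => 1, day => 31);
my $plus1 = $d->clone->add(months => 1, end_of_month => 'limit');
print $plus1->mdy('-'), "\n";    # prints 02-29-2012 (leap year)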

Perl: Deleting multiple recurring lines where a certain criterion is met

I have data that looks like below, the actual file is thousands of lines long.
Event_time Cease_time
Object_of_reference
-------------------------- --------------------------
----------------------------------------------------------------------------------
Apr 5 2010 5:54PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:58PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:01PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:04PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
As you can see, each file has a header which describes what the various fields stand for (event start time, event cease time, affected element). The header is followed by a number of dashes.
My issue is that the data contains a number of entries where the cease time is NULL, i.e. the event is still active. All such entries must go: for each element where the alarm cease time is NULL, the start time, the cease time (in this case NULL) and the actual element must be deleted from the file.
In the remaining data, all the text starting from the word SubNetwork up to BtsSiteMgr= must also go, along with the headers and the dashes.
Final output should look like below:
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
LUGALAMBO_900
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
BULAGA
Below is a Perl script that I have written. It takes care of the headers, the dashes and the NULL entries, but I have failed to delete the lines following the NULL entries so as to produce the above output.
#!/usr/bin/perl
use strict;
use warnings;
$^I=".bak" #Backup the file before messing it up.
open (DATAIN,"<george_perl.txt")|| die("can't open datafile: $!"); # Read in the data
open (DATAOUT,">gen_results.txt")|| die("can't open datafile: $!"); #Prepare for the writing
while (<DATAIN>) {
    s/Event_time//g;
    s/Cease_time//g;
    s/Object_of_reference//g;
    s/\-//g; # Preceding 4 statements are for cleaning out the headers
    my $theline = $_;
    if ($theline =~ /NULL/) {
        next;
        next if $theline =~ /SubN/;
    }
    else {
        print DATAOUT $theline;
    }
}
close DATAIN;
close DATAOUT;
Kindly help point out any modifications I need to make on the script to make it produce the necessary output.
Your data arrives in sets of 3 lines, so one approach is to organize the parsing that way:
use strict;
use warnings;
# Ignore header junk.
while (<>) {
    last unless /\S/;
}
until (eof) {
    # Read in a set of 3 lines.
    my @lines;
    push @lines, scalar <> for 1 .. 3;
    # Filter and clean.
    next if $lines[0] =~ /\sNULL\s/;
    $lines[2] =~ s/.+BtsSiteMgr=//;
    print @lines[0,2];
}
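Assuming the script is saved as filter.pl (name assumed), it reads whatever files you pass on the command line:
perl filter.pl george_perl.txt > gen_results.txt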
Looks like a good candidate for a little input record separator ($/) trickery. The idea is to manipulate it so that it deals with one record at a time, rather than the default single line.
use strict;
use warnings;
$^I = '.bak';
open my $dataIn, '<', 'george_perl.txt' or die "Can't open data file: $!";
open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";
{
    local $/ = "\n\t";    # Records have leading tabs
    while ( my $record = <$dataIn> ) {
        # Skip header & records that contain 'NULL'
        next if $record =~ /NULL|Event_time/;
        # Strip out the unwanted yik-yak
        $record =~ s/SubNetwork.*BtsSiteMgr=//s;
        # Print record to output file
        print $dataOut $record;
    }
}
close $dataIn;
close $dataOut;
Pay attention to the following:
use of the safer three-argument form of open (rather than the two-argument form you've shown)
use of scalar variables rather than barewords for defining filehandles
use of the local keyword and extra curlies to modify the definition of $/ only when needed
the trailing s in s/SubNetwork.*BtsSiteMgr=//s lets . match newlines, so the match can span multiple lines
s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;
should filter out the lines that end in NULL plus the two following lines.
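To apply that substitution to the whole file at once, the input has to be slurped into a single string; a sketch using perl's -0777 switch, with an in-place backup like the $^I in your script (file name assumed):
perl -0777 -i.bak -pe 's/^.*NULL\r?\n.*\r?\n.*\r?\n//mg' george_perl.txt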