Perl: Deleting multiple recurring lines where a certain criterion is met - regex

I have data that looks like below, the actual file is thousands of lines long.
Event_time Cease_time
Object_of_reference
-------------------------- --------------------------
----------------------------------------------------------------------------------
Apr 5 2010 5:54PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:58PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:01PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:04PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
As you can see, each file has a header which describes what the various fields stand for (event start time, event cease time, affected element). The header is followed by a number of dashes.
My issue is that, in the data, there are a number of entries where the cease time is NULL, i.e. the event is still active. All such entries must go: for each element where the alarm cease time is NULL, the start time, the cease time (in this case NULL) and the actual element must be deleted from the file.
In the remaining data, all the text starting from the word SubNetwork up to BtsSiteMgr= must also go, along with the headers and the dashes.
Final output should look like below:
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
LUGALAMBO_900
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
BULAGA
Below is a Perl script that I have written. It has taken care of the headers, the dashes and the NULL entries, but I have failed to delete the lines following the NULL entries, so it does not yet produce the above output.
#!/usr/bin/perl
use strict;
use warnings;
$^I = ".bak"; # Back up the file before messing it up.
open (DATAIN, "<george_perl.txt")  || die("can't open datafile: $!"); # Read in the data
open (DATAOUT, ">gen_results.txt") || die("can't open datafile: $!"); # Prepare for the writing
while (<DATAIN>) {
    s/Event_time//g;
    s/Cease_time//g;
    s/Object_of_reference//g;
    s/\-//g;    # Preceding 4 statements are for cleaning out the headers
    my $theline = $_;
    if ($theline =~ /NULL/) {
        next;
        next if $theline =~ /SubN/;
    }
    else {
        print DATAOUT $theline;
    }
}
close DATAIN;
close DATAOUT;
Kindly help point out any modifications I need to make to the script to make it produce the necessary output.

Your data arrives in sets of 3 lines, so one approach is to organize the parsing that way:
use strict;
use warnings;

# Ignore header junk.
while (<>) {
    last unless /\S/;
}

until (eof) {
    # Read in a set of 3 lines.
    my @lines;
    push @lines, scalar <> for 1 .. 3;

    # Filter and clean.
    next if $lines[0] =~ /\sNULL\s/;
    $lines[2] =~ s/.+BtsSiteMgr=//;

    print @lines[0,2];
}

Looks like a good candidate for a little input record separator ($/) trickery. The idea is to manipulate it so that it deals with one record at a time, rather than the default single line.
use strict;
use warnings;

$^I = '.bak';

open my $dataIn,  '<', 'george_perl.txt' or die "Can't open data file: $!";
open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";

{
    local $/ = "\n\t";    # Records have leading tabs

    while ( my $record = <$dataIn> ) {

        # Skip header & records that contain 'NULL'
        next if $record =~ /NULL|Event_time/;

        # Strip out the unwanted yik-yak
        $record =~ s/SubNetwork.*BtsSiteMgr=//s;

        # Print record to output file
        print $dataOut $record;
    }
}

close $dataIn;
close $dataOut;
Pay attention to the following:
use of the safer three-argument form of open (the two-argument form is what you've shown)
use of scalar variables rather than barewords for defining filehandles
use of the local keyword and extra curlies to modify the definition of $/ only when needed.
the second s in s/SubNetwork.*BtsSiteMgr=//s allows matches over multiple lines as well.

s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;
should filter out the lines that end in NULL plus the two following lines.
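For that substitution to see all three lines of a record at once, the whole file has to be read into one string first. A minimal sketch of how it could be applied (using the file names from the question; this wrapper is not part of the answer itself):

use strict;
use warnings;

open my $in, '<', 'george_perl.txt' or die "Can't open input: $!";
my $text = do { local $/; <$in> };    # slurp the whole file into one string
close $in;

# Remove every NULL record: the NULL line plus the two object lines after it.
$text =~ s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;

open my $out, '>', 'gen_results.txt' or die "Can't open output: $!";
print {$out} $text;
close $out;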

Ansible: ios upgrade router: check "spacefree_kb" prior to image copy

I'm writing a playbook for IOS upgrade of multiple switches and have most pieces working, with the exception of the free-flash check. Basically, I want to check whether there is enough flash space free prior to copying the image.
I tried using the gather facts module, but it is not working how I expected.
From gather facts I see this:
"ansible_net_filesystems_info": {
"flash:": {
"spacefree_kb": 37492,
"spacetotal_kb": 56574
This is the check I want to do:
- fail:
    msg: 'This device does not have enough flash memory to proceed.'
  when: "ansible_net_filesystems_info | json_query('*.spacefree_kb')|int < new_ios_filesize|int"
From doing some research I understand that any value returned by a jinja2 template will be a string so my check is failing:
Pass integer variable to task without losing the integer type
The solution suggested in the link doesn't seem to work for me even with ansible 2.7.
I then resorted to storing the results of 'dir' in a register and tried using regex_search, but can't seem to get the syntax right
(similar to this: Ansible regex_findall multiple strings).
"stdout_lines": [
[
"Directory of flash:/",
"",
" 2 -rwx 785 Jul 2 2019 15:39:05 +00:00 dhcp-snooping.db",
" 3 -rwx 1944 Jul 28 2018 20:05:20 +00:00 vlan.dat",
" 4 -rwx 3096 Jul 2 2019 01:03:26 +00:00 multiple-fs",
" 5 -rwx 1915 Jul 2 2019 01:03:26 +00:00 private-config.text",
" 7 -rwx 35800 Jul 2 2019 01:03:25 +00:00 config.text",
" 8 drwx 512 Apr 25 2015 00:03:16 +00:00 c2960s-universalk9-mz.150-2.SE7",
" 622 drwx 512 Apr 25 2015 00:03:17 +00:00 dc_profile_dir",
"",
"57931776 bytes total (38391808 bytes free)"
]
]
Can anyone provide some insight into this seemingly simple task? I just want '38391808' as an integer from the example above (or any other suggestion). I'm fairly new to Ansible.
Thanks in advance.
json_query wildcard expressions return a list. The tasks below
- set_fact:
    free_space: "{{ ansible_net_filesystems_info|
                    json_query('*.spacefree_kb') }}"

- debug:
    var: free_space
give the list
"free_space": [
37492
]
which can neither be converted to an integer nor compared to one. This is the reason for the problem.
The solution is simple. Just take the first element of the list and the condition will start working:
- fail:
    msg: 'This device does not have enough flash memory to proceed.'
  when: ansible_net_filesystems_info|
        json_query('*.spacefree_kb')|
        first|
        int < new_ios_filesize|int
Moreover, json_query is not necessary. The attribute spacefree_kb can be referenced directly:
- fail:
    msg: 'This device does not have enough flash memory to proceed.'
  when: ansible_net_filesystems_info['flash:'].spacefree_kb|
        int < new_ios_filesize|int
json_query has an advantage, though: see this example from a C9500:
[{'bootflash:': {'spacetotal_kb': 10986424.0, 'spacefree_kb': 4391116.0}}]
Yes, they changed flash: to bootflash:.
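As for the regex_search attempt on the registered 'dir' output from the question, something along these lines should also work (a sketch only; dir_result is a hypothetical register name, and stdout[0] assumes dir was the first command in the task):

- set_fact:
    free_bytes: "{{ dir_result.stdout[0]
                    | regex_search('\\((\\d+) bytes free\\)', '\\1')
                    | first | int }}"

This pulls the number out of the "57931776 bytes total (38391808 bytes free)" line and casts it to an integer, which can then be compared against new_ios_filesize the same way as above (mind the units: this value is bytes, whereas spacefree_kb is kilobytes).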

"\S+?#\S+" and "\S+#\S+" give same output in regular expressions

The following two regular expressions give the same output in Python 3.7.
"+?" is supposed to be non-greedy.
re.findall("\S+?@\S+?","From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008")
re.findall("\S+@\S+","From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008")
Both of these give the same output:
['stephen.marquard@uct.ac.za']
The +? is non-greedy as expected. Refer to https://regex101.com/r/DM4voj/1
If you copy-pasted both commands into your shell or a program, you will only see the output of the last expression. Try using print statements for both; you should then get the expected answers.
import re
print(re.findall("\S+?@\S+?","From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"))
print(re.findall("\S+@\S+","From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"))
The result will be as below, as expected:
['stephen.marquard@u']
['stephen.marquard@uct.ac.za']
The non-greedy \S+? after the @ matches as few characters as it can, and since nothing forces the match to extend further, it stops after a single character; the greedy \S+ takes the whole token.

Groovy String replacement with link

I have a multi-line string from git log in a variable and want to replace matched lines with hyperlinks, but keep some parts of the original string, using Groovy.
Example:
commit 7a1825abc69f1b40fd8eb3b501813f21e09bfb54
Author: Filip Stefanov
Date: Mon Nov 21 11:05:08 2016 +0200
TICKET-1
Test change
Change-Id: I7b4028e504de6c4a48fc34635d4b94ad038811a6
Should look like:
commit 7a1825abc69f1b40fd8eb3b501813f21e09bfb54
Author: Filip Stefanov
Date: Mon Nov 21 11:05:08 2016 +0200
<a href=http://localhost:8080/browse/TICKET-1>TICKET-1</a>
Test change
<a href=http://localhost:8081/#/q/I7b4028e504de6c4a48fc34635d4b94ad038811a6,n,z>Change-Id: I7b4028e504de6c4a48fc34635d4b94ad038811a6</a>
I'm pretty bad at Groovy regex and don't know how to use grouping or closures. This is as far as I've got:
mystring.replaceAll(/TICKET-/, "http://localhost:8080/browse/TICKET-")
NOTE:
TICKET {int} and Change-Id {hash} are variables
mystring.replaceAll(/(TICKET-\d++)/, '<a href=http://localhost:8080/browse/$1>$1</a>')
        .replaceAll(/Change-Id: (I\p{XDigit}++)/, '<a href=http://localhost:8081/#/q/$1,n,z>Change-Id: $1</a>')
Of course you have to adjust the dynamic parts accordingly. Currently it requires at least one digit after TICKET-, and an I followed by at least one hex digit after Change-Id:.
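Also note that replaceAll returns a new string rather than modifying mystring in place (Groovy/Java strings are immutable), so assign the result, e.g. mystring = mystring.replaceAll(...), if you want to keep it.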

Perl parsing through a file based on conditions

I have a very large log file which is updated periodically. It is as follows:
commands: (List of files to be copied)
Exit time: Fri May 10 05:33:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 20 05:34:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 30 05:50:00 2013
Exit Status: 1
I have the following code, which creates a hash based on Exit Status:
while ($line = <FH>) {
    if ($line =~ /Exit time/) {
        ($exittime, $exittimeval) = split(': ', $line);
        $stat{$qbsid} = {
            time => $exittimeval,
        };
    }
}
I now need to create a timestamp based on localtime so that the script does not compare entries in the log file whose time is after that timestamp (localtime). I have the following code to compare the times:
$date1 = "$hr1:$min1:$sec1, $moy1/$dt1/$yr1";
$date2 = "$hr2:$min2:$sec2, $moy2/$dt2/$yr2";

sub to_comparable {
    my ($date) = @_;
    my ($H,$M,$S,$d,$m,$Y) = $date =~ m{^(\d+):(\d+):(\d+), (\d+)/(\d+)/(\d+)\z}
        or die;
    return "$Y$m$d$H$M$S";
}

if (to_comparable($date2) > to_comparable($date1)) {
    print "right\n";
} else {
    print "wrong\n";
}
Here $hr1, $min1, $sec1, $moy1, $dt1 and $yr1 are local time variables, while $hr2, $min2, $sec2, $moy2, $dt2 and $yr2 are values obtained from the hash.
Preferably, on the first run the script should compare the whole file and create a timestamp; on later runs, the comparison above should only consider entries after that timestamp.
Please correct me if anything is wrong. Thank you.
You might want to consider using Time::Piece, which was first released with perl v5.9.5.
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
{
my $end_date = '2013-05-30';
local $/ = '';
while (<DATA>) {
if (/^Exit Time: (.+)/m) {
my $date = Time::Piece->strptime($1, "%c");
print $date->ymd, "\n" if $date->ymd lt $end_date;
}
}
}
__DATA__
commands: (List of files to be copied)
Exit Time: Fri May 10 05:33:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 20 05:34:00 2013
Exit status: 2
commands: (List of files to be copied)
Exit Time: Fri May 30 05:50:00 2013
Exit Status: 1
Output:
2013-05-10
2013-05-20
You'll be forming a 14-digit number (assuming years are 4-digit and the rest are always exactly 2-digit). That's a big number, but it seems fine on my 64-bit UNIX OS; I don't know about yours. Anyway, with a fixed-length string, you could do a string comparison ("gt" instead of ">") if a number that big is an issue.
If any of the inputs (e.g. $moy1) could be a single digit, then your comparison function will not work, since October 1st (2013101) would sort before September 30th (2013930). You could require a fixed number of digits using:
my ($H,$M,$S,$d,$m,$Y) = $date =~ m{^(\d\d):(\d\d):(\d\d), (\d\d)/(\d\d)/(\d\d\d\d)\z}
or die;
I'm not sure how $qbsid is set (from Exit Status or something else), but since your code isn't complete I assume you have something else to do that.
I'm also not sure how your original time strings (e.g. "Fri May 30 05:50:00 2013") get transformed into the "$hr1:$min1:$sec1, $moy1/$dt1/$yr1" format, but I assume you do that elsewhere as well.
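As for only looking at new entries on later runs, one possible approach (my own sketch, not part of either answer; the .lastrun file name is just an assumption) is to persist the previous run's time in the same comparable YYYYmmddHHMMSS form and skip records at or before it:

use strict;
use warnings;
use Time::Piece;

my $stamp_file = '.lastrun';          # hypothetical bookkeeping file
my $last_run   = '00000000000000';    # first run: compare the whole file

if (open my $in, '<', $stamp_file) {
    my $line = <$in>;
    close $in;
    if (defined $line) {
        chomp $line;
        $last_run = $line if length $line;
    }
}

# While parsing, skip records already seen on a previous run, e.g.:
# next if to_comparable($date2) le $last_run;

# Record this run's time for next time.
open my $out, '>', $stamp_file or die "Can't write $stamp_file: $!";
print {$out} localtime->strftime('%Y%m%d%H%M%S'), "\n";
close $out;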

Storing strings in an array

I am currently storing the lines below in a file named google.txt. I want to separate these lines and store the separated strings in arrays.
For example, for the first line:
@qf_file = q33AgCEv006441
@date = Tue Apr 3 16:12
@junk_message = User unknown
@rf_number = ngandotra@nkn.in
The line ends at the @rf_number, i.e. the last email address.
q33AgCEv006441 1038 Tue Apr 3 16:12 <test10-list-bounces@lsmgr.nic.in>
(User unknown)
<ngandotra@nkn.in>
q33BDrP9007220 50153 Tue Apr 3 16:43 <karuvoolam-list-bounces@lsmgr.nic.in>
(Deferred: 451 4.2.1 mailbox temporarily disabled: paond.tndt)
<paond.tndta@nic.in>
q33BDrPB007220 50153 Tue Apr 3 16:43 <karuvoolam-list-bounces@lsmgr.nic.in>
(User unknown)
paocorp.tndta@nic.in>
<dtocbe@tn.nic.in>
<dtodgl@nic.in>
q33BDrPA007220 50153 Tue Apr 3 16:43 <karuvoolam-list-bounces@lsmgr.nic.in>
(User unknown)
<dtokar@nic.in>
<dtocbe@nic.in>
q2VDWKkY010407 2221878 Sat Mar 31 19:37 <dhc-list-bounces@lsmgr.nic.in>
(host map: lookup (now-india.net.in): deferred)
<arjunpan@now-india.net.in>
q2VDWKkR010407 2221878 Sat Mar 31 19:31 <dhc-list-bounces@lsmgr.nic.in>
(host map: lookup (aaplawoffices.in): deferred)
<amit.bhagat@aaplawoffices.in>
q2U8qZM7026999 360205 Fri Mar 30 14:38 <dhc-list-bounces@lsmgr.nic.in>
(host map: lookup (now-india.net.in): deferred)
<arjunpan@now-india.net.in>
<amit.bhagat@aaplawoffices.in>
q2TEWWE4013920 2175270 Thu Mar 29 20:30 <dhc-list-bounces@lsmgr.nic.in>
(host map: lookup (now-india.net.in): deferred)
<arjunpan@now-india.net.in>
<amit.bhagat@aaplawoffices.in>
Untested Perl script:
Let's call this script parser.pl:
$file = shift;
open(IN, "<$file") or die "Cannot open file: $file for reading ($!)\n";
while (<IN>) {
    push(@qf_file,      /^\w+/g);
    push(@date,         /(?:Sat|Sun|Mon|Tue|Wed|Thu|Fri)[\w\s:]+/g);
    push(@junk_message, /(?<=\().+(?=\)\s*<)/g);
    push(@rf_number,    /(?<=<)[^>]+(?=>\s*$)/g);
}
close(IN);
This assumes the last email on the line should be the "rf_number" for that line. Note that emails may be tricky to print, as they have an @ character, and Perl is more than happy to interpolate a non-existent array for you :-)
To call this in a command line:
parser.pl google.txt
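To eyeball what was captured, you could append something like this to parser.pl (again a sketch, not part of the original answer):

# Dump each array, one item per line; the addresses live in variables here,
# so their @ is not treated as array interpolation.
print "qf_file:\n";
print "  $_\n" for @qf_file;
print "date:\n";
print "  $_\n" for @date;
print "junk_message:\n";
print "  $_\n" for @junk_message;
print "rf_number:\n";
print "  $_\n" for @rf_number;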