Perl and Regex - Parsing values from a .csv - regex

I need to create a Perl script that reads the most recently modified file in a given folder (the file is always a .csv) and parses the values from its columns, so I can load them into a MySQL database.
The main problem is: I need to separate the date from the time, and the country from the name (CHN, DEU and JPN represent China, Deutschland and Japan).
They come together like in the example below:
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"
So far I can split the lines, but how can I make it understand that each value enclosed in "" and separated by , should be inserted into my arrays?
my %date;
my %hour;
my %country;
my %name;
my %percentage_one;
my %percentage_two;
# Selects the latest file in the given directory
my $files = File::DirList::list('/home/cvna/IN/SCRIPTS/zabbix/roaming/tratamento_IAS/GPRS_IN', 'M');
my $file = $files->[0]->[13];
open(CONFIG_FILE,$file);
while (<CONFIG_FILE>){
# Splits the file into various lines
@lines = split(/\n/,$_);
# For each line that i get...
foreach my $line (@lines){
# I need to split the values between , without the ""
# And separating Hour from Date, and Name from Country
@aux = split(/......./,$line)
}
}
close(CONFIG_FILE);

readline or <> only reads one line. There's no need to split it on newlines. But, instead of fixing your code, use Text::CSV:
#!/usr/bin/perl
use 5.010;
use warnings;
use strict;
use Text::CSV;
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
while (my $row = $csv->getline(*DATA)) {
my ($date, $time) = split / /, $row->[0];
my ($country, $name) = split / - /, $row->[3];
print "Date: $date\tTime: $time\tCountry: $country\tName: $name\n";
}
__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

Looking at your code, it appears you're pretty new to Perl. The Text::CSV module is a nice solution, but unfortunately, isn't a standard module. You'll need to use CPAN to install it. It isn't difficult, but may require you to be the administrator of your computer.
The module Text::ParseWords is a standard module and can handle quoted words much like Text::CSV can.
You'll basically need to split the line (which I do with the parse_line function). The first parameter is the delimiter, a comma, which is what I want to split my line on. The second parameter, 0, tells parse_line to strip the quotes from the returned fields. Unlike split itself, parse_line doesn't split on delimiters inside quoted fields, and it handles backslash-escaped quotes. This is very similar to Text::CSV.
Once you've split your line, you'll need to split date from time and country from name. In my example, I show two ways of doing this: One uses split and the other uses a matching regular expression. Either one will work.
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;
while ( my $line = <DATA> ) {
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
my ($date, $time) = split /\s+/, $date_time;
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
say "$date, $time, $country, $name";
}
__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"
In your actual program, you'll open your file, and make sure you've opened that file. You can test for that, or use autodie:
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;
use autodie;
open my $config_file, "<", $file; # No need for testing thanks to use autodie!
# What you need to do if you don't use autodie
# open my $config_file, "<", $file or die qq(Can't open "$file" for reading);
while ( my $line = <$config_file> ) {
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
my ($date, $time) = split /\s+/, $date_time;
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
say "$date, $time, $country, $name"; # Show fields were correctly parsed.
}
It looks like you want to store the data. I see you have multiple hashes that I bet you're trying to keep in parallel. Take a look at how references allow you to build more complex structures:
my %data; #Where I'll be storing the data...
$data{$key}->{DATE} = $date;
$data{$key}->{HOUR} = $time;
$data{$key}->{COUNTRY} = $country;
...
Now, all of your data is in %data. You can pass it around from place to place in your program, and not worry whether you've updated each and every single hash.
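As a rough sketch of that idea, the parsed fields could be collected into one hash of hashes. Keying on the name assumes names are unique in the file, and the field names PERCENT_ONE and PERCENT_TWO are made up for illustration; the sample rows are the ones from the question.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Sample rows copied from the question.
my @lines = (
    '"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"',
    '"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"',
    '"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"',
);

my %data;    # One record per name (assumption: names are unique)
for my $line (@lines) {
    # The two undef slots skip the columns we do not need.
    my ($date_time, undef, undef, $country_name, $percent1, $percent2)
        = parse_line ',', 0, $line;
    my ($date, $time)    = split /\s+/, $date_time;
    my ($country, $name) = split / - /, $country_name, 2;

    # One anonymous hash per row replaces six parallel hashes.
    $data{$name} = {
        DATE        => $date,
        HOUR        => $time,
        COUNTRY     => $country,
        PERCENT_ONE => $percent1,
        PERCENT_TWO => $percent2,
    };
}

for my $name ( sort keys %data ) {
    my $record = $data{$name};
    print "$name: $record->{COUNTRY}, $record->{DATE}, $record->{HOUR}, $record->{PERCENT_TWO}\n";
}
```

Each record travels as a single reference, so a subroutine that inserts into the database would only need $data{$name} rather than six separate hashes.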
Once you get the hang of references, you are on your way to writing Object Oriented Perl code.
Get a good book on Modern Perl too. Perl coding techniques have changed quite a bit since Perl 5 was released. Unfortunately, most people never learn the way Perl should be written because they learn from old books that are lying around, or from looking at older code written in the Perl 3 and Perl 4 error (pun intended). Perl is a flexible and powerful language that quickly gives you enough rope to hang yourself. Learning good programming techniques will allow you to write more complex and comprehensive programs that are actually easier to read and maintain.
Almost complete program...
Here's the complete program that finds the most recent file in a particular directory, then reads in that file and parses the lines.
I'm using the -M file test. It returns the time since the file was last modified, expressed as the file's age in days at the moment the program started. For example, a file that was last modified 2 1/2 days ago will return 2.5, while a file last modified one day and four hours ago will return 1.16666667. You can use this to compare the ages of the various files.
This program works on Perl 5.8.8 without installing any new modules, and I've tested it with data I've made up.
You can see I use open ... or die ...; without any issues. Are you getting some other error? Do you have use strict; and use warnings; set in your program?
#! /usr/bin/env perl
#
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use Text::ParseWords;
use Benchmark;
use constant {
DATA_FILE_DIR => "temp",
};
#
# Find newest file in the directory
#
opendir my $data_dir, DATA_FILE_DIR
or die qq(Cannot open directory for reading.);
my $newest_file;
while ( my $file = readdir $data_dir ) {
next if $file eq "." or $file eq "..";
my $full_name = DATA_FILE_DIR . "/" . $file;
if ( not defined $newest_file
or -M $full_name < -M $newest_file ) {
$newest_file = $full_name;
}
}
print qq(Using file is "$newest_file"\n);
closedir $data_dir;
open my $file, "<", $newest_file
or die qq(Cannot open file "$newest_file" for reading.);
while ( my $line = <$file> ) {
# Read in the entire line
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
# Split the DATE/TIME field
my ($date, $time) = split /\s+/, $date_time;
# Split the Country/Name field
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
# Print statement merely shows that these four fields are truly split.
print "$date, $time, $country, $name\n";
}

Related

Matching fields in a log file and transforming results

First a quick intro. I'm new here, so if I screw up a post, please let me know and I'll fix it.
I've been trying to accomplish my goal using perl, but I'm stuck. I don't need to use perl to accomplish it, but I figure it's that, or Excel and I like perl better. If you have a better method please share.
I start with a file (output from a log file). It is one line, with fields delimited by colons. Here is an example of the file:
RmDenySumm:SGID=46244:Req=15000:tsid=46244:AllocBw=38332:BwList=12456/12500/3750/5876/3750:tsid=63042:AllocBw=38750:BwList=15000/12500/3750/3750/3750:tsid=63043:AllocBw=36717:BwList=14706/12500/3750/5761:tsid=63044:AllocBw=37011:BwList=15000/12500/5761/3750:tsid=61741:AllocBw=38450:BwList=12339/3750/6501/12502/3357:tsid=61721:AllocBw=37460:BwList=12500/15000/4200/5760:tsid=2072:AllocBw=31975:BwList=12136/12339/3750/3750:tsid=2073:AllocBw=24260:BwList=14634/5876/3750:tsid=30842:AllocBw=38453:BwList=14634/12500/5761/5557:tsid=30843:AllocBw=37105:BwList=15000/15000/3750/3355:tsid=30844:AllocBw=38295:BwList=14706/12339/3750/3750/3750:tsid=30845:AllocBw=25601:BwList=5762/12339/3750/3750:tsid=30846:AllocBw=38455:BwList=15000/12136/5761/5557:tsid=30847:AllocBw=26974:BwList=14634/12339:tsid=30848:AllocBw=29634:BwList=14634/15000:tsid=30849:AllocBw=37338:BwList=14838/15000/3750/3750:tsid=60958:AllocBw=36898:BwList=12339/12500/6501/5557:tsid=60959:AllocBw=37178:BwList=12339/12500/12339:tsid=60960:AllocBw=27339:BwList=12339/15000:tsid=60962:AllocBw=34839:BwList=12339/3750/15000/3750:tsid=60963:AllocBw=37500:BwList=15000/15000/3750/3750:tsid=60964:AllocBw=38346:BwList=15000/3754/15000/4592:tsid=60965:AllocBw=24626:BwList=15000/5876/3750:tsid=60966:AllocBw=34513:BwList=12502/12500/5761/3750
I need to grab all of the "AllocBw=######" fields, separate the number part from the "AllocBw", add them all together, then subtract the sum from a set value.
In perl, I have this:
#!/usr/bin/perl -w
use Data::Dumper;
#
#
my $file = "/home/nick/perl/svcgroup.txt";
my @asplit;
my $c = 0;
open (FILE, "<", $file) or die "Can't open file".$!."\n";
while (<FILE>) {
$_ =~ s/\n//g;
push(@asplit, split (":", $_));
#print Dumper @asplit;
}
foreach $splits (@asplit) {
if ($splits =~ m/AllocBw/) {
print $splits."\n";
}
}
#print Dumper @asplit;
print "\n\n";
close FILE;
exit;
Which leaves me with:
AllocBw=38332
AllocBw=38750
AllocBw=36717
AllocBw=37011
AllocBw=38450
AllocBw=37460
AllocBw=31975
AllocBw=24260
AllocBw=38453
AllocBw=37105
AllocBw=38295
AllocBw=25601
AllocBw=38455
AllocBw=26974
AllocBw=29634
AllocBw=37338
AllocBw=36898
AllocBw=37178
AllocBw=27339
AllocBw=34839
AllocBw=37500
AllocBw=38346
AllocBw=24626
AllocBw=34513
This is where I get stuck. I'm not sure how to strip these values down to the number and add them up.
If someone can assist, I'd be grateful. If this is more easily accomplished using something other than Perl, that's fine too. My programming scope is limited, as I only make small scripts to accomplish small repetitive tasks at work.
EDIT FOR BORODIN
ie (not formatted like this, this is just for illustration):
AllocBw 12575+
AllocBw 12568+
AllocBw 12358 = TotAllocBw 37501
MaxBw 38800*3=116400
116400(MaxBw) - 37501(TotAllocBw) = TotAvaiBw 78899
This would just be a big bonus. The script you wrote works perfectly well for my purposes and I can adapt it as I need. Thanks again! Much appreciated. I was able to follow everything you did differently in the script and learned some new stuff.. Thanks for that as well.
It is simplest to use a global regular expression match to find all occurrences of AllocBw=... in each line of your input file.
This program's outer while loop iterates over all the lines in the input file. Since your file contains a single line, its body should be executed only once.
The inner while iterates over all instances of the regex pattern AllocBw=(\d+) (AllocBw= followed by one or more decimal digits) and captures the numeric value into $1.
The captured number is added to $total each time, and can simply be printed at the end.
use strict;
use warnings;
my $file = '/home/nick/perl/svcgroup.txt';
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my $total = 0;
while ( <$fh> ) {
$total += $1 while /AllocBw=(\d+)/g;
}
printf "Total: %d\n", $total;
output
Total: 826049
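For the bonus calculation sketched in the question's edit, one possible extension is shown below. It assumes that MaxBw is 38800 per AllocBw entry, which is how the 38800*3 in the illustration reads; the shortened sample line and the tsid values in it are made up so that the numbers match that illustration.

```perl
use strict;
use warnings;

# Assumption: each AllocBw entry has a ceiling of 38800, as in the illustration.
my $max_bw_per_entry = 38800;

# Shortened, made-up sample line carrying the three illustrative values.
my $line = 'RmDenySumm:SGID=46244:tsid=1:AllocBw=12575:tsid=2:AllocBw=12568:tsid=3:AllocBw=12358';

# In list context a /g match returns every captured value at once.
my @alloc = $line =~ /AllocBw=(\d+)/g;

my $total = 0;
$total += $_ for @alloc;

# An array used as a number yields its element count.
my $max_bw    = $max_bw_per_entry * @alloc;
my $available = $max_bw - $total;

printf "TotAllocBw %d\nMaxBw %d\nTotAvaiBw %d\n", $total, $max_bw, $available;
```

With the three sample values this prints 37501, 116400 and 78899, the same figures as the illustration.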

Create multiple output files and cut dna with enzymes - Perl

I am a first year grad student who's relatively new in computational biology. I recently started using Perl and it's not the easiest language to learn, at least not for me.
I need help applying my idea/logic the right way to figure out the solution to my problem.
I have a dna string and I want to split it at specific sites to get multiple fragments using information from an enzyme file that contains lines of recognition sites. Once the fragments are obtained, I want to output the list of dna fragments in an output file. I want to create an output file for every line in the enzyme file I am going to extract the information from, to apply it to the dna string.
Here's what I mean exactly:
Hypothetical scenario:
Enzyme.File contains:
abc/at'gtct// (abc is the name of the enzyme. (atgtct) is the recognition site.)
def/cgg'ataaa// ........
Suppose the dna string is: $dna = "accggttatgtctaaacggataaagtctcggataaattt" (recognition sites are bolded)
For line 1
When I extract the info from the first line/enzyme(abc) from the enzyme file and apply it to this string, the output should be:
accggttat
gtctaaacggataaagtctcggataaattt
(split between at'gtct) the apostrophe represents the cut point
(note: Even though there is another gtct in the string, it does not split it because at ought to precede it.)
For line 2
$dna = accggttatgtctaaacggataaagtctcggataaattt (Info is applied to same dna string)
Info from line/enzyme 2 (def) would split the dna as follow:
accggttatgtctaaacgg (split between cgg'ataaa)
ataaagtctcgg
ataaattt
I want to put each output from the different lines in separate file with distinct names. (I can take care of assigning the names)
So in conclusion, this example would create two new files, one named "abc_whatever" and one named "def_whatever". Important: If the enzyme file had 8 lines with different enzymes, I would get 8 new output files with their distinct dna fragments.
Here's what I've tried so far:
#!/usr/bin/perl;
use warnings;
use strict;
open(ENZ,$ARGV[0]) || die; # ENZ(file handle for enzyme file)
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
while (<ENZ>) {
if ( match pattern etc..) { # I took care of that and created captured groups of
$1 = holds "abc" # the info I needed from the line e.g. I captured
$2 = ..."at" # (abc)/(at)'(gtct)//, so they are stored in $1,$2,$3
$3 = ..."gtct" # respectively
}
while (<$dna>){
my @fragments_array = split(/$3/, $dna);
open (OutFile, ">$dna"."_"."$1")
print OutFile shift @fragments_array,"\n";
foreach (@fragments_array) {
print OutFile "$3$_\n";
close OutFile;
}
}
}
close ENZ;
FIRST
I can only create an output for the 1st line in the Enzyme file. I want to create an output file for all the lines.
SECOND
I am not properly cutting the dna. From other examples I have seen online, it looks like I am gonna have to use the following functions to properly apply the enzyme information on the dna. The functions include:
the for loop, length and substr(),
If you can, please demonstrate your work in the simplest form (no extravagant, impressing codes lol :-) since I am just learning this language)
Thanks in advance!
FIRST I can only create an output for the 1st line in the Enzyme file. I want to create an output file for all the lines.
That's simply because you put close OutFile; inside the foreach (@fragments_array) loop, instead of placing the close after the loop body.
SECOND I am not properly cutting the dna.
That's because you forgot to include $2, the head of the recognition site (e.g. the at of atgtct), in the split pattern as well as in the output.
The problem is more easily solved if we just insert a newline character at every cut point, between the head and the tail:
#!/usr/bin/perl
use warnings;
use strict;
open(ENZ, $ARGV[0]) || die; # ENZ (file handle for enzyme file)
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
while (<ENZ>)
{
if (m-(.*)/(.*)'(.*)//-)
{
my ($head, $tail) = ($2, $3); # $2$3 is the recognition site; save it
open(OutFile, ">${dna}_$1") || die "Cannot write ${dna}_$1: $!"; # fail loudly if the file can't be created
(my $fragments = $dna) =~ s/$head$tail/$head\n$tail/g; # insert NLs
print OutFile $fragments, "\n";
close OutFile;
}
}
close ENZ;
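If you would rather have the fragments as a list (for example to count them or measure their lengths), a hedged alternative is to split at the cut point itself using zero-width lookaround. This is only a sketch: the enzyme lines are held in an array instead of a file, and the hash name %fragments_for is invented for illustration.

```perl
use strict;
use warnings;

my $dna = "accggttatgtctaaacggataaagtctcggataaattt";

# The two enzyme lines from the question, inlined instead of read from a file.
my @enzyme_lines = ( "abc/at'gtct//", "def/cgg'ataaa//" );

my %fragments_for;    # enzyme name => array reference of fragments
for my $line (@enzyme_lines) {
    # name / head ' tail //  (skip lines that do not have this shape)
    my ($name, $head, $tail) = $line =~ m{^([^/]+)/([^']+)'([^/]+)//} or next;

    # Split only where the head ends and the tail begins; the lookbehind and
    # lookahead consume no characters, so nothing is removed from the dna.
    my @fragments = split /(?<=\Q$head\E)(?=\Q$tail\E)/, $dna;
    $fragments_for{$name} = \@fragments;

    print "$name:\n";
    print "  $_\n" for @fragments;
}
```

For abc this gives two fragments and for def three, matching the expected output in the question. Writing each list to its own file would then be a matter of opening one handle per $name inside the loop.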
I changed your code a bit, hope it works now
#!/usr/bin/perl
use warnings;
use strict;
open(ENZ, $ARGV[0]);
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
my ($enzyme, $first, $second) = ("", "", "");
for my $line (<ENZ>) {
chomp($line); # remove \n at the end of string
my @elements = split(/\/|'/, $line); # split string into tokens (e.g. abc/at'gtct// => (abc, at, gtct)); split drops the empty trailing fields left by "//"
my ($firstPart, $secondPart) = ($elements[1], $elements[2]);
if ($dna =~ /(.*)$firstPart$secondPart(.*)/) { # greedy (.*) finds the last recognition site only
$first = $1 . $firstPart;
$second = $secondPart . $2; # the tail of the site starts the second fragment
$enzyme = $elements[0];
open(OUTPUT, ">$enzyme" . "_something");
print OUTPUT "$first\n$second\n";
close(OUTPUT);
}
}
close ENZ;
EDIT: this is the working version. I suggest you learn how to use regular expressions if you want to use Perl for your studies. They are the strongest tool in Perl.

Perl - How to Read, Filter & Output results

scenario: I am a Jr. C# developer, but recently (3 days) began learning Perl for batch files. I have a requirement to parse through a text file, extract some key data, then output the key data to a new text file. As seems to always be the case, there are butt loads of fragmented examples on the net regarding how to 'read' from a file, 'write' to a file, 'store' line by line into an array, 'filter' this and that, yadda yadda, but nothing discussing the entire process of read, filter, write. Trying to splice examples from the net together is no good, because none seem to work together as coherent code. Coming from C#, Perl's syntax structure is hella confusing. I just need some advice on this process.
My objective is to parse a text file, single out all lines similar to the one below, by date, and output only the first 8 digits of the 2nd number group and 5 digits from the 3rd number group to a new text file.
11122 20100223454345 ....random text..... [keyword that identifies all the
entries I need]... random text 0.0034543345
I know regex is likely the best option, and have most of the expression written, but it does not work in Perl!
Question: Could someone please show a simple (dummy) example of how to read from, filter (using dummy regex) the file, then output the (dummy) results to a new file? I'm not concerned with functional details, I can learn those, I just need the syntax structure Perl uses. For example:
open(FH, '<', 'dummy1.txt')
open(NFH, '>', 'dummy2.txt')
@array; or $dumb;
while(<FH>)
{
filter each line [REGEX] and shove it into [@array or $dumb scalar]
}
print(join(',', @array)) to dummy2.txt
close FH;
close NFH;
Note: For various reasons, I cannot paste my source code in here, sorry. Any help is appreciated.
UPDATE: ANSWER:
Much thanks to all those who provided insight into my issue. After reading through your replies, as well as conducting further research, I learned that there are dozens of ways to accomplish the same task in Perl (which I am not a fan of). In the end, this is how I solved the problem, and IMO it's the cleanest and most succinct solution for those having similar struggles. Thanks again for all the help.
#======================================================================
# 1. READ FILE: inputFile.txt
# 2. CREATE FILE: outputFile.txt
# 3. WRITE TO: outputFile.txt IF line matches REGEX constraints
# 4. CLOSE FILES: outputFile.txt & inputFile.txt
#==========================================================================
#1
$readFile = 'C:/.../.../inputFile.txt';
open(FH, '<', $readFile) or Error("Could not read file ($!)");
#2
$writeFile = 'C:/.../.../outputFile.txt';
open(NFH, '>', $writeFile) or Error("Cannot write to file ($!)");
#3
@lines = <FH>;
LINE: foreach $line (@lines)
{
if ($line =~ m/(201403\d\d).*KEYWORD.*time was (\d+\.\d+)/)
{
$date = $1;
$elapsedtime = $2;
print NFH "$date,$elapsedtime\n";
}
}
#4
close NFH;
close FH;
perlfaq5 - How do I change, delete, or insert a line in a file, or append to the beginning of a file? covers most of the different scenarios for how to use files.
However, I will add to that: always start your scripts with use strict; and use warnings;, and because you're doing file processing, use autodie; will serve you well too.
With that in mind, a quick stub would be the following:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'dummy1.txt';
open my $outfh, '>', 'dummy2.txt';
while (my $line = <$infh>) {
chomp $line; # Remove \n
if ($line =~ /regex_goes_here/) { # Whatever magical processing you need goes here
print $outfh "your new data\n"; # No comma after the filehandle
}
}
while(<FH>)
{
# variable $_ contains the current line
if(m/regex_goes_here/) #by default, the regex match operator m// attempts to match the default $_ variable
{
#do actions
}
}
Also note, m/regex/ is the same as /regex/
Refer to:
http://perldoc.perl.org/perlvar.html#General-Variables
http://perldoc.perl.org/perlre.html
For capturing variables from regex match, THIS might help
EDIT
If you want a different variable than the default $_, as @Miller suggested, use while($line = <FH>) followed by if($line =~ m/regex_goes_here/)
=~ is the Binding Operator
One tip. Don't explicitly open filehandles to your input and output files. Instead read from STDIN and write to STDOUT. Your program will be far more flexible and easier to use as you'll be able to treat it like a Unix filter.
$ your_filter_program < your_input.txt > your_output.txt
And doing this actually makes your program simpler to write too.
while (<>) { # <> reads from STDIN (or from files named on the command line)
# transform your data (which is in $_) in some way
...
print; # prints $_ to STDOUT
}
You might find the first few chapters of Data Munging with Perl are useful.
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
INPUT_FILE => "NAME_OF_INPUT_FILE",
OUTPUT_FILE => "NAME_OF_OUTPUT_FILE",
FILTER => qr/regex_for_line_to_filter/,
};
open my $in_fh, "<", INPUT_FILE;
open my $out_fh, ">", OUTPUT_FILE;
while ( my $line = <$in_fh> ) {
chomp $line;
next unless $line =~ FILTER;
$line =~ s/regular_expression/replacement/;
say {$out_fh} $line;
}
close $in_fh;
close $out_fh;
The $in_fh is your input file handle, and $out_fh is your output file handle. I basically open both, and loop through the input. The chomp removes the \n from the end. I always recommend doing that.
The next goes to the next iteration of the loop unless I match FILTER which is a regular expression matching lines you want to keep. This is identical to:
if ( $line !~ FILTER ) {
next;
}
I then use the substitution command to get the parts of the line I want, and munge them into the output I want. I may be better off expanding this a bit, maybe using split to split up my line into various pieces, then only using the pieces I want. I could then use substr to pull out the substrings from the selected pieces.
The say command is like print except it automatically adds in a NL on the end. This is how you write a line to a file.
Now, get Learning Perl and read it. If you know any programming, it shouldn't take you more than a week to go through the first half of the book. That should be more than enough to be able to write a program like this. The more complex stuff like references and object orientation might take a bit longer.
Online documentation can be found at http://perldoc.perl.org. You can look up the use statements, which are called pragmas, over there. Documentation on the individual functions is also available.
If I understood well, this one liner will do the job:
perl -ane 'print substr($F[1],0,8),"\t",substr($F[-1],0,5),"\n" if /keyword/' in.txt
Assuming in.txt is:
11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345
11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345
11122 40100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.65487
11122 50100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.6215
output:
20100223 0.003
40100223 0.654
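For readers unfamiliar with the -a, -n and -e switches, the one-liner can be unrolled roughly as follows. This is only a sketch: the input lines are inlined from the example above, and the @out array exists purely to collect the results.

```perl
use strict;
use warnings;

# Two of the sample lines from in.txt, inlined for the sketch.
my @lines = (
    '11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345',
    '11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345',
);

my @out;
for my $line (@lines) {                 # -n: loop over the input lines
    next unless $line =~ /keyword/;     # the trailing "if /keyword/"
    my @F = split ' ', $line;           # -a: autosplit on whitespace into @F
    # First 8 characters of field 2, first 5 characters of the last field.
    push @out, substr($F[1], 0, 8) . "\t" . substr($F[-1], 0, 5);
}
print "$_\n" for @out;
```

Only the first sample line contains the keyword, so only its date and truncated number are printed.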

Problems with separator-encoding

If I run this script as it is, it works.
But why does this not work with cgi?
When I use _\01_ instead of _\00_ it works with cgi too.
#!/usr/bin/env perl
use warnings;
use 5.012;
### script_1.cgi #########################################
my @array = ( '1524', '2.18 MB', '09/23/03', '_cool_name_', 'type' );
my $row = join "_\00_", @array;
say $row;
# submit $row to script_2.cgi
### script_2.cgi #########################################
# ...
# my $row = $cgi->param('row');
# my $name;
if ( $row =~ /_\00_([^\00]+)_\00_type\z/ ) {
# $name = $1;
say "Name: <$1>";
} else {
die "<$row> $!";
}
# Software error:
# <1524_�_2.18 MB_�_09/23/03_�__cool_name__�_type> at script_2.cgi line of "die "<$row> $!";"
Works for me, says _cool_name_. You're probably running afoul of CGI.pm using \0 already for itself, but since you did not post your complete code, no one can say for sure.
I'll use the opportunity to unask the question. The lessons you should learn are:
Avoid rolling your own serialisation scheme. As a beginner, you have made the typical mistake of not encoding the separator if it occurs in the data (c.f. double backslash in string expressions and double percent in sprintf expressions). The array could have been passed intact unjoined via e.g. JSON.
Instead of two scripts, these should be two subroutines in the same program. This way, you are able to pass data structures without the need to serialise.
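As a minimal sketch of the JSON suggestion, the array can be encoded on one side and decoded on the other. It uses JSON::PP, which ships with Perl from 5.14 onwards; on an older Perl a JSON module from CPAN would be needed instead. How $row travels between the scripts (for example as a CGI parameter) is left out here.

```perl
use strict;
use warnings;
use JSON::PP;    # exports encode_json and decode_json

### sending side
my @array = ( '1524', '2.18 MB', '09/23/03', '_cool_name_', 'type' );
my $row   = encode_json( \@array );    # a plain string, safe to submit
print "$row\n";

### receiving side
my $fields = decode_json($row);        # back to an array reference
my $name   = $fields->[3];
print "Name: <$name>\n";
```

Because the encoder escapes whatever needs escaping, no separator has to be chosen and no regex is needed to pull the name back out.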

How to search for lines in a file between two timestamps using Perl?

In Perl I am trying to read a log file and print only the lines that have a timestamp between two specific times. The time format is hh:mm:ss and this is always the third value on each log line. For example, I would be searching for lines that fall between 12:52:33 and 12:59:33.
I am new to Perl and have no idea which route to take to even begin to program this. I am pretty sure this would use some type of regex, but for the life of me I cannot even begin to fathom what that would be. Could someone please assist me with this.
Also, to make this more difficult I have to do this with the core Perl modules because my company will not allow me to use any other modules until they have been tested and verified there will be no ill effects on any of the systems the script may interact with.
In pseudocode, you'd do something like this:
read in the file line by line:
parse the timestamp for this line.
if it's less than the start time, skip to the next line.
if it's greater than the end time, skip to the next line!
else: this is a line you want: print it out.
This may be too advanced for your needs, but the flip-flop operator .. immediately comes to mind as something that would be useful here.
For reading in a file from stdin, this is the conventional pattern:
while (my $line = <>)
{
# do stuff...
}
Parsing a line into fields can be done easily with split (see perldoc -f split). You will probably need to split the line by tabs or spaces, depending on the format.
Once you've got the particular field (containing the timestamp), you can examine it using a customized regexp. Read about those at perldoc perlre.
Here's something which might get you closer:
use strict;
use warnings;
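use POSIX 'mktime';
# mktime needs a full date (sec, min, hour, mday, mon, year). Only the time of
# day is compared, so an arbitrary fixed day (1 Jan 2000) is used throughout.
my $starttime = mktime(33, 52, 12, 1, 0, 100);
my $endtime = mktime(33, 59, 12, 1, 0, 100);
while (my $line = <>)
{
# split into fields using whitespace as the delimiter
my @fields = split(/\s+/, $line);
# the timestamp is the 3rd field
my $timestamp = $fields[2];
my ($hour, $min, $sec) = split(':', $timestamp);
my $time = mktime($sec, $min, $hour, 1, 0, 100);
# skip lines outside the window (a plain comparison; no flip-flop needed here)
next if $time < $starttime or $time > $endtime;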
print $line;
}
If the start and end times are known, a Perl one-liner with a flip-flop operator is what you need:
perl -ne 'print if /12:52:33/../12:59:33/' logFile
If there is some underlying logic needed in order for you to determine the start and end times, then 'unroll' the one-liner to a formal script:
use strict;
use warnings;
open my $log, '<', 'logFile';
my $startTime = get_start_time(); # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time(); # Sets $endTime in hh:mm:ss format
while ( <$log> ) {
print if /$startTime/../$endTime/;
}
As noted by Ether's comment, this will fail if the exact time is not present. If this is a possibility, one might implement the following logic instead:
use strict;
use warnings;
use autodie;
open my $log, '<', 'logFile';
my $startTime = get_start_time(); # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time(); # Sets $endTime in hh:mm:ss format
while ( <$log> ) {
my $time = (split /,/, $_)[2]; # Assuming fields are comma-separated
# and timelog is 3rd field
last if $time gt $endTime; # Stop when stop time reached
print if $time ge $startTime;
}
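The string comparison above works because zero-padded hh:mm:ss timestamps sort the same way as text and as times. A minimal sketch with made-up log lines (whitespace-separated here, with the timestamp in the third field) might look like this:

```perl
use strict;
use warnings;

my ($start_time, $end_time) = ('12:52:33', '12:59:33');

# Hypothetical log lines; the timestamp is the third whitespace-separated field.
my @log = (
    'host1 app 12:52:32 outside',
    'host1 app 12:52:43 strictly inside',
    'host1 app 12:59:33 end',
    'host1 app 12:59:34 outside',
);

my @kept;
for my $line (@log) {
    my $time = (split ' ', $line)[2];
    next unless defined $time;           # line too short to hold a timestamp
    last if $time gt $end_time;          # past the window: stop reading
    push @kept, $line if $time ge $start_time;
}
print "$_\n" for @kept;
```

Unlike the flip-flop version, this does not depend on the exact start or end time appearing in the log, though it does assume the lines are in chronological order within a single day.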
If each line in the file has the time stamp, then in 'sed' you could write:
sed -n '/12:52:33/,/12:59:33/p' logfile
This will echo the relevant lines.
There is a Perl program, s2p, that will convert 'sed' scripts to Perl.
The basic Perl structure is along the lines of:
my $atfirst = 0;
my $atend = 0;
while (<>)
{
last if $atend;
$atfirst = 1 if m/12:52:33/;
$atend = 1 if m/12:59:33/;
if ($atfirst)
{
# process line as required
}
}
Note that as written, the code will process the first line that matches the end marker. If you don't want that, move the 'last' after the test.
If your log files are segregated by day, you could convert the timestamps to seconds and compare those. (If not, use the technique from my answer to a question you asked earlier.)
Say your log is
12:52:32 outside
12:52:43 strictly inside
12:59:33 end
12:59:34 outside
Then with
#! /usr/bin/perl
use warnings;
use strict;
my $LOGPATH = "/tmp/foo.log";
sub usage { "Usage: $0 start-time end-time\n" }
sub to_seconds {
my($h,$m,$s) = split /:/, $_[0];
$h * 60 * 60 +
$m * 60 +
$s;
}
die usage unless @ARGV == 2;
my($start,$end) = map to_seconds($_), @ARGV;
open my $log, "<", $LOGPATH or die "$0: open $LOGPATH: $!";
while (<$log>) {
if (/^(\d+:\d+:\d+)\s+/) {
my $time = to_seconds $1;
print if $time >= $start && $time <= $end;
}
else {
warn "$0: $LOGPATH:$.: no timestamp!\n";
}
}
you'd get the following output:
$ ./between 12:52:33 12:59:33
12:52:43 strictly inside
12:59:33 end