First a quick intro. I'm new here, so if I screw up a post, please let me know and I'll fix it.
I've been trying to accomplish my goal using perl, but I'm stuck. I don't need to use perl to accomplish it, but I figure it's that, or Excel and I like perl better. If you have a better method please share.
I start with a file (output from a log file). It is one line, with fields delimited by colons. Here is an example of the file:
RmDenySumm:SGID=46244:Req=15000:tsid=46244:AllocBw=38332:BwList=12456/12500/3750/5876/3750:tsid=63042:AllocBw=38750:BwList=15000/12500/3750/3750/3750:tsid=63043:AllocBw=36717:BwList=14706/12500/3750/5761:tsid=63044:AllocBw=37011:BwList=15000/12500/5761/3750:tsid=61741:AllocBw=38450:BwList=12339/3750/6501/12502/3357:tsid=61721:AllocBw=37460:BwList=12500/15000/4200/5760:tsid=2072:AllocBw=31975:BwList=12136/12339/3750/3750:tsid=2073:AllocBw=24260:BwList=14634/5876/3750:tsid=30842:AllocBw=38453:BwList=14634/12500/5761/5557:tsid=30843:AllocBw=37105:BwList=15000/15000/3750/3355:tsid=30844:AllocBw=38295:BwList=14706/12339/3750/3750/3750:tsid=30845:AllocBw=25601:BwList=5762/12339/3750/3750:tsid=30846:AllocBw=38455:BwList=15000/12136/5761/5557:tsid=30847:AllocBw=26974:BwList=14634/12339:tsid=30848:AllocBw=29634:BwList=14634/15000:tsid=30849:AllocBw=37338:BwList=14838/15000/3750/3750:tsid=60958:AllocBw=36898:BwList=12339/12500/6501/5557:tsid=60959:AllocBw=37178:BwList=12339/12500/12339:tsid=60960:AllocBw=27339:BwList=12339/15000:tsid=60962:AllocBw=34839:BwList=12339/3750/15000/3750:tsid=60963:AllocBw=37500:BwList=15000/15000/3750/3750:tsid=60964:AllocBw=38346:BwList=15000/3754/15000/4592:tsid=60965:AllocBw=24626:BwList=15000/5876/3750:tsid=60966:AllocBw=34513:BwList=12502/12500/5761/3750
I need to grab all of the "AllocBw=######" fields, separate the number part from the "AllocBw", add the numbers together, then subtract the sum from a set value.
In perl, I have this:
#!/usr/bin/perl -w
use Data::Dumper;
#
#
my $file = "/home/nick/perl/svcgroup.txt";
my @asplit;
my $c = 0;
open (FILE, "<", $file) or die "Can't open file".$!."\n";
while (<FILE>) {
    $_ =~ s/\n//g;
    push(@asplit, split (":", $_));
    #print Dumper @asplit;
}
foreach $splits (@asplit) {
    if ($splits =~ m/AllocBw/) {
        print $splits."\n";
    }
}
#print Dumper @asplit;
print "\n\n";
close FILE;
exit;
Which leaves me with:
AllocBw=38332
AllocBw=38750
AllocBw=36717
AllocBw=37011
AllocBw=38450
AllocBw=37460
AllocBw=31975
AllocBw=24260
AllocBw=38453
AllocBw=37105
AllocBw=38295
AllocBw=25601
AllocBw=38455
AllocBw=26974
AllocBw=29634
AllocBw=37338
AllocBw=36898
AllocBw=37178
AllocBw=27339
AllocBw=34839
AllocBw=37500
AllocBw=38346
AllocBw=24626
AllocBw=34513
This is where I get stuck. I'm not sure how to strip these values down to the number and add them up.
If someone can assist, I'd be grateful. If this is more easily accomplished using something other than Perl, that's fine too. My programming scope is limited, as I only make small scripts to accomplish small repetitive tasks at work.
EDIT FOR BORODIN
ie (not formatted like this, this is just for illustration):
AllocBw 12575+
AllocBw 12568+
AllocBw 12358 = TotAllocBw 37501
MaxBw 38800*3=116400
116400(MaxBw) - 37501(TotAllocBw) = TotAvaiBw 78899
This would just be a big bonus. The script you wrote works perfectly well for my purposes and I can adapt it as I need. Thanks again! Much appreciated. I was able to follow everything you did differently in the script and learned some new stuff.. Thanks for that as well.
It is simplest to use a global regular expression match to find all occurrences of AllocBw=... in each line of your input file.
This program's outer while loop iterates over all the lines in the input file. Since your file holds a single line, the loop body should execute only once.
The inner while iterates over all instances of the regex pattern AllocBw=(\d+) (AllocBw= followed by one or more decimal digits) and captures the numeric value into $1.
The captured number is added to $total each time, and can simply be printed at the end.
use strict;
use warnings;
my $file = '/home/nick/perl/svcgroup.txt';
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my $total = 0;
while ( <$fh> ) {
$total += $1 while /AllocBw=(\d+)/g;
}
printf "Total: %d\n", $total;
output
Total: 826049
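For the follow-up in the question's edit (subtracting the total from a maximum), the same loop can be extended. This is only a sketch: the MaxBw value of 38800, the multiplier of 3, and the three AllocBw values are taken from the illustration in the question, while the sub name available_bw and the tsid values in the sample line are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: sum every AllocBw value in $line and subtract
# the sum from $max_bw. Returns the pair (total, available).
sub available_bw {
    my ($line, $max_bw) = @_;
    my $total = 0;
    $total += $1 while $line =~ /AllocBw=(\d+)/g;
    return ($total, $max_bw - $total);
}

# Sample values taken from the illustration in the question.
my $line   = 'tsid=1:AllocBw=12575:tsid=2:AllocBw=12568:tsid=3:AllocBw=12358';
my $max_bw = 38800 * 3;

my ($total, $avail) = available_bw($line, $max_bw);
printf "TotAllocBw: %d\nMaxBw: %d\nTotAvaiBw: %d\n", $total, $max_bw, $avail;
```

With those sample values the three lines printed are 37501, 116400 and 78899, matching the arithmetic in the question.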
I need to create a Perl script that reads the most recently modified file in a given folder (the file is always a .csv) and parses the values from its columns, so I can insert them into a MySQL database.
The main problem is: I need to separate the Date from the Hours, and the Country from the Names(CHN, DEU and JPN represent China, Deutschland and Japan).
They come together like in the example below:
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"
So far I can split the lines, but how can I make it understand that each value enclosed in "" and separated by , should be inserted into my arrays?
my %date;
my %hour;
my %country;
my %name;
my %percentage_one;
my %percentage_two;
# Selects latest file in the given directory
my $files = File::DirList::list('/home/cvna/IN/SCRIPTS/zabbix/roaming/tratamento_IAS/GPRS_IN', 'M');
my $file = $files->[0]->[13];
open(CONFIG_FILE,$file);
while (<CONFIG_FILE>){
    # Splits the file into various lines
    @lines = split(/\n/,$_);
    # For each line that i get...
    foreach my $line (@lines){
        # I need to split the values between , without the ""
        # And separating Hour from Date, and Name from Country
        @aux = split(/......./,$line)
    }
}
close(CONFIG_FILE);
readline or <> only reads one line. There's no need to split it on newlines. But, instead of fixing your code, use Text::CSV:
#!/usr/bin/perl
use 5.010;
use warnings;
use strict;
use Text::CSV;
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
while (my $row = $csv->getline(*DATA)) {
my ($date, $time) = split / /, $row->[0];
my ($country, $name) = split / - /, $row->[3];
print "Date: $date\tTime: $time\tCountry: $country\tName: $name\n";
}
__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"
Looking at your code, it appears you're pretty new to Perl. The Text::CSV module is a nice solution, but unfortunately, isn't a standard module. You'll need to use CPAN to install it. It isn't difficult, but may require you to be the administrator of your computer.
The module Text::ParseWords is a standard module and can handle quoted words much like Text::CSV can.
You'll basically need to split the line (which I do with the parse_line function). The first parameter is the delimiter, a comma here, which is what I want to split my line on. The second parameter, 0, tells parse_line to strip the quotes and backslashes from the returned fields. Unlike split itself, parse_line doesn't split on delimiters that are inside quotes, and it handles backslash-escaped quotes. This is very similar to Text::CSV.
Once you've split your line, you'll need to split date from time and country from name. In my example, I show two ways of doing this: One uses split and the other uses a matching regular expression. Either one will work.
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (No \n needed)
use Text::ParseWords;
while ( my $line = <DATA> ) {
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
my ($date, $time) = split /\s+/, $date_time;
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
say "$date, $time, $country, $name";
}
__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"
In your actual program, you'll open your file and make sure the open succeeded. You can test for that yourself, or use autodie:
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (No \n needed)
use Text::ParseWords;
use autodie;
open my $config_file, "<", $file; # No need for testing thanks to use autodie!
# What you need to do if you don't use autodie
# open my $config_file, "<", $file or die qq(Can't open "$file" for reading);
while ( my $line = <$config_file> ) {
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
my ($date, $time) = split /\s+/, $date_time;
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
say "$date, $time, $country, $name"; # Show fields were correctly parsed.
}
It looks like you want to store the data. I see you have multiple hashes that I bet you're trying to keep in parallel. Take a look at how references allow you to build more complex structures:
my %data; #Where I'll be storing the data...
$data{$key}->{DATE} = $date;
$data{$key}->{HOUR} = $hour;
$data{$key}->{COUNTRY} = $country;
...
Now, all of your data is in %data. You can pass it around from place to place in your program, and not worry whether you've updated each and every single hash.
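As a rough sketch of that idea, here is one way to fill and read such a structure. The key (country and name joined with a dash), the PERCENT field and the sample values are assumptions made for illustration; use whatever key uniquely identifies a record in your data.

```perl
use strict;
use warnings;

my %data;    # one hash holding a record (itself a hash) per key

# Values as they might come out of the parsing loop (made up here).
my ($date, $time, $country, $name) = ('02/12/2014', '09:00:00', 'DEU', 'NAME2');
my ($percent1, $percent2) = ('10%', '75.04%');

my $key = "$country-$name";    # assumed to be unique per record
$data{$key} = {
    DATE    => $date,
    HOUR    => $time,
    COUNTRY => $country,
    NAME    => $name,
    PERCENT => [ $percent1, $percent2 ],
};

# Every field of a record is reachable through the one structure.
for my $k (sort keys %data) {
    my $rec = $data{$k};
    print "$k: $rec->{DATE} $rec->{HOUR} @{ $rec->{PERCENT} }\n";
}
```

Because a record is a single hash reference, adding a new field later means touching one place instead of declaring yet another parallel hash.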
Once you get the hang of references, you are on your way to writing Object Oriented Perl code.
Get a good book on Modern Perl too. Perl coding techniques have changed quite a bit since Perl 5 was released. Unfortunately, most people never learn the way Perl should be written because they learn from old books that are lying around, or from looking at older code written in the Perl 3 and Perl 4 error (pun intended). Perl is a flexible and powerful language that allows you to quickly generate yourself enough rope to hang yourself. Learning good programming techniques will allow you to write more complex and comprehensive programs that are actually easier to read and maintain.
Almost complete program...
Here's the complete program that finds the most recent file in a particular directory, then reads in that file and parses the lines.
I'm using the -M file test. It returns the age of the file in days, measured from its last modification to the time the program started. For example, a file last modified 2 1/2 days ago returns 2.5, while a file last modified one day and four hours ago returns 1.16666667. You can use this to compare the ages of the various files: the smallest value belongs to the newest file.
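To see -M in isolation, here is a small sketch that creates two scratch files, backdates them with utime, and compares their ages. The file names and ages are invented for the demonstration, and the files live in a temporary directory that is removed when the program exits.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );    # scratch directory, removed at exit

# Create two empty files and backdate them relative to $^T,
# the time the program started (the reference point -M uses).
my %seconds_old = ( 'old.csv' => 2.5 * 86_400, 'new.csv' => 4 * 3_600 );
for my $name ( keys %seconds_old ) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die qq(Cannot create "$path": $!);
    close $fh;
    my $mtime = $^T - $seconds_old{$name};
    utime $mtime, $mtime, $path or die qq(Cannot set time on "$path": $!);
}

my $old_age = -M "$dir/old.csv";
my $new_age = -M "$dir/new.csv";
printf "old.csv: %.2f days, new.csv: %.2f days\n", $old_age, $new_age;
print "new.csv is the newer file\n" if $new_age < $old_age;
```

Storing each -M result in a variable before comparing keeps the comparison easy to read.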
This program works on Perl 5.8.8 without installing any new modules, and I've tested it with data I've made up.
You can see I use open ... or die ...; without any issues. Are you getting some other error? Do you have use strict; and use warnings; set in your program?
#! /usr/bin/env perl
#
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use Text::ParseWords;
use Benchmark;
use constant {
DATA_FILE_DIR => "temp",
};
#
# Find newest file in the directory
#
opendir my $data_dir, DATA_FILE_DIR
or die qq(Cannot open directory for reading.);
my $newest_file;
while ( my $file = readdir $data_dir ) {
next if $file eq "." or $file eq "..";
my $full_name = DATA_FILE_DIR . "/" . $file;
if ( not defined $newest_file
or -M $full_name < -M $newest_file ) {
$newest_file = $full_name;
}
}
print qq(Using file is "$newest_file"\n);
closedir $data_dir;
open my $file, "<", $newest_file
or die qq(Cannot open file "$newest_file" for reading.);
while ( my $line = <$file> ) {
# Read in the entire line
my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
= parse_line ',', 0, $line;
# Split the DATE/TIME field
my ($date, $time) = split /\s+/, $date_time;
# Split the Country/Name field
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
# Print statement merely shows that these four fields are truly split.
print "$date, $time, $country, $name\n";
}
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my $name = "";
@name = ( $line =~ m/Name:([\w\s\_\,/g );
foreach (@name) {
print $name."\n";
}
I want to capture the word between Name: and ,Region wherever it occurs in the whole line. The main catch is that the name can be in any format:
Amanda_Marry_Rose
Amanda.Marry.Rose
Amanda Marry Rose
Amanda/Marry/Rose
I need help capturing such a pattern every time it occurs in the line. So for the line I provided, the output should be
Amanda_Marry_Rose
Raghav.S.Thomas
Does anyone have any idea how to do this? I tried the line below, but it gives me the wrong output.
@name=($line=~m/Name:([\w\s\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~\´]+)\,/g);
Output
Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE
To capture between Name: and the first comma, use a negated character class:
/Name:([^,]+)/g
This says to match one or more characters following Name: which isn't a comma:
while (/Name:([^,]+)/g) {
print $1, "\n";
}
This is more efficient than a non-greedy quantifier, e.g:
/Name:(.+?),/g
As it doesn't require backtracking.
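A quick way to convince yourself that the two patterns capture the same names is to run both against one line. This is only an illustrative check on a shortened version of the sample line, not a benchmark.

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,Name:Raghav.S.Thomas,Region:UAE,';

my @negated = $line =~ /Name:([^,]+)/g;    # stop at the first comma
my @lazy    = $line =~ /Name:(.+?),/g;     # expand until a comma is found

print "negated: @negated\n";
print "lazy:    @lazy\n";
```

Both arrays end up holding Amanda_Marry_Rose and Raghav.S.Thomas. The difference lies only in how the regex engine gets there.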
Reg-ex corrected:
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my @name = ($line =~ /Name\:([\w\s_.\/]+)\,/g);
foreach my $name (@name) {
print $name."\n";
}
What you have there is comma separated data. How you should parse this depends a lot on your data. If it is full-fledged csv data, the safest approach is to use a proper csv parser, such as Text::CSV. If it is less strict data, you can get away with using the light-weight parser Text::ParseWords, which also has the benefit of being a core module in Perl 5. If what you have here is rather basic, user entered fields, then I would recommend split, simply because when you know the delimiter, it is easier and safer to define it than everything else inside it.
use strict;
use warnings;
use Data::Dumper;
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
# Simple split
my @fields = split /,/, $line;
print Dumper map /^Name:(.*)/, @fields;
use Text::ParseWords;
print Dumper map /^Name:(.*)/, quotewords(',', 0, $line);
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
});
$csv->parse($line);
print Dumper map /^Name:(.*)/, $csv->fields;
Each of these options gives the same output, save for the one that uses Text::CSV, which also issues an undefined warning, quite correctly, because your data has a trailing comma (meaning an empty field at the end).
Each of these has different strengths and weaknesses. Text::CSV can choke on data that does not conform with the CSV format, and split cannot handle embedded commas, such as Name:"Doe, John",....
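To make the embedded-comma weakness concrete, here is a small sketch comparing split with quotewords from Text::ParseWords. The input line is made up for the demonstration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Made-up line with a quoted field that contains a comma.
my $line = 'Name:"Doe, John",Region:US';

my @by_split = split /,/, $line;              # breaks inside the quotes
my @by_words = quotewords(',', 0, $line);     # respects the quotes

print scalar(@by_split), " fields from split: ", join(' | ', @by_split), "\n";
print scalar(@by_words), " fields from quotewords: ", join(' | ', @by_words), "\n";
```

split returns three fields because it cuts the quoted name in two, while quotewords returns two fields and strips the quotes.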
The regex we use to extract the names very simply just captures the entire rest of the lines that begin with Name:. This also allows you to perform sanity checks on the field names, for example issue a warning if you suddenly find a field called Doe;Name:
The simple way is to look for all sequences of non-comma characters after every instance of Name: in the string.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my @names = $line =~ /Name:([^,]+)/g;
print "$_\n" for @names;
output
Amanda_Marry_Rose
Raghav.S.Thomas
However, it may well be useful to parse the data into an array of hashes so that related fields are gathered together.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my %info;
my @persons;
while ( $line =~ / ([a-z]+) : ([^:,]+) /gix ) {
    my ($key, $val) = (lc $1, $2);
    if ($info{$key}) {
        push @persons, { %info };
        %info = ();
    }
    $info{$key} = $val;
}
push @persons, { %info };
use Data::Dump;
dd \@persons;
print "\nNames:\n";
print "$_\n" for map $_->{name}, @persons;
output
[
{
cardtype => "DebitCard",
host => "USE",
name => "Amanda_Marry_Rose",
product => "Satin",
region => "US",
},
{
name => "Raghav.S.Thomas",
region => "UAE",
},
]
Names:
Amanda_Marry_Rose
Raghav.S.Thomas
I'm creating a subroutine in my Perl script and can evaluate it nicely and it works. I would also like to print the content of the subroutine for debugging purposes. However, the subroutine, which is constructed in code, is really huge and is hard to read and understand it by simply printing it. I would like to find a way to be able to print it in a semi-indented way.
Here is piece of code generation:
$code .= "if (\$ct=~/^\\s*\$/x || \$Im < \$Ix) {push(\@min, $b); push(\@max, $b);} if (\$Im > \$Ix) {push(\@min, $a); push(\@max, $a);}"
And I would like to print it something like this:
if (\$ct=~/^\\s*\$/x || \$Im < \$Ix)
    {push(\@min, $b); push(\@max, $b);}
if (\$Im > \$Ix)
    {push(\@min, $a); push(\@max, $a);}
I know that the straightforward way to do this is to write another script to parse it, put some \n and \t into the appropriate places in the code, and then print it. Is there a smarter way to do that?
Like putting \n somewhere in code without subverting evaling it (i.e., something visible to print but invisible to eval).
NOTE: I have a lot of regexes in my subroutine and I want to avoid running them every time. That's why I need to have the code stored in a string and then eval it to increase my script performance.
Ignoring the reasons why you may have code in a string...
Perl::Tidy is the tool that you need to reformat your code.
Normally, one uses this tool via the command line on source files. However, I've hacked together a little script that will output your code string to a temporary file so that it can be reformatted. Note, this currently assumes that your code is well-formed and that there aren't any obvious syntax errors in it as formatting broken code is outside the purview of this tool.
use strict;
use warnings;
use autodie;
my $code = <<'END_CODE';
# It hurts to write ugly code, but I'll see what I can do
sub { my @vars = @_;
my $count = scalar(@vars); print "Hello World. Vars = $count"; return; }
END_CODE
print pretty_code($code);
sub pretty_code {
my $code = shift;
require File::Temp;
require Perl::Tidy;
my ($fh, $filename) = File::Temp::tempfile();
print $fh $code;
close $fh;
Perl::Tidy::perltidy(
source => $filename,
);
my $output = do {
open my $fh, '<', "$filename.tdy";
local $/;
<$fh>
};
unlink $_ for ($filename, "$filename.tdy");
return $output;
}
Outputs:
# It hurts to write ugly code, but I'll see what I can do
sub {
my @vars = @_;
my $count = scalar(@vars);
print "Hello World. Vars = $count";
return;
}
Update
There is no need to use a temporary file, particularly as Perl::Tidy accumulates the tidied code in memory before dumping it to disk. If you prefer, this program does the same thing without writing the result to disk.
use strict;
use warnings;
use Perl::Tidy 'perltidy';
my $code = <<'END_CODE';
# It hurts to write ugly code, but I'll see what I can do
sub { my @vars = @_; my $count = scalar(@vars); print "Hello World. Vars = $count"; return; }
END_CODE
print pretty_code($code);
sub pretty_code {
my ($code) = @_;
my $pretty;
perltidy(
source => \$code,
destination => \$pretty,
);
$pretty;
}
output
# It hurts to write ugly code, but I'll see what I can do
sub {
my @vars = @_;
my $count = scalar(@vars);
print "Hello World. Vars = $count";
return;
}
I'm not clear at present why the closing brace is indented further, but I am certain that the result is better than the original.
scenario: I am a Jr. C# developer, but recently (3 days) began learning Perl for batch files. I have a requirement to parse through a text file, extract some key data, then output the key data to a new text file. As seems to always be the case, there are butt loads of fragmented examples on the net regarding how to 'read' from a file, 'write' to a file, 'store' line by line into an array, 'filter' this and that, yadda yadda, but nothing discussing the entire process of read, filter, write. Trying to splice examples from the net together is no good, because none seem to work together as coherent code. Coming from C#, Perl's syntax structure is hella confusing. I just need some advice on this process.
My objective is to parse a text file, single out all lines similar to the one below, by date, and output only the first 8 digits of the 2nd number group and 5 digits from the 3rd number group to a new text file.
11122 20100223454345 ....random text..... [keyword that identifies all the
entries I need]... random text 0.0034543345
I know regex is likely the best option, and have most of the expression written, but it does not work in Perl!
Question: Could someone please show a simple (dummy) example of how to read from, filter (using dummy regex) the file, then output the (dummy) results to a new file? I'm not concerned with functional details, I can learn those, I just need the syntax structure Perl uses. For example:
open(FH, '<', 'dummy1.txt')
open(NFH, '>', 'dummy2.txt')
@array; or $dumb;
while(<FH>)
{
filter each line [REGEX] and shove it into [@array or $dumb scalar]
}
print(join(',', @array)) to dummy2.txt
close FH;
close NFH;
Note: For various reasons, I cannot paste my source code in here, sorry. Any help is appreciated.
UPDATE: ANSWER:
Many thanks to all those who provided insight into my issue. After reading through your replies, as well as conducting further research, I learned that there are dozens of ways to accomplish the same task in Perl (which I am not a fan of). In the end, this is how I solved the problem, and IMO it's the cleanest and most succinct solution for those having similar struggles. Thanks again for all the help.
#======================================================================
# 1. READ FILE: inputFile.txt
# 2. CREATE FILE: outputFile.txt
# 3. WRITE TO: outputFile.txt IF line matches REGEX constraints
# 4. CLOSE FILES: outputFile.txt & inputFile.txt
#==========================================================================
#1
$readFile = 'C:/.../.../inputFile.txt';
open(FH, '<', $readFile) or Error("Could not read file ($!)");
#2
$writeFile = 'C:/.../.../outputFile.txt';
open(NFH, '>', $writeFile) or Error("Cannot write to file ($!)");
#3
@lines = <FH>;
LINE: foreach $line (@lines)
{
if ($line =~ m/(201403\d\d).*KEYWORD.*time was (\d+\.\d+)/)
{
$date = $1;
$elapsedtime = $2;
print NFH "$date,$elapsedtime\n";
}
}
#4
close NFH;
close FH;
perlfaq5 - How do I change, delete, or insert a line in a file, or append to the beginning of a file? covers most of the different scenarios for how to use files.
However, I will add to that by saying that you should always start your scripts with use strict; and use warnings;, and because you're doing file processing, use autodie; will serve you well too.
With that in mind, a quick stub would be the following:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'dummy1.txt';
open my $outfh, '>', 'dummy2.txt';
while (my $line = <$infh>) {
chomp $line; # Remove \n
if (Whatever magically processing here) {
print $outfh "your new data"; # No comma after the filehandle
}
}
while(<FH>)
{
# variable $_ contains the current line
if(m/regex_goes_here/) #by default, the regex match operator m// attempts to match the default $_ variable
{
#do actions
}
}
Also note, m/regex/ is the same as /regex/
Refer to:
http://perldoc.perl.org/perlvar.html#General-Variables
http://perldoc.perl.org/perlre.html
For capturing variables from regex match, THIS might help
EDIT
If you want a different variable than the default $_, as @Miller suggested, use while($line = <FH>) followed by if($line =~ m/regex_goes_here/)
=~ is the Binding Operator
One tip. Don't explicitly open filehandles to your input and output files. Instead read from STDIN and write to STDOUT. Your program will be far more flexible and easier to use as you'll be able to treat it like a Unix filter.
$ your_filter_program < your_input.txt > your_output.txt
And doing this actually makes your program simpler to write too.
while (<>) { # <> reads from STDIN
# transform your data (which is in $_) in some way
...
print; # prints $_ to STDOUT
}
You might find the first few chapters of Data Munging with Perl useful.
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
INPUT_FILE => "NAME_OF_INPUT_FILE",
OUTPUT_FILE => "NAME_OF_OUTPUT_FILE",
FILTER => qr/regex_for_line_to_filter/,
};
open my $in_fh, "<", INPUT_FILE;
open my $out_fh, ">", OUTPUT_FILE;
while ( my $line = <$in_fh> ) {
chomp $line;
next unless $line =~ FILTER;
$line =~ s/regular_expression/replacement/;
say {$out_fh} $line;
}
close $in_fh;
close $out_fh;
$in_fh is your input file handle, and $out_fh is your output file handle. I basically open both, and loop through the input. The chomp removes the \n from the end. I always recommend doing that.
The next goes to the next iteration of the loop unless I match FILTER which is a regular expression matching lines you want to keep. This is identical to:
if ( $line !~ FILTER ) {
next;
}
I then use the substitution command to get the parts of the line I want, and munge them into the output I want. I may be better off expanding this a bit, maybe using split to break the line into pieces and then using only the pieces I want. I could then use substr to pull out substrings from the selected pieces.
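The split-and-substr variant could look roughly like the sketch below. It assumes the fields are whitespace-separated and that the "3rd number group" from the question is the last field on the line. The sub name extract_fields and the shortened sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first 8 characters of the second
# whitespace-separated field and the first 5 characters of the last one.
# Returns an empty list when the line has fewer than three fields.
sub extract_fields {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 3;
    return ( substr($fields[1], 0, 8), substr($fields[-1], 0, 5) );
}

my $sample = '11122 20100223454345 ....random text..... [keyword] ... random text 0.0034543345';
my ($date, $num) = extract_fields($sample);
print "$date,$num\n";
```

In the filter loop, this call would take the place of the s/// line, with say {$out_fh} writing the joined result.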
The say command is like print except it automatically adds in a NL on the end. This is how you write a line to a file.
Now, get Learning Perl and read it. If you know any programming, it shouldn't take you more than a week to go through the first half of the book. That should be more than enough to be able to write a program like this. The more complex stuff like references and object orientation might take a bit longer.
Online documentation can be found at http://perldoc.perl.org. You can look up the use statements, which are called pragmas, over there. Documentation on the individual functions is also available.
If I understood well, this one liner will do the job:
perl -ane 'print substr($F[1],0,8),"\t",substr($F[-1],0,5),"\n" if /keyword/' in.txt
Assuming in.txt is:
11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345
11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345
11122 40100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.65487
11122 50100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.6215
output:
20100223 0.003
40100223 0.654