Extract specific values from a log file - regex

I want to extract two values on the same line of a log file using Perl.
Network Next Hop metric locprf Path
*|i10.1.5.0/24 10.6.76.242 2 100 0 65000?
*|i10.1.9.0/24 10.6.76.242 2 100 0 64345 63800?
*|i10.2.9.0/25 10.6.76.242 2 100 0?
For each line, I want to extract the network address and the number before the ?
I have this but it extracts only the network address.
open( CONF, '<', 'putty-wan.log' ) or die "\n";
my @ip;
open( FICHE, ">RouterNetwork.txt" ) || die ( "Cannot create the file \"RouterNetwork.txt\"" );
while ( my $line = <CONF> ) {
if ( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\/\d{1,2})/ ) {
print FICHE $1, "\n";
}
}
close(FICHE);
close CONF;
Now I want to know what to add to the regular expression, or any other way to get, for each line, both the network address and the number just before the ?.

Given the shown format, you can process the line with
my ($ip, $n) = map { s/^\D*|\D*$//gr } (split ' ', $line)[0,-1];
or, when the line is in the $_ variable
my ($ip, $n) = map { s/^\D*|\D*$//gr } (split)[0,-1];
With the /r non-destructive modifier, the new string is returned and the original is left unchanged (which we don't care about here). It's available since v5.14. If your version of Perl is older, use
my ($ip, $n) = map { s/^\D*|\D*$//g; $_ } (split)[0,-1];
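As a quick check of the expression above, here is a minimal sketch that runs it over the three sample lines from the question. The helper name first_and_last and the hard-coded @lines array are made up for this illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    '*|i10.1.5.0/24 10.6.76.242 2 100 0 65000?',
    '*|i10.1.9.0/24 10.6.76.242 2 100 0 64345 63800?',
    '*|i10.2.9.0/25 10.6.76.242 2 100 0?',
);

# Hypothetical helper: take the first and last whitespace-separated
# fields and strip any leading or trailing non-digits from each.
sub first_and_last {
    my ($line) = @_;
    return map { s/^\D*|\D*$//gr } (split ' ', $line)[0, -1];
}

for my $line (@lines) {
    my ($ip, $n) = first_and_last($line);
    print "$ip $n\n";
}
# Prints:
# 10.1.5.0/24 65000
# 10.1.9.0/24 63800
# 10.2.9.0/25 0
```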
As for processing the whole file, you need a way to detect header lines. How to do this depends on details of your file format. Given the sample, perhaps skip lines starting with letter-only words
use warnings;
use strict;
use feature 'say';
my $file = 'putty-wan.log';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
next if /^[a-zA-Z]+\b/;
my ($ip, $num) = map { s/^\D*|\D*$//gr } (split)[0,-1];
say "$ip $num";
}
Some comments
Please always start with use warnings; and use strict;
Use the three-argument form of open, with a lexical filehandle. It's better.
Always include $! in your die statements, to see the actual error. This would be the "default" way to do it while sometimes other error variables are needed as well.
While there is nothing wrong with using || as you do, the or is very handy for flow control, having a suitably lower precedence. But above all, it's good to be consistent in any case.
It's been clarified that the last part on the line can also be 6500 ? or 65000 i or such.
Then store all fields in an array and process it from the back, looking for the first field with numbers.
while (<$fh>)
{
next if /^[a-zA-Z]+\b/;
my @fields = split;
my $ip = (shift @fields) =~ s/^\D*//gr; # /r needs v5.14 or later
my $num;
while (defined(my $f = pop @fields)) {
($num) = $f =~ /(\d+)/;
last if defined $num; # a plain truth test would skip a valid 0
}
say "$ip $num";
}
The IP is still obtained from the first field, and cleaned up the same way as before.

There is nothing special to do; just extend the pattern so that it also describes the number you want to capture:
use strict;
use warnings;
open (my $conf, '<', 'putty-wan.log') || die "Don't eat too much Montbéliard saussages\n";
open (my $output, '>', 'RouterNetwork.txt') || die ('Cannot create the file "RouterNetwork.txt"');
while( <$conf> ) { # the current line is stored in $_
print $output "$1\t$2\n" if /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\/\d{1,2}).*\b(\d+)\?/;
}
close $output;
close $conf;
Note the word boundary before the number to be sure to obtain the whole number and not the last digit only.
The pattern can also be shortened to: /([\d.]{7,15}\/\d\d?).*?(\d+)\?/
Take care not to use an old-school programming style, and look at current Perl practices (use strict and warnings systematically).
Note that with log files, a fields approach (split the line by whitespaces) is sometimes more handy.
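A small sketch to compare the two patterns above on the sample lines from the question. The extract helper and the hard-coded lines are assumptions made for this illustration only.

```perl
use strict;
use warnings;

# The long and the shortened pattern from the answer above.
my $long  = qr/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\/\d{1,2}).*\b(\d+)\?/;
my $short = qr/([\d.]{7,15}\/\d\d?).*?(\d+)\?/;

# Hypothetical helper: return (network, number) or an empty list.
sub extract {
    my ($re, $line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

my @lines = (
    '*|i10.1.5.0/24 10.6.76.242 2 100 0 65000?',
    '*|i10.1.9.0/24 10.6.76.242 2 100 0 64345 63800?',
    '*|i10.2.9.0/25 10.6.76.242 2 100 0?',
);

for my $line (@lines) {
    my @a = extract($long,  $line);
    my @b = extract($short, $line);
    my $agree = "@a" eq "@b" ? 'same' : 'differ';
    print join("\t", @a, $agree), "\n";
}
```

On these three lines both patterns should report the same network and number; on other log formats they may diverge, since the shortened version is looser about what counts as an address.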

Related

perl regex: searching thru entire line of file

I'm a regex newbie, and I am trying to use a regex to return a list of dates from a text file. The dates are in mm/dd/yy format, so for years it would be '55' for '1955', for example. I am trying to return all entries from years '50' to '99'.
I believe the problem I am having is that once my regex finds a match on a line, it stops right there and jumps to the next line without checking the rest of the line. For example, I have the dates 12/12/12, 10/10/57, 10/09/66 all on one line in the text file, and it only returns 10/10/57.
Here is my code thus far. Any hints or tips? Thank you
open INPUT, "< dates.txt" or die "Can't open input file: $!";
while (my $line = <INPUT>){
if ($line =~ /(\d\d)\/(\d\d)\/([5-9][0-9])/g){
print "$&\n" ;
}
}
A few points about your code
You must always use strict and use warnings 'all' at the top of all your Perl programs
You should prefer lexical file handles and the three-parameter form of open
If your regex pattern contains literal slashes then it is clearest to use a non-standard delimiter so that they don't need to be escaped
Although recent releases of Perl have fixed the issue, there used to be a significant performance hit when using $&, so it is best to avoid it, at least for now. Put capturing parentheses around the whole pattern and use $1 instead
This program will do as you ask
use strict;
use warnings 'all';
open my $fh, '<', 'dates.txt' or die "Can't open input file: $!";
while ( <$fh> ) {
print $1, "\n" while m{(\d\d/\d\d/[5-9][0-9])}g
}
output
10/10/57
10/09/66
You are printing $&, which is updated whenever a new match is found.
But in this case you need to keep all the previous matches as well as the latest one, so you can use an array to store all the matches.
while(<$fh>) {
my @dates = $_ =~ /(\d\d\/\d\d\/[5-9][0-9])/g; # one capture group, so each element is a whole date
print "@dates\n" if @dates;
}
You just need to change the 'if' to a 'while', and the regex will pick up where it left off:
open INPUT, "< a.dat" or die "Can't open input file: $!";
while (my $line = <INPUT>){
while ($line =~ /(\d\d)\/(\d\d)\/([5-9][0-9])/g){
print "$&\n" ;
}
}
# Output given line above
# 10/10/57
# 10/09/66
You could also capture the whole of the date into one capture variable and use a different regex delimiter to save escaping the slashes:
while ($line =~ m|(\d\d/\d\d/[5-9]\d)|g) {
print "$1\n" ;
}
...but that's a matter of taste, perhaps.
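To make the difference between the approaches concrete, here is a minimal sketch using the line from the question. In scalar context a /g match finds one match per evaluation and remembers its position, so if stops after the first hit while while keeps going; in list context the same match returns every hit at once. The variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $line = '12/12/12, 10/10/57, 10/09/66';
my $date = qr{\d\d/\d\d/[5-9]\d};

# 'if' evaluates the match once, so only the first date is seen.
my @with_if;
if ($line =~ /($date)/g) {
    push @with_if, $1;
}

# The /g match above left a position behind; reset it before starting over.
pos($line) = undef;

# 'while' re-evaluates the match, each time resuming after the previous hit.
my @with_while;
while ($line =~ /($date)/g) {
    push @with_while, $1;
}

# In list context, /g returns all captures in one go.
my @in_list = $line =~ /($date)/g;

print "if:    @with_if\n";     # if:    10/10/57
print "while: @with_while\n";  # while: 10/10/57 10/09/66
print "list:  @in_list\n";     # list:  10/10/57 10/09/66
```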
You can also use map to collect the dates with years 50 to 99 into an array:
open INPUT, "< dates.txt" or die "Can't open input file: $!";
@as = map{$_ =~ m/\d\d\/\d\d\/[5-9][0-9]/g} <INPUT>;
$, = "\n";
print @as;
Another way around it is removing the dates you don't want.
$line =~ s/\d\d\/\d\d\/[0-4]\d//g;
print $line;

Using Perl to match all words after a particular word

I am using Perl and need to get all domain names from http://www.malwaredomainlist.com/hostslist/hosts.txt into a flat file.
I think the easiest way to do this is to use a regular expression but I can't get my head around how to build the expression.
my code so far:
#!/usr/bin/perl
use LWP::Simple;
$url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
$content = get $url;
open(my $fh, '>', '/home/jay/feed.txt');
#logic here
}
close $fh;
I'm not sure if I should loop over each line and perform an expression on that or if I should take the whole file as a string and work with that.
The page is just a text/plain document, so I think I would just copy and paste the page into my editor and remove the unwanted information. However if you would prefer a Perl program then this is all that is necessary. It uses LWP::Simple::get to fetch the text page and a regex to search it for lines starting with digits and dots, returning the second field of each
use strict;
use warnings;
use feature 'say';
use LWP::Simple qw/ get /;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
say for get($url) =~ /^[\d.]+\s+(\S+)/gam;
or as a one-liner
perl -MLWP::Simple=get -E"say for get(shift) =~ /^[\d.]+\s+(\S+)/gam" http://www.malwaredomainlist.com/hostslist/hosts.txt
Unless you have a particular need, iterating by line is the way forward. Otherwise you just tie up memory unnecessarily.
However when you're fetching a url, it's a bit academic - I would suggest that fetching it to a file first isn't a bad thing though, so you can re-process it without needing to refetch.
Given source data sample:
for ( split ( "\n", $content ) ) {
next unless m/^\d/; #skip lines that don't start with a digit.
my ( $IP, $hostname ) = split;
my $domainname = $hostname =~ s/^\w+\.//r;
print $domainname,"\n";
}
This doesn't entirely work with your list though, because in that list you have a mix of hostnames and domain names, and it's not actually all that easy to tell the difference.
After all, the 'tld' at the end might be .com or it might be .org.it
127\.0\.0\.1\s+(.*)
should work fine with global modifier.
Unless saving the list file locally is a requirement (in which case you might be better off just using wget or curl), there is no need to save it in an external file to process it line-by-line.
You can instead open a filehandle to the string itself.
In the script below, extract_hosts would work the same whether you give it a reference to a string or a filename:
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
use LWP::Simple qw( get );
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $malware_hosts = get $url;
unless (defined $malware_hosts) {
die "Failed to get content from '$url'\n";
}
my $hosts = extract_hosts(\$malware_hosts);
print "$_\n" for @$hosts;
sub extract_hosts {
my $src = shift;
open my $fh, '<', $src
or croak "Failed to open '$src' for reading: $!";
my @hosts;
while (my $entry = <$fh>) {
next unless $entry =~ /\S/;
next if $entry =~ /^#/;
my (undef, $host) = split ' ', $entry;
push @hosts, $host;
}
close $fh
or croak "Failed to close '$src': $!";
\@hosts;
}
This will give you the list of hosts.
Code to grep the hostnames from the given file.
use LWP::Simple;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $content = get $url;
my @server_names = split(/127\.0\.0\.1\s*/, $content);
open(my $fh, '>', '/home/jay/feed.txt');
print $fh "@server_names";
close $fh;
Here is another implementation. It uses HTTP::Tiny, which is part of the core, so you don't have to install anything.
use HTTP::Tiny;
use Data::Dumper;
my $response = HTTP::Tiny->new->get('http://www.malwaredomainlist.com/hostslist/hosts.txt');
die "Failed!\n" unless $response->{success};
my @content;
for my $line ( split ( "\n", $response->{content} ) ){
next if ( $line =~ /^#|^$/);
push @content, ((split ( " ", $line ))[1]);
}
print Dumper (\@content);

Perl, problems scrubbing a file with regex efficiently

First I will explain the problem my program attempts to solve. I have two input files, one contains lines of "good" numbers:
100000
100001
100002
100003
100004
The other file contains "raw" numbers. I want to check each line and see whether it contains one of the "good" numbers above followed by 4 more digits; the additional 4 digits can be anything. So if the file containing the raw numbers is:
8881000001234
1000014321
999991000021234567
00234100001
1000041234
100002123
1000029876
after scrubbing with regex the matched numbers would be
1000001234
1000014321
1000021234
1000041234
1000029876
The way I have done this so far is to store the "good" numbers in an array then slurped the "raw" numbers into a scalar
my $FH;
my @good_nums;
open $FH, '<', 'good_numbers' or die $!;
while(<$FH>) { chomp; push @good_nums, $_; }
close $FH;
open $FH, '<', 'raw_numbers' or die $!;
my $raw_nums = do { local $/; <$FH> };
close $FH;
then with those I can do this :
my @matches;
foreach my $num (@good_nums) {
push @matches, $raw_nums =~ /$num\d{4}/g;
}
So @matches contains the correct matches and this has been working well.
But now I also need to capture the lines from the "raw" numbers that did not match. I can capture the non-matching lines by putting the "raw" numbers into an array (instead of slurping them) and joining the @good_nums array into a regex:
my $QRnums = '(?:' . (join '|', @good_nums) . ')';
$QRnums = qr/$QRnums/;
my @raw_nums;
open my $FH, '<', 'raw_numbers' or die $!;
while(<$FH>) { chomp; push @raw_nums, $_; }
my @matches;
my @junk;
for (@raw_nums) {
if ($_ =~ /($QRnums\d{4})/g) {
push @matches, $1;
} else {
push @junk, $_;
}
}
This is working but when I increase the number of lines in each file to 150,000 or more, the latter solution takes 4 or 5 times longer than the former solution. I know there must be another Perl solution that can solve my problem efficiently but I am at a loss. I am not very good with intermediate Perl and beyond.. Is there a better way to do this or can my first solution be rewritten so that I can obtain the non matches in an array too? Apart from needing to solve the problem explained in the opening of my post, I am open to anything.
Compile your regex only once, and then match using $_ =~ $QRnums.
Additionally, there's no need to slurp your other file; just process it line by line instead.
my $QRnums = '(?:' . (join '|', @good_nums) . ')';
$QRnums = qr/($QRnums\d{4})/;
my @matches;
my @junk;
open my $FH, '<', 'raw_numbers' or die $!;
while (<$FH>) {
chomp;
if ($_ =~ $QRnums) {
push @matches, $1;
} else {
push @junk, $_;
}
}
Also, if your regex should be bounded to the start of the string ^, then I would suggest that you add that: $QRnums = qr/^($QRnums\d{4})/;
Addendum
From perlop - Regexp Quote-Like Operators
Since Perl may compile the pattern at the moment of execution of the qr() operator, using qr() may have speed advantages in some situations, notably if the result of qr() is used standalone:
and later:
Precompilation of the pattern into an internal representation at the moment of qr() avoids a need to recompile the pattern every time a match /$pat/ is attempted. (Perl has many other internal optimizations, but none would be triggered in the above example if we did not use qr() operator.)
Basically, because your @good_nums list was potentially very large, it made sense to cache the regex if we could, so that it only needed to be compiled once.
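To illustrate the idea on the small sample from the question, here is a minimal sketch in which two arrays stand in for the good_numbers and raw_numbers files. The quotemeta call is an extra precaution that is not in the original code; it only matters if the "good" entries could ever contain regex metacharacters.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two input files (sample data from the question).
my @good_nums = qw(100000 100001 100002 100003 100004);
my @raw_nums  = qw(
    8881000001234 1000014321 999991000021234567 00234100001
    1000041234 100002123 1000029876
);

# Build the alternation and compile the whole pattern exactly once.
my $alternation = join '|', map { quotemeta($_) } @good_nums;
my $re = qr/((?:$alternation)\d{4})/;

my (@matches, @junk);
for my $raw (@raw_nums) {
    if ($raw =~ $re) {
        push @matches, $1;   # good number plus the 4 digits after it
    } else {
        push @junk, $raw;    # keep the whole non-matching line
    }
}

print "matches: @matches\n";
# matches: 1000001234 1000014321 1000021234 1000041234 1000029876
print "junk:    @junk\n";
# junk:    00234100001 100002123
```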

Perl substitution using a hash

open (FH,"report");
read(FH,$text,-s "report");
$fill{"place"} = "Dhahran";
$fill{"wdesc:desc"} = "hot";
$fill{"dayno.days"} = 4;
$text =~ s/%(\w+)%/$fill{$1}/g;
print $text;
This is the content of the "report" template file
"I am giving a course this week in %place%. The weather is %wdesc:desc%
and we're now onto day no %dayno.days%. It's great group of blokes on the
course but the room is like the weather - %wdesc:desc% and it gets hard to
follow late in the day."
For reasons that I won't go into, some of the keys in the hash I'll be using will have dots (.) or colons (:) in them, but the regex stops working for these, so for instance in the example above only %place% gets correctly replaced. By the way, my code is based on this example.
Any help with the regex greatly appreciated, or maybe there's a better approach...
You could loosen it right up and use "any sequence of anything that isn't a %" for the replaceable tokens:
$text =~ s/%([^%]+)%/$fill{$1}/g;
Good answers so far, but you should also decide what you want to do with %foo% if foo isn't a key in the %fill hash. Plausible options are:
Replace it with an empty string (that's what the current solutions do, since undef is treated as an empty string in this context)
Leave it alone, so "%foo%" stays as it is.
Do some kind of error handling, perhaps printing a warning on STDERR, terminating the translation, or inserting an error indicator into the text.
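As a rough sketch of the third option, the following warns on STDERR about an unknown token and leaves it in the text. The names fill_template and unknown_key are invented for this example, and the policy (warn and keep) is only one of several reasonable choices.

```perl
use strict;
use warnings;

my %fill = (
    place        => 'Dhahran',
    'wdesc:desc' => 'hot',
    'dayno.days' => 4,
);

# Hypothetical policy for unknown keys: warn, then keep the token as written.
sub unknown_key {
    my ($key) = @_;
    warn "Unknown placeholder '$key'\n";
    return "%$key%";
}

# Replace every %token% in a single pass over the text.
sub fill_template {
    my ($text) = @_;
    $text =~ s{%([^%]+)%}{ exists $fill{$1} ? $fill{$1} : unknown_key($1) }ge;
    return $text;
}

print fill_template("In %place% it is %wdesc:desc%, day %dayno.days%, %foo%.\n");
# Prints: In Dhahran it is hot, day 4, %foo%.
# and warns on STDERR: Unknown placeholder 'foo'
```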
Some other observations, not directly relevant to your question:
You should use the three-argument version of open.
That's not the cleanest way to read an entire file into a string. For that matter, for what you're doing you might as well process the input one line at a time.
Here's how I might do it (this version leaves unrecognized "%foo%" strings alone):
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
foreach my $key (keys %fill) {
$line =~ s/\Q%$key%/$fill{$key}/g;
}
print $line;
}
And here's a version that dies with an error message if there's an unrecognized key:
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
$line =~ s/%([^%]*)%/Replacement($1)/eg;
print $line;
}
sub Replacement {
my($key) = @_;
if (exists $fill{$key}) {
return $fill{$key};
}
else {
die "Unrecognized key \"$key\" on line $.\n";
}
}
http://codepad.org/G0WEDNyH
$text =~ s/%([a-zA-Z0-9_\.\:]+)%/$fill{$1}/g;
By default \w equates to [a-zA-Z0-9_], so you'll need to add in the \. and \:.

How to Circumvent Perl's string escaping the replacement string in s/// when it's read from a file?

This question is similar to my last one, with one difference to make the toy script more similar to my actual one.
Here is the toy script, replace.pl (Edit: now with 'use strict;', etc)
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open $ARGV[0]: $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new, @rest) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = $new;
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open $ARGV[1]: $!!";
while ( <IN> ) {
while(my ($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
So, now lets say I have two files, replacements.txt (using the best answer from the last question, plus a replacement pair that doesn't use substitution):
(f)oo,q($1."ar")
cat,hacker
and test.txt:
foo
cat
When I run perl replace.pl replacements.txt test.txt I would like the output to be
far
hacker
but instead it's '$1."ar"' (too much escaping) but the results are anything but (even with the other suggestions from that answer for the replacement string). The foo turns into ar, and the cat/hacker is eval'd to the empty string, it seems.
So, what changes do I need to make to replace.pl and/or replacements.txt? Other people will be creating the replacements.txt's, so I'd like to make that file as simple as possible (although I acknowledge that I'm opening the regex can of worms on them).
If this isn't possible to do in one step, I'll use macros to enumerate all possible replacement pairs for this particular file, and hope the issue doesn't come up again.
Please don't give us non-working toy scripts that don't use strict and warnings. Because one of the first things people will do in debugging is to turn those on, and you've just caused work.
Second tip, use the 3-argument version of open rather than the 2-argument version. It is safer. Also in your error checking do as perlstyle says (see http://perldoc.perl.org/perlstyle.html for the full advice) and include the file name and $!.
Anyways your problem is that the code you were including was q($1."ar"). When executed this returns the string $1."ar". Get rid of the q() and it works fine. BUT it causes warnings. That can be fixed by moving the quoting into the replace script, and out of the original script.
Here is a fixed script for you:
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open '$ARGV[0]': $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = '"' . $new . '"';
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open '$ARGV[1]': $!!";
while ( <IN> ) {
while(my($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
And the modified replacements.txt is:
(f)oo,${1}ar
cat,hacker
You have introduced one more level of interpolation since the last question.
You can get the right result by either:
Lay a 3rd "e" modifier on your substitution
s/$k/$v/geee; # eeek
Remove a layer of interpolation in replacements.txt by making the first line
(f)oo,$1."ar"
Get rid of the q() in the replacement string;
Should be just
(f)oo,$1."ar"
as in ($k,$v) = split /,/, $_;
Warning: using external input data in evals is very, very dangerous
Or, just make it
(f)oo,"${1}ar"
No modification to the code is necessary either way e.g. s///gee.
Edit: @drhorrible, if it doesn't work then you have other problems.
use strict;use warnings;
my $str = "foo";
my $repl = '(f)oo,q(${1}."ar")';
my ($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,$1."ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,"${1}ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
output:
${1}."ar"
far
far
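Given the warning above about evaluating external input, one possible alternative is to avoid eval entirely and expand only $1, ${1}, $2 and so on in the replacement text by hand. The sketch below is an assumption about how that could be done, not something taken from the answers; replace_with_template is a made-up name, and the replacements.txt entries would then be written as plain templates such as (f)oo,${1}ar. The search pattern is still compiled as a regex, so that part of the input is still trusted.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $pattern in $text with
# $template, filling $N / ${N} from the captures without calling eval.
sub replace_with_template {
    my ($text, $pattern, $template) = @_;
    my ($out, $last) = ('', 0);
    while ($text =~ /$pattern/g) {
        # Save offsets and captures now; the inner s/// below resets them.
        my ($start, $end) = ($-[0], $+[0]);
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : ''
        } 0 .. $#+;
        # Expand capture references in a copy of the template.
        (my $piece = $template)
            =~ s!\$(?:\{(\d+)\}|(\d+))!$groups[ defined $1 ? $1 : $2 ] // ''!ge;
        # Copy the unmatched text before this match, then the expansion.
        $out .= substr($text, $last, $start - $last) . $piece;
        $last = $end;
    }
    return $out . substr($text, $last);
}

print replace_with_template('foo', '(f)oo', '${1}ar'), "\n";  # far
print replace_with_template('cat', 'cat',   'hacker'), "\n";  # hacker
```

The trade-off is that the replacement side can no longer contain arbitrary Perl expressions, only literal text and capture references, which is usually what a replacements file written by other people needs.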