I have a Perl script which runs a perforce command and stores the result in a variable $command.
Then it is stored in a file log.txt, and by using a regex the relevant data is taken out.
When I run that command alone the following things pop out:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
The code goes as follows:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
my $command = qx |p4 monitor show -ale|;
open FH, '>>', "log.txt";
print FH $command;
close FH;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while ($line = <$log_fh>) {
chomp $line;
next if not $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
my $killpro;
foreach my $kill (keys %stat) {
print "$kill\n";
}
The following gets the number 8869 but how to do it without log.txt. Is using an array a better way to do it or hash is fine?
Please correct me as I am still learning.
Seems like your main stumbling block is getting line-by-line input for your loop?
Splitting on newlines should do the trick:
my $killid;
my #lines = split("\n", $command); #split on newlines
for my $line (#lines) {
next if not $line =~ /(\d+)\s+/;
my $id = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
$killid = $id;
}
}
One caveat: you mentioned an output of 8870, but I'm getting 8869. The regexps you gave are looking for a line with "integration" and "IDLE none", and for your example input that appears to match 8869.
A hash is fine, though if you're using only one key in it (which seems to be the case), you might as well just use a single variable.
If you assign the result of a qx construct to an array instead of a scalar, then it will be split into lines automatically for you. This code demonstrates.
use strict;
use warnings;
my #lines = qx|p4 monitor show -ale|;
my %stat;
for my $line (#lines) {
chomp $line;
next unless $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
$stat{$killid} = $line;
}
}
print "$_\n" for keys %stat;
Related
new perl coder here.
When I copy and paste the text from a website into a text file and read from that file, my perl script works with no issues. When I use getstore to create a file from the website automatically which is what I want, the output is a bunch of |'s.
The text looks identical when I copy and paste, or download the text with getstore.. I'm unable to figure out the problem. Any help would be highly appreciated.
The output that I desire is as follows:
|www\.arkinsoftware\.in|www\.askmeaboutrotary\.com|www\.assculturaleincontri\.it|www\.asu\.msmu\.ru|www\.atousoft\.com|www\.aucoeurdelanature\.
enter code here
Here is the code I am using:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore("http://www.malwaredomainlist.com/hostslist/hosts.txt", "malhosts.txt");
open(my $input, "<", "malhosts.txt");
while (my $line = <$input>) {
chomp $line;
$line =~ s/.*\s+//;
$line =~ s/\./\\\./g;
print "$line\|";
}
The bunch of | you get, is from the unfitting comment-lines at the beginning. So the solution is to ignore all "unfitting" lines.
So instead of
$line =~ s/.*\s+//;
use
next unless $line =~ s/^127.*\s+//;
so you would ignore every line except thos starting with 127.
Here's what I'd do:
my $first = 1;
while (<$input>) {
/^127\.0\.0\.1\s+(.+?)\s*$/ or next;
print '|' if !$first;
$first = 0;
print quotemeta($1);
}
This matches your input in a more precise way, and quotemeta takes care of true regex escaping.
I'd probably go with something like:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore( "http://www.malwaredomainlist.com/hostslist/hosts.txt",
"malhosts.txt" );
open( my $input, "<", "malhosts.txt" );
print join ( "|",
map { m/^\d/ && ! m/localhost/ ?
quotemeta ((split)[1]) : () } <$input> );
Gives:
0koryu0\.easter\.ne\.jp|1\-atraffickim\.tf|10\-trafficimj\.tf|109\-204\-26\-16\.netconnexion\.managedbroadband\.co\.uk|11\-atraasikim\.tf|11\.lamarianella\.info|12\-tgaffickvcmb\.tf| #etc.
I am using Perl and need to get all domain names from http://www.malwaredomainlist.com/hostslist/hosts.txt into a flat file.
I think the easiest way to do this is to use a regular expression but I can't get my head around how to build the expression.
my code so far:
#!/usr/bin/perl
use LWP::Simple;
$url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
$content = get $url;
open(my $fh, '>', '/home/jay/feed.txt');
#logic here
}
close $fh;
I'm not sure if I should loop over each line and perform an expression on that or if I should take the whole file as a string and work with that.
The page is just a text/plain document, so I think I would just copy and paste the page into my editor and remove the unwanted information. However if you would prefer a Perl program then this is all that is necessary. It uses LWP::Simple::get to fetch the text page and a regex to search it for lines starting with digits and dots, returning the second field of each
use strict;
use warnings;
use feature 'say';
use LWP::Simple qw/ get /;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
say for get($url) =~ /^[\d.]+\s+(\S+)/gam;
or as a one-liner
perl -MLWP::Simple=get -E"say for get(shift) =~ /^[\d.]+\s+(\S+)/gam" http://www.malwaredomainlist.com/hostslist/hosts.txt
Unless you have a particular need, iterating by line is the way forward. Otherwise you just tie up memory unnecessarily.
However when you're fetching a url, it's a bit academic - I would suggest that fetching it to a file first isn't a bad thing though, so you can re-process it without needing to refetch.
Given source data sample:
for ( split ( "\n", $content ) ) {
next unless m/^\d/; #skip lines that don't start with a digit.
my ( $IP, $hostname ) = split;
my $domainname = $hostname =~ s/^\w+\.//r;
print $domainname,"\n";
}
This doesn't entirely work with your list though, because in that list you have a mix of hostnames and domain names, and it's not actually all that easy to tell the difference.
After all, the 'tld' at the end might be .com or it might be .org.it
127.0.0.1\s+(.*)
should work fine with global modifier.
Demo
Unless saving the list file locally is a requirement (in which case you might be better off just using wget or curl), there is no need to save it in an external file to process it line-by-line.
You can instead open a filehandle to the string itself.
In the script below, extract_hosts would work the same whether you give it a reference to a string or a filename:
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
use LWP::Simple qw( get );
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $malware_hosts = get $url;
unless (defined $malware_hosts) {
die "Failed to get content from '$url'\n";
}
my $hosts = extract_hosts(\$malware_hosts);
print "$_\n" for #$hosts;
sub extract_hosts {
my $src = shift;
open my $fh, '<', $src
or croak "Failed to open '$src' for reading: $!";
my #hosts;
while (my $entry = <$fh>) {
next unless $entry =~ /\S/;
next if $entry =~ /^#/;
my (undef, $host) = split ' ', $entry;
push #hosts, $host;
}
close $fh
or croak "Failed to close '$src': $!";
\#hosts;
}
This will give you the list of hosts.
Code to grep the hostnames from the given file.
use LWP::Simple;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $content = get $url;
my #server_names = split(/127\.0\.0\.1\s*/, $content);
open(my $fh, '>', '/home/jay/feed.txt');
print $fh "#server_names";
close $fh;
Here is another implementation. It uses HTML::Tiny which is part of the core so you don't have to install anything.
use HTTP::Tiny;
my $response = HTTP::Tiny->new->get('http://www.malwaredomainlist.com/hostslist/hosts.txt');
die "Failed!\n" unless $response->{success};
my #content;
for my $line ( split ( "\n", $response->{content} ) ){
next if ( $line =~ /^#|^$/);
push #content, ((split ( " ", $line ))[1]);
}
print Dumper (\#content);
I am trying to check each line of a document for a regex match.
If the line has a match, I want to push the match only into an array.
In the code below, I thought that using the g operator at the end of the regex delimiters would make $lines value the regex match only. Instead $lines value is the entire line of the document containing the match...
my $line;
my #table;
while($line = <$input>){
if($line =~ m/foo/g){
push (#table, $line);
}
}
print #table;
If any one could help me get my matches into an array, it is much appreciated.
Thanks.
p.s.
Still learning... so any explanations of concepts I may have missed is also much appreciated.
g modifier in s///g is for global search and replace.
If you just want to push matching pattern into an array, you need to capture matching pattern enclosed by (). Captured elements are stored in variable $1, $2, etc..
Try following modification to your code:
my #table;
while(my $line = <$input>){
if($line =~ m/(foo)/){
push (#table, $1);
}
}
print #table;
Refer to this documentation for more details.
Or if you want to avoid needless use of global variables,
my #table;
while(my $line = <$input>){
if(my #captures = $line =~ m/(foo)/){
push #table, #captures;
}
}
which simplifies to
my #table;
while(my $line = <$input>){
push #table, $line =~ m/(foo)/;
}
Expanding on jkshah's answer a little, I'm explicitly storing the matches in #matches instead of using the magic variable $1 which I find a little harder to read.
"__DATA__" is a simple way to store lines in a filehandle in a perl source file.
use strict;
use warnings;
my #table;
while(my $line = <DATA>){
my #matches = $line =~ m/(foo)/;
if(#matches) {
warn "found: " . join(',', #matches );
push(#table,#matches);
}
}
print #table;
__DATA__
herp de derp foo
yerp fool foo flerp
heyhey
If you file is not very big(100-500mb fine for 2 GB RAM) then you can use below.Here I am extracting numbers if matched in line.It will be much faster than the foreach loop.
#!/usr/bin/perl
open my $file_h,"<abc" or die "ERROR-$!";
my #file = <$file_h>;
my $file_cont = join(' ',#file);
#file =();
my #match = $file_cont =~ /\d+/g;
print "#match";
I have a text file which consists of different lines it looks like
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
I need to save both the IPs and save them in different arrays. So there will
be two arrays, one Destination and one reached. How can I do this. At the moment I have written a code to detect the IP but that does not seem to work.
while (my $line = <$in>) {
my $traceroute;
if ($line =~ /(^Destination)/) {
print "DUDE\n";
my $ip = $line =~ /(\d+\.\d+\.\d+\.\d+)$/s;
#$traceroute = $2;
print "$ip\n";
}
}
This should do the trick:
while (my $line = <DATA>) {
if($line =~ /(^Destination)/){
my($dest, $reach) = $line =~ /(\d+\.\d+\.\d+\.\d+)/g;
print "$dest $reach\n";
}
}
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
Performing a regex in scalar context, just returns the number of successful matches or 0 if it fails. Assigning the result of a RegEx to an array or a list of variables puts it in list context, in which the RegEx returns the captured values.
The /g modifier matches the RegEx not only once, but as often, as it fits in the string. Read perldoc perlretut for more
Since you want to save the addresses in arrays, and none of the previous solutions explicitly does that, I'm posting another answer:
use strict;
my (#destination, #reached);
foreach my $line (<DATA>) {
chomp $line;
my #fields = split '\|', $line;
push #destination, $fields[1];
push #reached, $fields[3];
}
use Data::Dumper;
print "Destinations:\n".Dumper(#destination);
print "Reached:\n".Dumper(#reached);
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
Split on the separator and save the items at indices 1 and 3.
perl -aF'\|' -lne 'next unless $F[0] eq "Destination"; print "$F[1]|$F[3]"' input >output
You may want a different output format and/or do something more, of course.
while (my $line = <$in>) {
chomp $line;
if ($line =~ /^Destination\|(\d+\.\d+\.\d+\.\d+)\|reached\|(\d+\.\d+\.\d+\.\d+)$/) {
my ($d, $r) = ($1,$2);
print "$d => $r\n";
}
}
Well to me Perl sometimes looks abit Abracadabra
so many thanks for the patience with me...
update; there were some errors untill user1269651 and Bodoin offered agreat fix
see the results of bodoins code..(note he has changed the code one time - i used here the first version ever...:;
linux-wyee:/home/martin/perl # perl test_7.pl
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch http://www.phvs.ch http://www.phtg.ch http://www.phsg.ch http://www.phsh.ch Use of uninitialized value $png in print at test_7.pl line 25, <$urls> line 10. http://www.phr.ch http://www.hepfr.ch/
http://www.phbern.ch
http://www.ph-solothurn.ch
http://www.pfh-gr.ch
Got status code 500 at test_7.pl line 14
linux-wyee:/home/martin/perl #
and the latest version of bodins code some results are looking like that..
Can't call method "addProgressListener" on an undefined value at /usr/lib/perl5/site_perl/5.14.2/WWW/Mechanize/Firefox.pm line 566, <$urls> line 12.
well some minor things left - see above... what can we do with those little errors..
btw: what about the idea of storing the results in a folder... /(called images or so!?)
end of update...
here the inital thread starts - and gives an outline of what is wanted:
i need to have some thumbnails from websites but i tried to use wget - but that does not work for me, since i need some rendering functions what is needet: i have a list of 2,500 URLs, one on each line, saved in a file. Then i want a script - see it below - to open the file, read a line, then retrieve the website and save the image as a small thumbnail.
well since i have a bunch of web-sites (2500) i have to make up my mind about the naming of the results.
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
So far so good, well i think i try something like this
We also have to close a filehandler if we do not need it anymore. Besides this we can use 'or die' on open. i did it - see below!
Btw we need a good file name. Since i have a huge list of urls then i get a huge list of output files. Therefore i need to have good file names. Can we reflect those things and needs in the programme!?
the script does not start at all ....
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = new WWW::Mechanize::Firefox();
open(INPUT, "<urls.txt") or die $!;
while (<INPUT>) {
chomp;
next if $_ =~ m/http/i;
print "$_\n";
$mech->get($_);
my $png = $mech->content_as_png();
my $name = "$_";
$name =~s#http://##is;
$name =~s#/##gis;$name =~s#\s+\z##is;$name =~s#\A\s+##is;
$name =~s/^www\.//;
$name .= ".png";
open(my $out, ">",$name) or die $!;
binmode($out);
print $out $png;
close($out);
sleep (5);
}
I came up with this:
while (my $name = <DATA>) {
chomp ($name) ;
#$mech->get($_);
#my $png = $mech->content_as_png();
$name =~ s#http://##; #REMOVE THIS LINE
$name =~s#/#-#gis;
$name =~s#\s+\z##is;$name =~s#\A\s+##is;
$name =~s/^www\.//;
$name .= ".png";
print $name . "\n\n"; #REMOVE THIS LINE
#open(my $out, ">",$name) or die $!;
#binmode($out);
#print $out $png;
#close($out);
#sleep (5);
}
__DATA__
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
You should be able to modify it for your needs, I commented out all but the regex stuff. I also changed one regec to replace a '/' with a '-' so that there is less probability of falsly generating duplicate URL's.
So that http://www.unifr.ch/sfm will look like this: unifr.ch-sfm
Hope this helps
There are a number of problems with your code. Most significant is the line
next if $_ =~ m/http/i;
which discards all lines from urls.txt that contain http, which isn't what you want.
Rather than go through each problem indicvidually I am offering a functional version. I hope this is satisfactory.
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = new WWW::Mechanize::Firefox();
open my $urls, '<', 'urls.txt' or die $!;
while (<$urls>) {
chomp;
next unless /^http/i;
print "$_\n";
$mech->get($_);
my $png = $mech->content_as_png;
my $name = $_;
$name =~ s#^http://##i;
$name =~ s#/##g;
$name =~ s/\s+\z//;
$name =~ s/\A\s+//;
$name =~ s/^www\.//;
$name .= ".png";
open my $out, ">", $name or die $!;
binmode $out;
print $out $png;
close $out;
sleep 5;
}