WWW::Mechanize::Firefox - almost there - only a little regex error left


Well, to me Perl sometimes looks a bit like abracadabra,
so many thanks for your patience with me...
Update: there were some errors until user1269651 and Bodoin offered a great fix.
See the results of Bodoin's code (note that he has changed the code once; I used the first version here):
linux-wyee:/home/martin/perl # perl test_7.pl
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
Use of uninitialized value $png in print at test_7.pl line 25, <$urls> line 10.
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
http://www.ph-solothurn.ch
http://www.pfh-gr.ch
Got status code 500 at test_7.pl line 14
linux-wyee:/home/martin/perl #
And with the latest version of Bodoin's code, some results look like this:
Can't call method "addProgressListener" on an undefined value at /usr/lib/perl5/site_perl/5.14.2/WWW/Mechanize/Firefox.pm line 566, <$urls> line 12.
Well, some minor things are left (see above). What can we do about those little errors?
By the way: what about the idea of storing the results in a folder (called images or so)?
End of update.
Here the initial thread starts, and gives an outline of what is wanted:
I need some thumbnails of websites. I tried wget, but that does not work for me, since I need some rendering functions. What is needed: I have a list of 2,500 URLs, one on each line, saved in a file. Then I want a script (see below) to open the file, read a line, retrieve the website and save the image as a small thumbnail.
Since I have a bunch of websites (2,500), I have to make up my mind about the naming of the results.
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
So far so good; I think I will try something like this.
We also have to close a filehandle when we do not need it anymore. Besides this, we can use 'or die' on open. I did it, see below!
By the way, we need good file names. Since I have a huge list of URLs, I get a huge list of output files. Can we reflect those needs in the program?
The script does not start at all...
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = new WWW::Mechanize::Firefox();
open(INPUT, "<urls.txt") or die $!;
while (<INPUT>) {
chomp;
next if $_ =~ m/http/i;
print "$_\n";
$mech->get($_);
my $png = $mech->content_as_png();
my $name = "$_";
$name =~s#http://##is;
$name =~s#/##gis;$name =~s#\s+\z##is;$name =~s#\A\s+##is;
$name =~s/^www\.//;
$name .= ".png";
open(my $out, ">",$name) or die $!;
binmode($out);
print $out $png;
close($out);
sleep (5);
}

I came up with this:
while (my $name = <DATA>) {
chomp ($name) ;
#$mech->get($_);
#my $png = $mech->content_as_png();
$name =~ s#http://##; #REMOVE THIS LINE
$name =~s#/#-#gis;
$name =~s#\s+\z##is;$name =~s#\A\s+##is;
$name =~s/^www\.//;
$name .= ".png";
print $name . "\n\n"; #REMOVE THIS LINE
#open(my $out, ">",$name) or die $!;
#binmode($out);
#print $out $png;
#close($out);
#sleep (5);
}
__DATA__
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
You should be able to modify it for your needs; I commented out all but the regex stuff. I also changed one regex to replace a '/' with a '-' so that there is less chance of two different URLs producing the same file name.
So that http://www.unifr.ch/sfm will look like this: unifr.ch-sfm
Hope this helps
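The naming steps above, together with the question's idea of storing the results in a folder, could be wrapped in one small function. This is only a sketch: the name url_to_filename, the default folder "images" and the rule of replacing every unusual character with '-' are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: map a URL to a PNG path inside $dir (default "images").
sub url_to_filename {
    my ($url, $dir) = @_;
    $dir = 'images' unless defined $dir;
    my $name = $url;
    $name =~ s/\A\s+|\s+\z//g;           # trim surrounding whitespace
    $name =~ s{\Ahttps?://}{}i;          # drop the scheme
    $name =~ s{\Awww\.}{}i;              # drop a leading "www."
    $name =~ s{/+\z}{};                  # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]+}{-}g;    # anything unusual (such as "/") becomes "-"
    return File::Spec->catfile($dir, "$name.png");
}

print url_to_filename($_), "\n"
    for 'http://www.unifr.ch/sfm', 'http://www.hepfr.ch/';
```

The folder itself is not created here; it would have to exist (or be created with mkdir) before the script opens the output file.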

There are a number of problems with your code. Most significant is the line
next if $_ =~ m/http/i;
which discards all lines from urls.txt that contain http, which isn't what you want.
Rather than go through each problem individually, I am offering a functional version. I hope this is satisfactory.
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = new WWW::Mechanize::Firefox();
open my $urls, '<', 'urls.txt' or die $!;
while (<$urls>) {
chomp;
next unless /^http/i;
print "$_\n";
$mech->get($_);
my $png = $mech->content_as_png;
my $name = $_;
$name =~ s#^http://##i;
$name =~ s#/##g;
$name =~ s/\s+\z//;
$name =~ s/\A\s+//;
$name =~ s/^www\.//;
$name .= ".png";
open my $out, ">", $name or die $!;
binmode $out;
print $out $png;
close $out;
sleep 5;
}
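The update at the top of the thread shows two leftovers with a loop like this: one site dies with "Got status code 500" and stops the run, and one screenshot comes back undefined. A possible way to keep going is to wrap each fetch in eval and skip undefined results. The sketch below is an assumption-laden illustration: fetch_all is a hypothetical helper, and a stub code reference stands in for the browser so that it runs without Firefox.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch on every URL and collect results and
# failures, instead of letting one bad site stop the whole run.
sub fetch_all {
    my ($fetch, @urls) = @_;
    my (%ok, %failed);
    for my $url (@urls) {
        my $data = eval { $fetch->($url) };    # traps die(), e.g. a 500 status
        if (defined $data) {
            $ok{$url} = $data;
        }
        else {
            chomp(my $err = $@ || 'no data returned');
            $failed{$url} = $err;
        }
    }
    return (\%ok, \%failed);
}

# Stub fetcher (made up) that fails for some URLs.
my $stub = sub {
    my ($url) = @_;
    die "Got status code 500\n" if $url =~ /broken/;
    return $url =~ /empty/ ? undef : "PNG for $url";
};

my ($ok, $failed) = fetch_all($stub,
    'http://ok.example', 'http://broken.example', 'http://empty.example');
print "saved: $_\n" for sort keys %$ok;
print "skipped: $_ ($failed->{$_})\n" for sort keys %$failed;
```

With the real module, the fetcher would presumably be something like sub { $mech->get($_[0]); $mech->content_as_png }, using the two calls already shown in the script above.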

Related

How to use Regex in a While If statement? Perl

I'm new to programming and I've run into an issue. We have to use Perl to write a script that opens a file, then loops through each line using a Regex - then print out the results. The opening of the file and the loop I have, but I can't figure out how to implement the Regex. It outputs 0 matched results, when the assignment outline suggests the number to be 338. If I don't use the Regex, it outputs 2987, which is the total number of lines - which is correct. So there's something incorrect with the Regex I just can't figure out. Any help would be greatly appreciated!
Here's what I have thus far:
use warnings;
use strict;
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open (fh, '<', $filename) or die $!;
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
close(fh);
print $i;

Consider this piece of code of yours:
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
You are indeed looping through the file lines, but you keep checking if the file name matches your regex. This is clearly not what you intend.
You meant:
while (my $line = <fh>) {
if ($line =~ /sshd/){
$i++;
}
}
Parentheses around the regex seem superfluous (they are meant to capture, while you are only matching).
Since the expression while (<fh>) assigns the content of the line to the special variable $_ (which is the default argument for regex matching), this can be shortened to:
while (<fh>) {
$i++ if /sshd/;
}
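To try the corrected loop without the original log file, the same logic can be run against an in-memory filehandle. This is a sketch only: count_matches is a hypothetical helper name and the sample log lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines read from $fh that match $pattern.
sub count_matches {
    my ($fh, $pattern) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $pattern;    # test the line, not the file name
    }
    return $count;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "Jan 1 sshd[42]: accepted\nJan 1 cron[7]: job\nJan 2 sshd[43]: closed\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print count_matches($fh, qr/sshd/), "\n";
close $fh;
```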

The OP's code has some errors, which I've corrected:
use warnings;
use strict;
use feature 'say';
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open my $fh, '<', $filename
or die "Couldn't open $filename";
map{ $i++ if /sshd/ } <$fh>;
close($fh);
say "Found: $i";

Perl: How to parse through a file and print each line that matches user inputted strings?

I'm new to programming so bear with me. I'm working on a Perl script that asks the user the number of different items they want to search for and what those items are, separating them by pressing ENTER. That part works okay.
Then, the script is to open up a file, parse through, and print each line that matches with the items that the user initially listed. This is the part that I haven't been able to figure out yet. I've tried different variations of the code. I saw many people suggest using the index function but I had no luck with it. It does seem to be working when I swap $line =~ $array for $line =~ /TEXT/. I'm hoping someone here can shed some light.
Thanks in advance!
#!usr/bin/perl
use strict;
use warnings;
my $line;
my $array;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my @arrays;
print "Enter items, press enter to separate: \n";
for (1..$n) {
my $input = <STDIN>;
push @arrays, $input;
}
open (FILE, "file.txt") || die "can't open file!";
chomp(my @lines = <FILE>);
close (FILE);
foreach $array (@arrays) {
foreach $line (@lines) {
if ($line =~ $array) {
print $line, "\n";
}
}
}

@purplekushbear Welcome to Perl! In Perl, there is more than one way to do it (TIMTOWTDI), so please take this in the spirit of teaching in which it is given.
First off, your line one, the #! (shebang) line, is missing the leading / in the path to perl. In Linux/UNIX environments, if your script is executable, the path after the #! is used to run your program. If you do an ls on /usr/bin/perl you should see it. Sometimes it is found at /bin/perl or /usr/local/bin/perl.
When the person mentioned you forgot to chomp, they were referring to where you are setting the $input variable. Just chomp like you did for $n and you will be OK.
As for the main part of your program, go back and read what you wanted to do; doing exactly that might be simpler. I think you have a good start on the problem and seem to know that arrays start with a @ and scalar variables use the $ sigil, and you use strict, which is great.
Here is one way to solve your problem:
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $num = <STDIN>);
my @items = ();
print "Enter items, press enter to separate: \n";
for (1 .. $num)
{
chomp(my $input = <STDIN>);
push @items, $input;
}
open (FILE, "file.txt") || die "can't open file because $!";
while (my $line = <FILE>)
{
foreach my $item (@items)
{
if ($line =~ m/$item/)
{
print $line;
last;
}
}
}
close (FILE);
Notice I used the name @items for your items instead of @arrays, which will make the code easier to understand when you come back to it someday. Always write with an eye towards maintainability. Anyway, ask if you have any questions, but since I left much of the code the same I don't think you will have much trouble figuring it out. Perldoc and Google are your friends. E.g. you can type:
perldoc -f last
to find out how last works. Have fun!

In your script you forgot to chomp the user input; you also need to use last to leave the inner for loop once a pattern has matched.
Here is another way to do the same thing with a different method.
I'm using a variable named $regex instead of @array. In $regex I concatenate the user input values separated by | (in a regex, | behaves like or). While concatenating, I apply quotemeta to escape special characters. Then I precompile the regex with qr.
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my $regex;
print "Enter items, press enter to separate: \n";
for (1..$n)
{
chomp(my $input = <STDIN>);
$regex .= quotemeta($input)."|";
}
chop $regex; #remove the last pipe
$regex = qr($regex);
open my $fh, "<", "file.txt" or die "can't open file: $!"; # "or", not "||", so the check applies to open
while(<$fh>)
{
print if(/$regex/i);
}
Then, as user @ikegami said in a comment, you can use Perl's built-in @ARGV array instead of STDIN, for example:
Your method
my @array = @ARGV;
Another method
my $regex = join "|", map { quotemeta $_ } @ARGV;
Then run the script as perl test.pl input1 input2 input3.
And always use the 3-argument form of open to open a file.
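The join/quotemeta approach can be checked in isolation. The sketch below assumes a hypothetical helper build_regex and made-up test lines; it shows that quotemeta makes a "." in the user input match only a literal dot.

```perl
use strict;
use warnings;

# Hypothetical helper: build one case-insensitive regex that matches any of
# the given items literally. Note that an empty list would give an empty
# pattern, which matches every line.
sub build_regex {
    my @items = @_;
    my $alternation = join '|', map { quotemeta($_) } @items;
    return qr/$alternation/i;
}

my $regex = build_regex('a.b', 'foo');
for my $line ('xa.bx', 'aXb', 'FOO bar') {
    print "$line\n" if $line =~ $regex;
}
```

Here 'aXb' is not printed, because the dot in 'a.b' was escaped rather than treated as "any character".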

Using Perl to match all words after a particular word

I am using Perl and need to get all domain names from http://www.malwaredomainlist.com/hostslist/hosts.txt into a flat file.
I think the easiest way to do this is to use a regular expression but I can't get my head around how to build the expression.
my code so far:
#!/usr/bin/perl
use LWP::Simple;
$url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
$content = get $url;
open(my $fh, '>', '/home/jay/feed.txt');
#logic here
}
close $fh;
I'm not sure if I should loop over each line and perform an expression on that or if I should take the whole file as a string and work with that.

The page is just a text/plain document, so I think I would just copy and paste the page into my editor and remove the unwanted information. However if you would prefer a Perl program then this is all that is necessary. It uses LWP::Simple::get to fetch the text page and a regex to search it for lines starting with digits and dots, returning the second field of each
use strict;
use warnings;
use feature 'say';
use LWP::Simple qw/ get /;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
say for get($url) =~ /^[\d.]+\s+(\S+)/gam;
or as a one-liner
perl -MLWP::Simple=get -E"say for get(shift) =~ /^[\d.]+\s+(\S+)/gam" http://www.malwaredomainlist.com/hostslist/hosts.txt

Unless you have a particular need, iterating by line is the way forward. Otherwise you just tie up memory unnecessarily.
However when you're fetching a url, it's a bit academic - I would suggest that fetching it to a file first isn't a bad thing though, so you can re-process it without needing to refetch.
Given source data sample:
for ( split ( "\n", $content ) ) {
next unless m/^\d/; #skip lines that don't start with a digit.
my ( $IP, $hostname ) = split;
my $domainname = $hostname =~ s/^\w+\.//r;
print $domainname,"\n";
}
This doesn't entirely work with your list though, because in that list you have a mix of hostnames and domain names, and it's not actually all that easy to tell the difference.
After all, the 'tld' at the end might be .com or it might be .org.it

127.0.0.1\s+(.*)
should work fine with global modifier.
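A minimal sketch of applying such a pattern in list context with the global modifier follows. Two things are assumptions of this sketch rather than part of the answer: the dots are escaped and the capture is narrowed to (\S+) so that trailing text is not picked up, and the sample data is made up.

```perl
use strict;
use warnings;

# Made-up sample in the hosts-file layout the question describes.
my $sample = <<'END';
# comment line
127.0.0.1  example.com
127.0.0.1  bad.example.org
END

# /m lets ^ match at the start of every line; with /g, a match in list
# context returns every captured group.
my @hosts = $sample =~ /^127\.0\.0\.1\s+(\S+)/mg;
print "$_\n" for @hosts;
```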

Unless saving the list file locally is a requirement (in which case you might be better off just using wget or curl), there is no need to save it in an external file to process it line-by-line.
You can instead open a filehandle to the string itself.
In the script below, extract_hosts would work the same whether you give it a reference to a string or a filename:
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
use LWP::Simple qw( get );
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $malware_hosts = get $url;
unless (defined $malware_hosts) {
die "Failed to get content from '$url'\n";
}
my $hosts = extract_hosts(\$malware_hosts);
print "$_\n" for @$hosts;
sub extract_hosts {
my $src = shift;
open my $fh, '<', $src
or croak "Failed to open '$src' for reading: $!";
my @hosts;
while (my $entry = <$fh>) {
next unless $entry =~ /\S/;
next if $entry =~ /^#/;
my (undef, $host) = split ' ', $entry;
push @hosts, $host;
}
close $fh
or croak "Failed to close '$src': $!";
\@hosts;
}
This will give you the list of hosts.

Code to grep the hostnames from the given file.
use LWP::Simple;
my $url = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';
my $content = get $url;
my @server_names = split(/127\.0\.0\.1\s*/, $content);
open(my $fh, '>', '/home/jay/feed.txt');
print $fh "@server_names";
close $fh;

Here is another implementation. It uses HTTP::Tiny, which is part of the core, so you don't have to install anything.
use HTTP::Tiny;
use Data::Dumper; # provides Dumper, used below
my $response = HTTP::Tiny->new->get('http://www.malwaredomainlist.com/hostslist/hosts.txt');
die "Failed!\n" unless $response->{success};
my @content;
for my $line ( split ( "\n", $response->{content} ) ){
next if ( $line =~ /^#|^$/);
push @content, ((split ( " ", $line ))[1]);
}
print Dumper (\@content);

System command execution using Perl

I have a Perl script which runs a perforce command and stores the result in a variable $command.
Then it is stored in a file log.txt, and by using a regex the relevant data is taken out.
When I run that command alone the following things pop out:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
The code goes as follows:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
my $command = qx |p4 monitor show -ale|;
open FH, '>>', "log.txt";
print FH $command;
close FH;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while (my $line = <$log_fh>) {
chomp $line;
next if not $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
my $killpro;
foreach my $kill (keys %stat) {
print "$kill\n";
}
The code above gets the number 8869, but how can I do it without log.txt? Is using an array a better way to do it, or is a hash fine?
Please correct me as I am still learning.

Seems like your main stumbling block is getting line-by-line input for your loop?
Splitting on newlines should do the trick:
my $killid;
my @lines = split("\n", $command); #split on newlines
for my $line (@lines) {
next if not $line =~ /(\d+)\s+/;
my $id = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
$killid = $id;
}
}
One caveat: you mentioned an output of 8870, but I'm getting 8869. The regexps you gave are looking for a line with "integration" and "IDLE none", and for your example input that appears to match 8869.
A hash is fine, though if you're using only one key in it (which seems to be the case), you might as well just use a single variable.

If you assign the result of a qx construct to an array instead of a scalar, then it will be split into lines automatically for you. This code demonstrates.
use strict;
use warnings;
my @lines = qx|p4 monitor show -ale|;
my %stat;
for my $line (@lines) {
chomp $line;
next unless $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
$stat{$killid} = $line;
}
}
print "$_\n" for keys %stat;
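The filtering logic can be exercised without a Perforce server by feeding it the sample lines from the question. In this sketch, idle_integration_ids is a hypothetical helper name, and anchoring the id at the start of the line is an assumption about the output format.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ids of lines that are "R integration"
# processes in the "IDLE none" state.
sub idle_integration_ids {
    my @ids;
    for my $line (@_) {
        chomp(my $copy = $line);
        next unless $copy =~ /^\s*(\d+)\s/;    # the id is the first field
        my $id = $1;                           # save it before the next matches
        push @ids, $id
            if $copy =~ /R\s+integration/ && $copy =~ /IDLE\s+none$/;
    }
    return @ids;
}

# Sample lines taken from the question.
my @sample = (
    "4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none\n",
    "8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none\n",
    "8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52\n",
);
print "$_\n" for idle_integration_ids(@sample);
```

In the real script the argument list would come straight from the command, for example idle_integration_ids(qx|p4 monitor show -ale|).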

How to Circumvent Perl's string escaping the replacement string in s/// when it's read from a file?

This question is similar to my last one, with one difference to make the toy script more similar to my actual one.
Here is the toy script, replace.pl (Edit: now with 'use strict;', etc)
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open $ARGV[0]: $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new, @rest) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = $new;
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open $ARGV[1]: $!!";
while ( <IN> ) {
while(my ($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
So, now lets say I have two files, replacements.txt (using the best answer from the last question, plus a replacement pair that doesn't use substitution):
(f)oo,q($1."ar")
cat,hacker
and test.txt:
foo
cat
When I run perl replace.pl replacements.txt test.txt I would like the output to be
far
hacker
but instead it's '$1."ar"' (too much escaping) but the results are anything but (even with the other suggestions from that answer for the replacement string). The foo turns into ar, and the cat/hacker is eval'd to the empty string, it seems.
So, what changes do I need to make to replace.pl and/or replacements.txt? Other people will be creating the replacements.txt's, so I'd like to make that file as simple as possible (although I acknowledge that I'm opening the regex can of worms on them).
If this isn't possible to do in one step, I'll use macros to enumerate all possible replacement pairs for this particular file, and hope the issue doesn't come up again.

Please don't give us non-working toy scripts that don't use strict and warnings. Because one of the first things people will do in debugging is to turn those on, and you've just caused work.
Second tip, use the 3-argument version of open rather than the 2-argument version. It is safer. Also in your error checking do as perlstyle says (see http://perldoc.perl.org/perlstyle.html for the full advice) and include the file name and $!.
Anyways your problem is that the code you were including was q($1."ar"). When executed this returns the string $1."ar". Get rid of the q() and it works fine. BUT it causes warnings. That can be fixed by moving the quoting into the replace script, and out of the original script.
Here is a fixed script for you:
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open '$ARGV[0]': $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = '"' . $new . '"';
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open '$ARGV[1]': $!!";
while ( <IN> ) {
while(my($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
And the modified replacements.txt is:
(f)oo,${1}ar
cat,hacker
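The effect of wrapping each replacement in double quotes before running s///ee can be seen in a few lines without any files. This is a sketch under assumptions: the pairs are hard-coded instead of read from replacements.txt, and apply_replacements is a hypothetical helper name. The second evaluation executes the replacement text as Perl code, so this should only be used with replacement files you trust, and a replacement containing a double quote would break the wrapping.

```perl
use strict;
use warnings;

# Pattern => replacement pairs as they would appear in replacements.txt.
my %replacements = ('(f)oo' => '${1}ar', 'cat' => 'hacker');

# Hypothetical helper: apply every pair to one line of input.
sub apply_replacements {
    my ($line) = @_;
    for my $pattern (sort keys %replacements) {
        # Wrap the replacement so that it is a Perl string literal.
        my $code = '"' . $replacements{$pattern} . '"';
        # The first e yields that literal as text; the second e evaluates
        # it, which interpolates $1 from the current match.
        $line =~ s/$pattern/$code/gee;
    }
    return $line;
}

print apply_replacements($_), "\n" for 'foo', 'cat';
```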

You have introduced one more level of interpolation since the last question.
You can get the right result by either:
Lay a 3rd "e" modifier on your substitution
s/$k/$v/geee; # eeek
Remove a layer of interpolation in replacements.txt by making the first line
(f)oo,$1."ar"

Get rid of the q() in the replacement string;
Should be just
(f)oo,$1."ar"
as in ($k,$v) = split /,/, $_;
Warning: using external input data in evals is very, very dangerous
Or, just make it
(f)oo,"${1}ar"
No modification to the code is necessary either way e.g. s///gee.
Edit: @drhorrible, if it doesn't work then you have other problems.
use strict;use warnings;
my $str = "foo";
my $repl = '(f)oo,q(${1}."ar")';
my ($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,$1."ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,"${1}ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
output:
${1}."ar"
far
far
