Screen scraper script won't write to output file - regex

I can't get the Perl script below to write to the file output.html.
It doesn't need to be a CGI script yet, but that is the ultimate intention.
Can anyone tell me why it isn't writing any text to output.html?
#!/usr/bin/perl
#-----------------------------------------------------------------------
# This script should work as a CGI script, if I get it correctly.
# Most CGI scripts for Perl begin with the same line and must be
# stored in your servers cgi-bin directory. (I think this is set by
# your web server.
#
# This script will scrape news sites for stories about topics input
# by the users.
#
# Lara Landis
# Sinister Porpoise Computing
# 1/4/2018
# Personal Perl Project
#-----------------------------------------------------------------------
@global_sites = ();
print( "Starting program.\n" );
if ( !( -e "sitedata.txt" ) ) {
enter_site_info( @global_sites );
}
if ( !( -e "scrpdata.txt" ) ) {
print( "scrpdata.txt does not exist. Creating file now.\n" );
print( "Enter the search words you wish to search for below. Press Ctrl-D to finish.\n" );
open( SCRAPEFILE, ">scrpdata.txt" );
while ( $line = <STDIN> ) {
chop( $line );
print SCRAPEFILE ( "$line\n" );
}
close( SCRAPEFILE );
}
print( "Finished getting site data..." );
scrape_sites( @global_sites );
#----------------------------------------------------------------------
# This routine gets information from the user after the file has been
# created. It also has some basic checking to make sure that the lines
# fed to it are legitimate domains. This is not an exhaustive list of
# all domains in existence.
#----------------------------------------------------------------------
sub enter_site_info {
my ( @sisites ) = @_;
$x = 1;
open( DATAFILE, ">sitedata.txt" ) || die( "Could not open datafile.\n" );
print( "Enter websites below. Press Crtl-D to finish.\n" );
while ( $x <= @sisites ) {
$sisites[$x] = <STDIN>;
print( "$sisites[$x] added.\n" );
print DATAFILE ( "$sisites[$x]\n" );
$x++;
}
close( DATAFILE );
return @sisites;
}
#----------------------------------------------------------------------
# If the file exists, just get the information from it. Read info in
# from the sites. Remember to create a global array for the sites
# data.
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
# Get the text to find in the sites that are being scraped. This requires
# nested loops. It starts by going through the loops for the text to be
# scraped, and then it goes through each of the websites listed in the
# sitedata.txt file.
#-----------------------------------------------------------------------
sub scrape_sites {
my ( @ss_info ) = @_;
@gsi_info = ();
@toscrape = ();
$y = 1;
#---------------------------
# Working code to be altered
#---------------------------
print( "Getting site info..." );
$x = 1;
open( DATAFILE, "sitedata.txt" ) || die( "Can't open sitedata.txt.txt\n" );
while ( $gsi_info[$x] = <DATAFILE> ) {
chop( $gsi_info[$x] );
print( "$gsi_info[$x]\n" );
$x++;
}
close( DATAFILE );
open( SCRAPEFILE, "scrpdata.txt" ) || die( "Can't open scrpdata.txt\n" );
print( "Getting scrape data.\n" );
$y = 1;
while ( $toscrape[$y] = <SCRAPEFILE> ) {
chop( $toscrape[$y] );
$y++;
}
close( SCRAPEFILE );
print( "Now opening the output file.\n" );
$z = 1;
open( OUTPUT, ">output.html" );
print( "Now scraping sites.\n" );
while ( $z <= @gsi_info ) { # This loop contains SITES
system( "rm -f index.html.*" );
system( "wget $gsi_info[$z]" );
$z1 = 1;
print( "Searching site $gsi_info[$z] for $toscrape[$z1]\n" );
open( TEMPFILE, "$gsi_info[$z]" );
$comptext = <TEMPFILE>;
while ( $comptext =~ /$toscrape[z1]/ig ) { # This loop fetches data from the search terms
print( "Now scraping $gsi_info[$z] for $toscrape[$z1]\n" );
print OUTPUT ( "$toscrape[$z1]\n" );
$z1++;
}
close( TEMPFILE );
$z++;
}
close( OUTPUT );
return ( @gsi_info );
}

You're making assumptions about the current working directory that are often incorrect. You seem to assume the current working directory is the directory in which the script resides, but that's never guaranteed, and it's often / for CGI scripts.
"sitedata.txt"
should be
use FindBin qw( $RealBin );
"$RealBin/sitedata.txt"
There could also be a permission error. You should include the error cause ($!) in your error message when open fails so you know what is causing the problem!
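Putting both suggestions together, a minimal sketch might look like the following. It assumes sitedata.txt is meant to sit next to the script, and it uses warn instead of die only so the example runs to completion whether or not the file exists.

```perl
use strict;
use warnings;
use FindBin qw( $RealBin );

# Build the path relative to the script's own directory rather than
# relying on the current working directory.
my $path = "$RealBin/sitedata.txt";

if ( open( my $fh, '<', $path ) ) {
    print "Opened $path\n";
    close($fh);
}
else {
    # $! holds the operating system's reason for the failure,
    # e.g. a missing file or a permission problem.
    warn "Can't open $path: $!\n";
}
```

In the real script you would most likely keep die, so that a failed open stops the program with the reason attached.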

While you're checking some, you're not checking all of your open or system calls. If they fail, the program will keep going without an error message telling you why.
You can add checks to all of these, but it's easy to forget. Instead, use autodie to do the checks for you.
You'll also want to use strict to ensure you haven't made any variable typos, and use warnings to warn you about small mistakes. See this answer for more.
Also @global_sites is empty so enter_site_info() isn't going to do anything. And scrape_sites() does nothing with its argument, @ss_info.
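As a rough sketch of what those three pragmas buy you, the snippet below tries to open a file in a directory that is assumed not to exist (the path is made up for the example). With autodie in effect the failed open throws an exception carrying the reason, which is caught here only so the message can be printed.

```perl
use strict;      # undeclared variables become compile-time errors
use warnings;    # warn about likely mistakes, such as using undef values
use autodie;     # open, close and friends die with a reason on failure

# Hypothetical path; the directory is assumed not to exist.
my $missing = 'no-such-directory-for-demo/output.html';

# eval catches the exception that autodie throws when open fails.
my $ok = eval {
    open( my $out, '>', $missing );
    close($out);
    1;
};
print "open failed: $@\n" unless $ok;
```

As far as I know, plain use autodie does not cover system; that needs use autodie qw(:all) and the IPC::System::Simple module, so check the autodie documentation before relying on it for the wget call.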

All of these things are helpful. Thank you. I found the problem: I was opening the wrong file. Adding error checking to that open call is what let me spot the error. It should have been
open (TEMPFILE, "index.html") || die ("Cannot open index.html\n");
I have taken as many of the suggestions as I remembered and included them in the code. I still need to implement the directory advice, but it should not be difficult.

Related

remove msisdn element based on regex

I'm trying to remove the msisdn field from MO calls on TAP3.11, but my code doesn't do what I need.
I want to set a condition: if the msisdn doesn't start with 962, then remove the element.
My background is only with Python, and this is my first time with Perl. I use it because, after searching, I believe that only Perl can handle TAP files.
# Will scan all the calls for MTC's.
foreach $key ( @{$struct->{'transferBatch'}->{'callEventDetails'} } ) {
foreach ( keys %{$key} ) {
if ( $_ eq "mobileOriginatedCall" )
{
if ( defined $key->{$_}->{'basicCallInformation'} )
{
if ( defined $key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'} )
{
if ( defined $key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'})
{
if ( defined $key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'}->{'msisdn'})
{
if ($key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'}->{'msisdn'} =~ /^[962]/)
{
$key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'}->{'msisdn'}=();
}
}
}
}
}
}
}
}
Try with:
...
if ($key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'}->{'msisdn'} =~ /^(?!962)/)
{
delete $key->{$_}->{'basicCallInformation'}->{'chargeableSubscriber'}->{'simChargeableSubscriber'}->{'msisdn'};
}
The changes:
For deleting a key, use delete
For a "not starting with" regex, use: ^(?!WHATEVER), for example ^(?!962)
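To see why the original pattern misbehaves: [962] is a character class, so /^[962]/ matches any value whose first character is 9, 6 or 2, not values starting with the sequence 962. Here is a small sketch comparing the three forms. The sample numbers and the should_remove helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the msisdn should be removed,
# i.e. when it does NOT start with 962.
sub should_remove {
    my ($msisdn) = @_;
    return $msisdn !~ /^962/ ? 1 : 0;
}

# Hypothetical sample numbers, for illustration only.
my @msisdns = ( '962791234567', '201001234567', '447700900123' );

for my $msisdn (@msisdns) {
    # Character class: one leading character that is 9, 6 or 2.
    my $class = $msisdn =~ /^[962]/ ? 'match' : 'no match';
    # Negative lookahead: succeeds when 962 does not follow the start.
    my $lookahead = $msisdn =~ /^(?!962)/ ? 'match' : 'no match';
    print "$msisdn  /^[962]/: $class  /^(?!962)/: $lookahead  remove: ",
        should_remove($msisdn), "\n";
}
```

Negating a plain prefix match with !~ /^962/ gives the same result as the lookahead and may be easier to read if you are coming from Python.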

How to pass project number to manifest.json

I am building a web push WordPress plugin and I want to pass the project number from a form input field to the manifest.json file,
which is included in index.php as
<link rel="manifest" href="/manifest.json">
Disclaimer: I'm the author of this plugin.
Instead of building your own from scratch, you could contribute to the already existing https://github.com/mozilla/wp-web-push.
If you want to build your own, you can check the source of that plugin out to see how we have implemented it.
We've built a class to handle it: https://github.com/marco-c/wp-web-app-manifest-generator.
You cannot pass any parameter to manifest.json. You must generate it as a static file when the form is submitted.
Here's the code that we have used for the Pushpad plugin:
if (file_exists ( ABSPATH . 'manifest.json' )) {
$oldManifestJson = file_get_contents ( ABSPATH . 'manifest.json' );
} else {
$oldManifestJson = '{}';
}
$data = json_decode ( $oldManifestJson, true );
$data ['gcm_sender_id'] = $settings ['gcm_sender_id'];
$data ['gcm_user_visible_only'] = true;
$newManifestJson = json_encode ( $data );
if ( is_writable ( ABSPATH . 'manifest.json' ) || !file_exists ( ABSPATH . 'manifest.json' ) && is_writable ( ABSPATH ) ) {
file_put_contents ( ABSPATH . 'manifest.json', $newManifestJson );
} else {
// display an error
}

Use of uninitialized value in pattern match

I have the following code in a subroutine in Perl for which I keep getting the following error :
Use of uninitialized value $nextLine in pattern match (m//) at catlist.pl line 67, <FILEHANDLE> line 2756.
sub extract_testdesc {
my @str = @_;
my $file = $_[0];
my $testname = $_[1];
my @fifo;
# Open the file
open( FILEHANDLE, $file ) or die "couldnt open";
while (<FILEHANDLE>) {
if ( $_ =~ m/\/\*\*/ ) { # if start of comment /**
undef(@fifo);
$nextLine = <FILEHANDLE>;
while ( $nextLine !~ m/\*\// ) { # Add all lines into array until */ is encountered
if ( $nextLine !~ m/\@testlogic.author/ ) {
$nextLine =~ s/\*//g;
if ( $nextLine ne "" ) {
push( @fifo, $nextLine );
}
}
$nextLine = <FILEHANDLE>;
}
}
if ( $_ =~ m/$testname/ ) {
return (@fifo);
}
}
close(FILEHANDLE);
}
What am I doing wrong ? I'm new to Perl so any help is appreciated.
Whenever you use a while loop on a file handle, it's actually synonymous with while (defined($_ = <FILEHANDLE>)) {. This is useful because the loop exits once the filehandle reaches EOF. You, on the other hand, are making manual readline calls without testing whether anything was returned, hence your uninitialized value warnings.
Overall, your goal and logic are confusing. However, perhaps an introduction to the range operator will help you? The following achieves what I think your logic is, but I could easily have misinterpreted it.
sub extract_testdesc {
my ($file, $testname) = @_;
my @fifo;
# Open the file
open my $fh, '<', $file or die "couldnt open: $!";
while (<$fh>) {
if ( my $range = m{\Q/**} .. m{\Q*/}) {
@fifo = () if $range == 1;
push @fifo, $_;
} elsif ($_ =~ m/\Q$testname\E/ ) {
return (@fifo);
}
}
close($fh);
}
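If you would rather keep the manual reads, the point about testing each read can be sketched like this. It is only an illustration, not a drop-in replacement for the subroutine: the sample text is made up, and it is read through an in-memory filehandle so the snippet is self-contained. The last comment is deliberately left unterminated to show that the inner loop stops at end of file instead of warning.

```perl
use strict;
use warnings;

# Hypothetical sample input; the second comment is never closed.
my $text = "/**\n * first\n */\ncode\n/**\n * unterminated\n";
open my $fh, '<', \$text or die "couldn't open in-memory file: $!";

my @lines;
while ( defined( my $line = <$fh> ) ) {
    next unless $line =~ m{/\*\*};    # start of a /** comment

    # Test every manual read with defined(), so the loop ends at
    # end of file as well as at the closing */.
    while ( defined( my $next = <$fh> ) ) {
        last if $next =~ m{\*/};
        push @lines, $next;
    }
}
close $fh;

print scalar(@lines), " comment lines collected\n";
```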

Qt Save screenshot - naming problems

I want to save the screenshot of my application to the desktop. The problem is, it saves but if I take another screenshot it replaces the old image. How can I tell Qt to call it Wishlist 1, Wishlist 2 etc.?
QString filepath = QDir::toNativeSeparators( QDir::homePath() + "/Desktop/Wishlist.png" );
if( grab().save( filepath, "png" ) )
statusBar()->showMessage( tr("Saved file to Desktop.") );
else
statusBar()->showMessage( tr("Error saving file.") );
After I close and start the program again, it should be able to go on. E.g. Wishlist 1, Wishlist 2 then restart and then it should name the next screenshot Wishlist 3.
Create a function for resolving the filename. The following snippet is not safe (what if no gets too large?), and you need a special case for no == 0.
int no = 0;
while( true ){
// Try filename.0.ext, filename.1.ext, ... until an unused name is found.
QString path = filename + "." + QString::number( no ) + "." + extension;
QFileInfo fileInfo( path );
if( !fileInfo.exists() )
return path;
no++;
}

Batch file complicated if sequence returns errors.

I am trying to have users login and when they do it sets their power or rank. The ranks are Admin User and Guest. I need to do the code below for every command in the program. It returns ) was not expected at this time. Any idea why? For this command all the users should be able to access it but later I will need to set certain things for each group.
if %inputCommand%==/help (
if %power%==Admin (
goto helpInfo
) else (
if %power%==User (
goto helpInfo
) else (
if %power%==Guest (
) else (
goto powerReadFailed
)
)
)
) else (
goto readInputCommandTwo
)
Try this:
if "%inputCommand%"=="/help" (
if "%power%"=="Admin" (
goto helpInfo
) else (
if "%power%"=="User" (
goto helpInfo
) else (
if "%power%"=="Guest" (
rem
) else (
goto powerReadFailed
)
)
)
) else (
goto readInputCommandTwo
)
Why so complicated?
How about
for %%a in (admin user guest) do if /i "%power%"=="%%a" (
for %%b in (help somethingelse whatever) do if /i "%inputcommand%"=="%%b" (
goto %%a%%b
)
goto %%ainvalidcommand
)
:powerreadfailed
You can then set up labels
adminhelp userhelp guesthelp
adminsomethingelse usersomethingelse guestsomethingelse
...
It's perfectly legitimate to write
:adminhelp
:userhelp
echo admin and user get the same help
and adding a new user class or command isn't hard...
(if /i makes the comparison case-insensitive.)