Perl Regex not finding pattern within script - regex

I'm reading the contents of a log file, performing a regex on the lines and putting the results in an array, but for some reason there is no output.
use strict;
use warnings;
my $LOGFILE = "log.file";
my $OUTFILE = "out.file";
open(LOG, "$LOGFILE") or die ("Could not open $LOGFILE: $!\n");
open(TMP, ">", "$OUTFILE") or die ("Could not open $OUTFILE: $!\n");
my @data = (<LOG> =~ /<messageBody>(.*?)<\/messageBody>/sg);
print TMP "This is a test line. \n";
foreach (@data){
print "@data\n";
print "\n=======================\n";
}
close TMP;
close LOG;
My output is a file (out.file) and the only content is "This is a test line." I know the regex works because I tried it at the prompt with:
perl -lne 'BEGIN{undef $/} while (/<messageBody>(.*?)<\/messageBody>/sg) {print $1}' log.file > test.file
What am I doing wrong?

Your data likely spans lines.
You'll therefore need to slurp the entire thing before using your regex:
my $logdata = do {local $/; <LOG>};
my @data = $logdata =~ m{<messageBody>(.*?)</messageBody>}sg;
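A minimal, self-contained sketch of the slurp-then-match approach follows. The sample text and the in-memory filehandle are assumptions made for illustration; in the real script the handle would be opened on log.file.

```perl
use strict;
use warnings;

# Hypothetical sample log: the second message body spans two lines.
my $sample = <<'END';
<messageBody>first</messageBody>
<messageBody>second
continues here</messageBody>
END

# An in-memory filehandle stands in for log.file.
open(my $log, '<', \$sample) or die "Could not open in-memory log: $!\n";

# With $/ undefined, a single read returns the whole file as one string.
my $logdata = do { local $/; <$log> };
close $log;

# In list context, /g returns every captured group; /s lets . match newlines.
my @data = $logdata =~ m{<messageBody>(.*?)</messageBody>}sg;

print scalar(@data), " bodies found\n";
print "[$_]\n" for @data;
```

Reading line by line instead would never see the opening and closing tags of the second body in the same string, which is why the original version found nothing.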

If you want to print the data's items, then you should do something like this:
foreach my $item (@data){
print TMP "$item\n";
print TMP "\n=======================\n";
}
$item takes the value of each array element in turn, and each one is written to the TMP file.

Related

Perl: How to parse through a file and print each line that matches user inputted strings?

I'm new to programming so bear with me. I'm working on a Perl script that asks the user the number of different items they want to search for and what those items are, separating them by pressing ENTER. That part works okay.
Then, the script is to open up a file, parse through, and print each line that matches with the items that the user initially listed. This is the part that I haven't been able to figure out yet. I've tried different variations of the code. I saw many people suggest using the index function but I had no luck with it. It does seem to be working when I swap $line =~ $array for $line =~ /TEXT/. I'm hoping someone here can shed some light.
Thanks in advance!
#!usr/bin/perl
use strict;
use warnings;
my $line;
my $array;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my @arrays;
print "Enter items, press enter to separate: \n";
for (1..$n) {
my $input = <STDIN>;
push @arrays, $input;
}
open (FILE, "file.txt") || die "can't open file!";
chomp(my @lines = <FILE>);
close (FILE);
foreach $array (@arrays) {
foreach $line (@lines) {
if ($line =~ $array) {
print $line, "\n";
}
}
}
@purplekushbear Welcome to Perl! In Perl, there is more than one way to do it (TIMTOWTDI), so please take this in the spirit of teaching in which it is given.
First off, line one, the #! (shebang) line, is missing the leading / in the path to perl. In Linux/UNIX environments, if your script is executable, the path after the #! is used to run your program. If you do an ls on /usr/bin/perl you should see it. Sometimes it is found at /bin/perl or /usr/local/bin/perl.
When the person mentioned you forgot to chomp, they were referring to where you set the $input variable. Just chomp like you did for $n and you will be OK.
As for the main part of your program, go back and reread what you wanted to do; doing exactly that might be simpler. I think you have a good start on the problem: you seem to know that arrays start with @ and scalar variables use the $ sigil, and you use strict, which is great.
Here is one way to solve your problem:
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $num = <STDIN>);
my @items = ();
print "Enter items, press enter to separate: \n";
for (1 .. $num)
{
chomp(my $input = <STDIN>);
push @items, $input;
}
open (FILE, "file.txt") || die "can't open file because $!";
while (my $line = <FILE>)
{
foreach my $item (@items)
{
if ($line =~ m/$item/)
{
print $line;
last;
}
}
}
close (FILE);
Notice I used the name @items for your items instead of @arrays, which will make the code easier to understand when you come back to it someday. Always write with an eye towards maintainability. Anyway, ask if you have any questions, but since I left much of the code the same I don't think you will have much trouble figuring it out. Perldoc and Google are your friends. E.g. you can type:
perldoc -f last
to find out how last works. Have fun!
In your script you forgot to chomp the user input, and you need to leave the inner for loop with last once a pattern has matched.
Here is another way to do the same thing with a different method.
Instead of the @array variable I use a single $regex variable. I concatenate the user input values into $regex, separated by | (in a regex, | behaves like "or"). While concatenating, I apply quotemeta to escape special characters. Then I precompile the regex with qr.
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my $regex;
print "Enter items, press enter to separate: \n";
for (1..$n)
{
chomp(my $input = <STDIN>);
$regex .= quotemeta($input)."|";
}
chop $regex; #remove the last pipe
$regex = qr($regex);
open my $fh, "<", "file.txt" or die "can't open file: $!"; # 'or', not '||', which would bind to "file.txt"
while(<$fh>)
{
print if(/$regex/i);
}
Then user @ikegami commented that you can use Perl's built-in @ARGV array instead of STDIN, for example:
Your method
my @array = @ARGV;
Another method
my $regex = join "|", map { quotemeta $_ } @ARGV;
Then run the script perl test.pl input1 input2 input3.
And always use 3 arguments to open a file
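To show why the quotemeta step matters, here is a small sketch under stated assumptions: the search terms and the lines are hypothetical and hard-coded, where the scripts above would take them from STDIN, @ARGV, and file.txt.

```perl
use strict;
use warnings;

# Hypothetical search terms; the first contains a regex metacharacter.
my @items = ('a.b', 'foo');

# Escape each term, join the terms with |, and compile the result once.
my $alternation = join '|', map { quotemeta($_) } @items;
my $regex = qr/$alternation/i;

# Hypothetical file contents.
my @lines = ("axb here", "a.b here", "FOO there", "nothing");

# Keep only the lines that contain at least one of the terms.
my @matched = grep { /$regex/ } @lines;
print "$_\n" for @matched;
```

Without quotemeta the dot in a.b would match any character, so "axb here" would be printed as well.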

Find-Replace Multiple Occurrences of a string and append iterating number

How can I iterate over the code of an HTML file, find certain recurring text, and then append a word and an iterating number to it?
So:
<!-- TemplateBeginEditable -->
<!-- TemplateBeginEditable -->
<!-- TemplateBeginEditable -->
etc...
Becomes :
<!-- TemplateBeginEditable Event=1 -->
<!-- TemplateBeginEditable Event=2 -->
<!-- TemplateBeginEditable Event=3 -->
etc...
I tried Perl first, thinking it would be the easiest/fastest, then went to jQuery and then back to Perl.
It seems simple enough to find/replace many ways with REGEX and return an array of the occurrences, but getting the iterating variable tacked on proves to be more of a challenge.
Latest Example of what I have tried:
#!/usr/bin/perl -w
# Open input file
open INPUTFILE, "<", $ARGV[0] or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $ARGV[1] or die $!;
# Read the input file line by line
while (<INPUTFILE>) {
my @matches = ($_ =~ m/TemplateBeginEditable/g);
### what do I do with the matches array? ###
$_ =~ s/TemplateBeginEditable/TemplateBeginEditable Event=/g;
print OUTPUTFILE $_;
}
close INPUTFILE;
close OUTPUTFILE;
To perform a replacement, you don't need to match the pattern before, you can directly perform the replacement. Example with your code:
while (<INPUTFILE>) {
s/TemplateBeginEditable/TemplateBeginEditable Event=/g;
print OUTPUTFILE $_;
}
Now to add a counter incremented at each replacement, you can put a piece of code in the pattern itself using this syntax:
my $i;
while (<INPUTFILE>) {
s/TemplateBeginEditable(?{ ++$i })/TemplateBeginEditable Event=$i/g;
print OUTPUTFILE $_;
}
To make it shorter you can use the \K feature to change the start of the match result:
while (<INPUTFILE>) {
s/TemplateBeginEditable\K(?{ ++$i })/ Event=$i/g;
print OUTPUTFILE $_;
}
Or with a one-liner:
perl -pe 's/TemplateBeginEditable\K(?{++$i})/ Event=$i/g' file > output
If you have awk available, and the target text only occurs at most once per line, then Perl is overkill I think:
awk 'BEGIN{n=1}{n+=sub("TemplateBeginEditable","& Event="n)}1'
Some explanation: The sub function returns the number of substitutions performed (0 or 1); the & means "whatever matched"; "..."n is string concatenation (no operator in awk); the 1 is a "true" condition that invokes the default "action" of {print}.
Expanding on my one-liner in the comments:
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift or die "Usage: $0 <filename>\n";
open my $fh, '<', $file or die "Cannot open $file: $!\n";
open my $ofh, '>', "$file.modified" or die "Cannot open $file.modified: $!\n";
my $i = 1;
while (my $line = <$fh>) {
if ($line =~ s/TemplateBeginEditable/$& Event=$i/) {
$i++;
}
print $ofh $line;
}
__END__
Note that this assumes you will never have more than one instance of your desired text on a single line, as shown in your sample input.
I'd just do:
local $/=undef;
my $content = <FH>;
my $x = 0;
$content =~ s/(My expected pattern)/$1 . " time=" . (++$x)/ge;
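A self-contained sketch of the /ge approach applied to the question's marker text is below. The sample string is an assumption; it deliberately puts two markers on one line to show that the counter still advances per match rather than per line.

```perl
use strict;
use warnings;

# Hypothetical input: two markers on the first line, one on the second.
my $content = "<!-- TemplateBeginEditable --><!-- TemplateBeginEditable -->\n"
            . "<!-- TemplateBeginEditable -->\n";

my $count = 0;

# /e evaluates the replacement as Perl code for each match; /g repeats it
# for every occurrence, so the counter is incremented once per match.
$content =~ s/(TemplateBeginEditable)/$1 . " Event=" . ++$count/ge;

print $content;
```

Because the replacement is evaluated per match, this version does not depend on the one-marker-per-line assumption that the line-by-line variants above make.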

perl split 8gb csv with "," as pattern

I recognise this might be a duplicate, but the size of the file I have to split requires a method which doesn't load the csv into memory before processing it, i.e. I'm looking for a line-by-line method to read, split and output my file. I only need my output to be the last 3 fields, without the quotes and without the thousands-delimiting comma.
I have a file of arcGIS coordinates which contain quotes and commas internal to the fields. Data example below.
"0","0","1","1","1,058.83","1,455,503.936","5,173,996.331"
I have been trying to do this using variations on split( '","' , $line);.
Here's my code.
use strict;
use warnings;
open (FH, '<', "DEM_Export.csv") or die "Can't open file DEM_Export.csv";
open (FH2, '>', "DEM_ExportProcessed.csv") or die "Can't open file DEM_ExportProcessed.csv";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
my $count = 0;
foreach my $line (<FH>) {
chomp;
# if ($count == 0){next;}
print $line, "\n";
my @list = split( '","' , $line);
print "1st print $list[5],$list[6],$list[4]\n";
$list[4] =~ s/,//g;
$list[5] =~ s/,//g;
$list[6] =~ s/,//g;
$list[4] =~ s/"//g;
$list[5] =~ s/"//g;
$list[6] =~ s/"//g;
print "2nd print $list[5],$list[6],$list[4]\n";
if ($count == 10) {
exit;
}
my $string = sprintf("%.3f,%.3f,%.3f\n", $list[5],$list[6],$list[4]);
print FH2 $string;
$count++;
}
close FH;
close FH2;
I'm getting close to my wits' end with this and really need a solution.
Any help will be gratefully received.
Cheers
This is really very straightforward using Text::CSV to handle the nastiness of CSV data.
Here's an example, which works fine with the sample data you have shown. As long as your input file is plain ASCII and the rows are about the size you have shown, it should work fine.
It prints its output to STDOUT, so you'll want to use a command-line redirect to put it into the file you want.
use strict;
use warnings 'all';
use Text::CSV;
my $csv_file = 'DEM_Export.csv';
open my $in_fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};
my $csv = Text::CSV->new({ eol => "\n" });
print "EASTING,NORTHING,ELEVATION\n";
while ( my $row = $csv->getline($in_fh) ) {
$csv->print(\*STDOUT, [ map tr/,//dr, @$row[-2,-1,-3] ] );
}
output
1455503.936,5173996.331,1058.83
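The slice and the tr/,//dr expression in that loop can be looked at in isolation. The sketch below uses only core Perl; the row is written by hand on the assumption that it is what the parser would return for the sample line, and the /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# One parsed row, hand-written here in place of a Text::CSV result.
my $row = [ '0', '0', '1', '1', '1,058.83', '1,455,503.936', '5,173,996.331' ];

# Negative indices count from the end: second-last, last, third-last,
# which gives easting, northing, elevation in that order.
my @picked = @$row[-2, -1, -3];

# tr/,//d deletes commas; /r returns the changed copy and leaves $_ alone.
my @clean = map { tr/,//dr } @picked;

print join(',', @clean), "\n";
```

Because /r works on a copy, the values in $row keep their commas, which is convenient if the row is reused later.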
I guess I should have been braver and had a crack with Text::CSV to start with rather than asking a question.
Many thanks to Сухой27 and choroba for pointing me in the right direction.
Here is the code I ended up with. Probably not the tidiest.
use strict;
use warnings;
use Text::CSV;
my $file = "DEM_Export.csv";
my $file2 = "DEM_ExportProcessed.csv";
open (FH2, '>', $file2) or die "Can't open file $file2: $!";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
print "Starting file processing...\n";
my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
my @fields = @$row;
s/,//g for @fields[3..5];
my $string = sprintf("%.3f,%.3f,%.3f\n", $fields[4],$fields[5],$fields[3]);
print FH2 $string;
}
print "Finished!";
close FH2;
Worked a treat!
Thank you.

Issue with Perl Regex

new perl coder here.
When I copy and paste the text from a website into a text file and read from that file, my perl script works with no issues. When I use getstore to create a file from the website automatically which is what I want, the output is a bunch of |'s.
The text looks identical when I copy and paste, or download the text with getstore.. I'm unable to figure out the problem. Any help would be highly appreciated.
The output that I desire is as follows:
|www\.arkinsoftware\.in|www\.askmeaboutrotary\.com|www\.assculturaleincontri\.it|www\.asu\.msmu\.ru|www\.atousoft\.com|www\.aucoeurdelanature\.
Here is the code I am using:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore("http://www.malwaredomainlist.com/hostslist/hosts.txt", "malhosts.txt");
open(my $input, "<", "malhosts.txt");
while (my $line = <$input>) {
chomp $line;
$line =~ s/.*\s+//;
$line =~ s/\./\\\./g;
print "$line\|";
}
The run of | characters you get comes from the comment lines at the beginning of the file, which don't fit your pattern. So the solution is to ignore all lines that don't fit.
So instead of
$line =~ s/.*\s+//;
use
next unless $line =~ s/^127.*\s+//;
so that you ignore every line except those starting with 127.
Here's what I'd do:
my $first = 1;
while (<$input>) {
/^127\.0\.0\.1\s+(.+?)\s*$/ or next;
print '|' if !$first;
$first = 0;
print quotemeta($1);
}
This matches your input in a more precise way, and quotemeta takes care of true regex escaping.
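Here is a runnable sketch of that loop on a few in-memory lines. The hostnames are made up, and the carriage return on one line is an assumption added to show that the \s*$ part of the pattern also drops trailing whitespace left behind by a downloaded file.

```perl
use strict;
use warnings;

# Hypothetical lines in the shape of a hosts file; the hostnames are made up.
my @lines = (
    "# comment line\n",
    "\n",
    "127.0.0.1  www.example-one.test\r\n",
    "127.0.0.1  two.example.test\n",
);

my @escaped;
for my $line (@lines) {
    # Keep only host entries; the capture stops before trailing whitespace.
    next unless $line =~ /^127\.0\.0\.1\s+(.+?)\s*$/;
    # quotemeta backslash-escapes every character that is not a word character.
    push @escaped, quotemeta($1);
}

print join('|', @escaped), "\n";
```

Collecting the names and joining them afterwards also avoids the stray leading or trailing | that printing inside the loop produces.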
I'd probably go with something like:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore( "http://www.malwaredomainlist.com/hostslist/hosts.txt",
"malhosts.txt" );
open( my $input, "<", "malhosts.txt" );
print join ( "|",
map { m/^\d/ && ! m/localhost/ ?
quotemeta ((split)[1]) : () } <$input> );
Gives:
0koryu0\.easter\.ne\.jp|1\-atraffickim\.tf|10\-trafficimj\.tf|109\-204\-26\-16\.netconnexion\.managedbroadband\.co\.uk|11\-atraasikim\.tf|11\.lamarianella\.info|12\-tgaffickvcmb\.tf| #etc.

Perl Regular expression match from user input

I'm just learning perl and I'm trying to learn regular expressions at the same time. Basically I'm trying to open a log file and print out any lines that match user input to a new file. Using the following code I get no output at all if I type in the word "Clinton". But if I replace
print MYFILE if /\Q$string\E/;
with
print MYFILE if /\QClinton\E/;
it runs as expected. Any ideas? I know it is something simple that I am missing.
print "Enter a word to look up: ";
$string = <>;
print "You put $string";
open(LOG,"u_ex121011.log") or die "Unable to open logfile:$!\n";
open (MYFILE, '>>data2.txt');
while(<LOG>){
print MYFILE if /\Q($string)\E/;
}
close (MYFILE);
close(LOG);
print "Check data2.txt";
In Perl, unlike in some languages, the input operator doesn't silently remove a trailing newline. So your $string is actually "Clinton\n" rather than "Clinton". To fix it, use the chomp function:
$string = <>;
chomp $string;
print "You put $string\n";
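The effect of the trailing newline can be seen without any user input. In this sketch the typed value and the log line are hard-coded assumptions.

```perl
use strict;
use warnings;

# Simulate what <> returns when the user types "Clinton" and presses Enter.
my $string = "Clinton\n";

# Hypothetical log line; the word is followed by more text, not a newline.
my $line = "GET /search?q=Clinton HTTP/1.1";

my $before = $line =~ /\Q$string\E/ ? "match" : "no match";

chomp $string;    # remove the trailing newline in place

my $after = $line =~ /\Q$string\E/ ? "match" : "no match";

print "before chomp: $before\n";
print "after chomp:  $after\n";
```

Before the chomp, the pattern requires a newline directly after "Clinton", so it could only match a line that ends with that word.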
You should also use the 3 argument version of open.
open( my $LOG, '<', 'u_ex121011.log' ) or die "Unable to open file:$!\n";
Here is documentation of open: http://perldoc.perl.org/functions/open.html
In addition to what ruakh said, you should check if the string is on the line by using the $_ variable and the =~ operator.
print MYFILE "$_\n" if $_ =~ /\Q$string\E/;
Going off of your comment, you can split the line up surprisingly enough using split.
Here is an example of what you could do:
my @lines = split( ' ', $_ );
print MYFILE "$lines[0] $lines[1] $lines[2] $lines[3]\n";
Here is documentation of split: http://perldoc.perl.org/functions/split.html
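As a small illustration of how split ' ' treats whitespace, here is a sketch on a hypothetical log line; the line's layout is an assumption and not taken from the real u_ex121011.log.

```perl
use strict;
use warnings;

# Hypothetical log line with leading spaces and irregular gaps.
my $line = "  2012-10-11 00:00:01   GET /index.html 200\n";

# The special pattern ' ' skips leading whitespace and splits on any run
# of whitespace, so the trailing newline does not end up in a field.
my @fields = split ' ', $line;

print "$fields[0] $fields[1] $fields[2] $fields[3]\n";
```

Since the newline is consumed by the split, the print statement has to add its own, as the example in the answer does.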