Is it possible to match regex using variable? - regex

Here is my code
my $filename = 'text.log';
my $items = "donkey";
open(my $fh, '<:encoding(UTF-8)', $filename) or die "Cant open";
while (my $contents = <$fh>)
{
print "$contents";
if ( $items =~m/$contents/)
{ print "Found $contents";}
else { print "NOTHING\n";}
}

Yes, but you'll need to remove the trailing newspace on each line ($contents =~ s/\n$//;):
#!/usr/bin/env perl
my $filename = 'text.log';
my $items = "donkey";
open(my $fh, '<:encoding(UTF-8)', $filename) or die "Cant open";
while (my $contents = <$fh>) {
print "$contents";
$contents =~ s/\n$//;
if ($items =~ m/$contents/) {
print "Found $contents\n";
} else {
print "NOTHING\n";
}
}
Test:
$ cat text.log
test
ok
donk
$ ./test.pl
test
NOTHING
ok
NOTHING
donk
Found donk

Related

Perl reading in a file and getting a string in between two strings

I am trying to read in a file and gather everything in between two hash keys. I want to access everything between the $beginString and $endString variables. I have tried multiple regular expressions but haven't been able to get one to work.
my $beginString = "SEARCH";
my $endString = "TEST";
my $fileContent;
open(my $fileHandler, $inputFile) or die "Could not open file '$inputFile' $!";
{
local $/;
$fileContent = <$fileHandler>;
}
close($fileHandler);
if($fileContent =~ /\b$beginString\b(.*?)\b$endString\b/){
my $result = $1;
print $result;
}
print Dumper($fileContent);
An adaptation of the perl monks' solution could be..
my $beginString = "SEARCH";
my $endString = "TEST";
my $fileContent;
open(my $fileHandler, $inputFile) or die "Could not open file '$inputFile' $!";
while(<$fileHandler>) {
if(/$beginString/../$endString/) { $fileContent .= $_ unless(/$beginString/ or /$endString/) }
}
close($fileHandler);
print Dumper($fileContent);

Perl: Comparing the contents of one file with those of several others

I have to read a CSV file (TEST.csv) with contents of this sort:
Sl.No, Label, Customer1, Customer2, Customer3...
1, label1, Y, N, Y...
2, label2, N, Y, Y...
...
and retrieve only the labels marked as "Y" for every "customer", into an external file for that "customer". With some help from SO members in another question, I managed to desist from getting lost in a maze of nested data structures, and am using the construct below. Here, I'm copying the labels marked as "Y" to a temp file _temp.h for the corresponding customers.
Now, the actual "external file" I need to write does not just have the labels, but is a copy of an "internal file" internal.h, which has data in this form:
/*...comments*/
#define header_label1 val1;
#define header_label2 val2;
...
For example, I might have a line #define ABC_Comp1_X_H_CompDes1 value. If the label Comp1_CompDes1 is present in the temp file I create for customer 1, then the line above is copied into the final external file for customer 1.
The following code is the one I'm using. However, this throws an error "Global symbol "%tempLines" requires explicit package name" for the line marked "HERE", though I'm not using a hash, and also of syntax errors in the next couple of lines w.r.t. the curly braces.
Any guidance as to the reason behind these errors would be highly appreciated.
use strict;
use warnings;
use File::Slurp;
use Data::Dumper;
my $numCustomers;
my $intHeaderFile = "internal.h";
open(my $fh, "<", "TEST.csv") or die "Unable to open CSV, $!";
open(my $infh, "<", $intHeaderFile) or die "Cannot open $intHeaderFile, $!";
my #headerLines = read_file($intHeaderFile);
chomp( my $header = <$fh> );
my #names = split ",", $header;
$numCustomers = scalar(#names) - 2;
print "\nNumber of customers : $numCustomers\n";
my #customerNames;
for(my $i = 0; $i < $numCustomers; $i++)
{
push #customerNames, $names[$i + 2];
}
my #tempHandles;
my #handles;
my #tempfiles;
my #files;
for(my $i = 0; $i < $numCustomers; $i++)
{
my $custFile = "customer".$i."_external.h";
open my $fh, '>', $custFile or die "$custFile: $!";
push #handles, $fh;
push #files, $custFile;
my $tempFile = "customer".$i."_temp.h";
open my $fh1, '+>', $tempFile or die "$tempFile: $!";
push #tempHandles, $fh1;
push #tempfiles, $tempFile;
}
while (<$fh>)
{
chomp;
my $nonIncLine = $_;
my #fields = split ",", $nonIncLine;
next if $. == 1;
for(my $i = 0; $i < $numCustomers; $i++)
{
print { $tempHandles[$i] } $fields[1], "\n" if 'Y' eq $fields[ $i + 2 ];
}
}
for(my $i = 0; $i < $numCustomers; $i++)
{
my #tempLines = read_file($tempfiles[$i]);
print #tempLines;
foreach my $headerLine(#headerLines)
{
if (grep { $headerLine =~ /$_/} #tempLines ) #HERE
{
print { $handles[$i] } $headerLine, "\n";
}
}
unlink($tempfiles[$i]);
}

How to compare the data of two files (.xml and .html) using perl(regex)?

I tried using regex by using foreach loop for .xml files as there are many .xml files and only one .html file. I opened, read and closed the directory. But when it comes to searching for a particular pattern in both the files, the code doesn't enter the while/if loop.
xml data: #pattern in xml format
<gname>abc</gname>
<pname>xyz</pname>
html data: #pattern in html format
<p>ABC</p>
<p><i>xyz</i></p>
Here, I need to match abc and xyz in both xml and html file (case sensitive).
open( F2, "<F2>" );
my $xml_list1 = "(.*)\.html";
here the data enclosed inside the parentheses also appears when printed. I want say the file name is abc.html so i want to keep "abc" as interchangeable, so that i dont need to write/modify the code if any filename other than abc.html occurs.
close F2;
#print $xml_list1."\n";
foreach my $f (#filenames) {
#print $f."\n";
open( F1, "<F1>" );
my $data = join( "", <F1> );
close F1;
my $filename = substr( $f, 0, index( $f, '.' ) );
my $xml_list = $filename . ".xml";
while ( $xml_list =~ m//ig ) {
...;
}
}
the code doesn't enter the while/if loop, seems that it finds some error in reading the filename $xml_list.
I want to match both the data without the use of parsers.
Can someone please help me to out.
UPDATE:
CODE:
#!/usr/bin/perl
use strict;
use Cwd;
use File::Copy;
use File::Basename;
my $path1=getcwd;
opendir(INP, "$path1\/Input");
my #out = grep(/.(xml)$/,readdir(INP));
my #out1 = grep(/.(html)$/,readdir(INP));
close INP;
foreach my $final(#out)
{
my $filetobecopied = "Input\/".$final;
my $newfile = $final;
copy($filetobecopied, $newfile) or die "File cannot be copied.";
}
foreach my $final1(#out1)
{
my $filetobecopied1 = "Input\/".$final1;
my $newfile1 = $final1;
copy($filetobecopied1, $newfile1) or die "File cannot be copied.";
}
opendir DIR, $path1 or die "cant open dir";
my #files = grep /(.*?)\.(xml)$/,(readdir DIR);
my #files1 = grep /(.*?)\.(html)$/,(readdir DIR);
closedir DIR;
open(F6, ">Ref.txt");
print F6 "FileName\tError Instance\tOutput\n";
# open(F2,"<F2>");
# my $xml_list1="abc.html";
# my $data1=join("",<F1>);
# my $xml_list2=$xml_list1;
foreach my $f(#files)
{
open(F1, "<$f") or die "Cannot open file: $files[0]";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
#print "$f\n";
open(F2, "<$f") or die "Cannot open file: $files[0]";
my $xml_listt="abc.html";
my $data1=join("", <F2>);
my $xml_list1=$data1;
print $xml_list1."\n";
while($xml_list=~m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
my $name=$2;
#print F6 $f."\t".$full."\n";
if($full=~m/(<givenNames>(\w+)<\/givenNames>(\n)?<familyName>(\w+)<\/familyName>(\n)?(.*?))/isg)
{
my $fg=$1;
my $gname=$2;
my $fname=$4;
#print F6 $f."\t".$gname."\t".$fname."\n";
}
}
While($xml_list1=~m/(<p><FONT FACE="(.*?)" SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/igs)
{
my $hfull=$1;
print F6 $f."\n"; #.$hfull."\n";
}
close F2;
close F1;
}
foreach my $del(#files)
{
unlink $del;
}
#flora : This is the final solution i am posting it . Here from xml files i have combined with and matched with data in the html . Now i am passing an argument to the program only one pattern that i want to check whether it matched with xml and html files . Like for example i have passed the argument "Kazumitsu Sugiura" to the program and now the program takes the value from and from xml and combines it as "Kazumitsu Sugiura" . This pattern will now check in the html file and if it matches then the filenames will be displayed as shown below :
InputFile:(sample.xml)
<creators>
<creator affiliationRef="#01" creatorRole="author" xml:id="01">
<personName><givenNames>Kazumitsu</givenNames><familyName>Sugiura</familyName></personName>
</creator>
<creator affiliationRef="#01" creatorRole="author" xml:id="02">
<personName><givenNames>Yoshinao</givenNames><familyName>Muro</familyName></personName>
</cre??ator>
<creator affiliationRef="#01" creatorRole="author" xml:id="03">
<personName><givenNames>Masashi</givenNames><familyName>Akiyama</familyName></personName>
</creator>
</creators>
InputFile:(test.xml)
<creators>
<creator affiliationRef="#01" creatorRole="author" xml:id="01">
<personName><givenNames>Kazumitsu</givenNames><familyName>Sugiura</familyName></personName>
</creator>
</creators>
InputFile:(test.html)
<P><FONT FACE="hello" SIZE="14"><I>Kazumitsu SUGIURA, Yoshinao Muro, and Masashi Akiyama</I></FONT></P>
Code:
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Copy;
use File::Basename;
my $path1=getcwd;
my $PatternName = $ARGV[0];
opendir DIR, $path1 or die "cant open dir";
my #files = grep /(.*?)\.(xml)$/,(readdir DIR);
closedir DIR;
opendir DIR, $path1 or die "cant open dir";
my #files1 = grep /(.*?)\.(html)$/,(readdir DIR);
closedir DIR;
#print #files1;
open(F6, ">Ref.txt");
print F6 "FileName\tMatchedString\tOutput\n";
foreach my $f (#files)
{
open(F1, "<$path1\/$f") or die "Cannot open file: $f - $!";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
#print $xml_list;
foreach my $f1 (#files1)
{
my #fname=();
my #hfull=();
#print $f1 . "\n";
open(F2, "<$path1\/$f1") or die "Cannot open file: $f1 - $!";
my $data1=join("", <F2>);
close F2;
my $xml_list1=$data1;
#print $xml_list1;
while($xml_list =~ m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
#print $full . "\n";
if($full =~ m/(<givenNames>\s*(\w+)\s*<\/givenNames>\s*<familyName>\s*(\w+)\s*<\/familyName>\s*(.*?))/isg)
{
my $var = "$2 $3";
push(#fname,$var);
}
}
while($xml_list1 =~ m/(<p><FONT FACE="(.*?)" SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/isg)
{
push(#hfull,$4);
}
foreach my $a (#fname)
{
if($a =~ /$PatternName/i)
{
foreach my $b (#hfull)
{
if($b =~ m/$PatternName/isg)
{
print $PatternName . "\n";
my $line = substr($b,index(lc($b),lc($PatternName)),length($PatternName));
print F6 "Matched $a($f)\->$line($f1)\n";
}
}
}
}
}
}
Program Execution :
perl filename.pl "Kazumitsu Sugiura"
Output:
Matched Kazumitsu Sugiura(sample.xml)->Kazumitsu SUGIURA(abc.html)
Matched Kazumitsu Sugiura(test.xml)->Kazumitsu SUGIURA(abc.html)
#flora: I have modified your program and also i have put the optimized solution for your program.
Modified Code :(Modification to your program)
#!/usr/bin/perl
use strict;
use Cwd;
use File::Copy;
use File::Basename;
my $path1=getcwd;
opendir(INP, "$path1\/Input");
my #out = grep(/\.(xml)$/,readdir(INP));
my #out1 = grep(/\.(html)$/,readdir(INP));
close INP;
foreach my $final(#out)
{
my $filetobecopied = "Input\/".$final;
my $newfile = $final;
copy($filetobecopied, $newfile) or die "File cannot be copied.";
}
foreach my $final1(#out1)
{
my $filetobecopied1 = "Input\/".$final1;
my $newfile1 = $final1;
copy($filetobecopied1, $newfile1) or die "File cannot be copied.";
}
opendir DIR, $path1 or die "cant open dir";
my #files = grep /(.*?)\.(xml)$/,(readdir DIR);
my #files1 = grep /(.*?)\.(html)$/,(readdir DIR);
closedir DIR;
open(F6, ">Ref.txt");
print F6 "FileName\tError Instance\tOutput\n";
# open(F2,"<F2>");
# my $xml_list1="abc.html";
# my $data1=join("",<F1>);
# my $xml_list2=$xml_list1;
foreach my $f (#files)
{
open(F1, "<$f") or die "Cannot open file: $f";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
open(F2, "<$path1\/Input\/abc.html") or die "Cannot open file: abc.html - $!";
my $data1=join("", <F2>);
close F2;
my $xml_list1=$data1;
print $xml_list1."\n";
while($xml_list =~ m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
my $name=$2;
#print F6 $f."\t".$full."\n";
if($full =~ m/(<givenNames>\s*(\w+)\s*<\/givenNames>\s*<familyName>\s*(\w+)\s*<\/familyName>\s*(.*?))/isg)
{
my $fg=$1;
my $gname=$2;
my $fname=$3;
#print F6 $f."\t".$gname."\t".$fname."\n";
}
}
while($xml_list1 =~ m/(<p><FONT FACE="(.*?)" SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/isg)
{
my $hfull= $1;
print F6 $f."\n"; #.$hfull."\n";
}
}
foreach my $del(#files)
{
unlink $del;
}
Optimized solution:
#!/usr/bin/perl
use strict;
use warnings;
my #files = grep {-f} glob("*.xml");
my #files1 = grep {-f} glob("*.html");
open(F6, ">Ref.txt");
print F6 "FileName\tError Instance\tOutput\n";
foreach my $f (#files)
{
my $xml_list = do {
local $/ = undef;
open my $fh,'<',"$f" or die "Cannot open file: $f";
<$fh>;
};
my $xml_list1 = do {
local $/ = undef;
open my $fh,'<',"abc.html" or die "Cannot open file: $f";
<$fh>;
};
print $xml_list1."\n";
while($xml_list =~ m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
my $name=$2;
#print F6 $f."\t".$full."\n";
if($full =~ m/(<givenNames>\s*(\w+)\s*<\/givenNames>\s*<familyName>\s*(\w+)\s*<\/familyName>\s*(.*?))/isg)
{
my $fg=$1;
my $gname=$2;
my $fname=$3;
#print F6 $f."\t".$gname."\t".$fname."\n";
}
}
while($xml_list1 =~ m/(<p><FONT FACE="(.*?)"\s+SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/isg)
{
my $hfull= $1;
print F6 $f . "\n"; #.$hfull."\n";
}
}
foreach my $del(#files)
{
unlink $del;
}
#flora : Try this code . I was busy with other work so couldn't reply soon. Now this code will generate following output :
Input File:(sample.xml)
<creators>
<creator affiliationRef="#01" creatorRole="author" xml:id="01">
<personName><givenNames>Kazumitsu</givenNames><familyName>Sugiura</familyName></personName>
</creator>
<creator affiliationRef="#01" creatorRole="author" xml:id="02">
<personName><givenNames>Yoshinao</givenNames><familyName>Muro</familyName></personName>
</cre??ator>
<creator affiliationRef="#01" creatorRole="author" xml:id="03">
<personName><givenNames>Masashi</givenNames><familyName>Akiyama</familyName></personName>
</creator>
</creators>
InputFile(abc.html):
<P><FONT FACE="hello" SIZE="14"><I>Kazumitsu SUGIURA, Yoshinao Muro, and Masashi Akiyama</I></FONT></P>
Output:
FileName MatchedString Output
Matched Sugiura(sample.xml)->SUGIURA(abc.html)
Matched Muro(sample.xml)->Muro(abc.html)
Matched Akiyama(sample.xml)->Akiyama(abc.html)
Code:
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Copy;
use File::Basename;
my $path1=getcwd;
#print $path1;
opendir(INP, "$path1\/Input");
my #out = grep(/\.(xml)$/,readdir(INP));
closedir INP;
opendir(INP, "$path1\/Input");
my #out1 = grep(/\.(html)$/,readdir(INP));
#print #out1;
closedir INP;
foreach my $final(#out)
{
my $filetobecopied = "Input\/".$final;
my $newfile = $final;
copy($filetobecopied, $newfile) or die "File cannot be copied.";
}
foreach my $final1(#out1)
{
my $filetobecopied1 = "Input\/".$final1;
my $newfile1 = $final1;
#print $final1;
copy($filetobecopied1, $newfile1) or die "File cannot be copied.";
}
opendir DIR, $path1 or die "cant open dir";
my #files = grep /(.*?)\.(xml)$/,(readdir DIR);
closedir DIR;
opendir DIR, $path1 or die "cant open dir";
my #files1 = grep /(.*?)\.(html)$/,(readdir DIR);
closedir DIR;
#print #files1;
open(F6, ">Ref.txt");
print F6 "FileName\tMatchedString\tOutput\n";
# open(F2,"<F2>");
# my $xml_list1="abc.html";
# my $data1=join("",<F1>);
# my $xml_list2=$xml_list1;
foreach my $f (#files)
{
open(F1, "<$path1\/Input\/$f") or die "Cannot open file: $f - $!";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
#print $xml_list;
foreach my $f1 (#files1)
{
my #fname=();
my #hfull=();
print $f1 . "\n";
open(F2, "<$path1\/Input\/$f1") or die "Cannot open file: $f1 - $!";
my $data1=join("", <F2>);
close F2;
my $xml_list1=$data1;
#print $xml_list1;
while($xml_list =~ m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
#print $full . "\n";
if($full =~ m/(<givenNames>\s*(\w+)\s*<\/givenNames>\s*<familyName>\s*(\w+)\s*<\/familyName>\s*(.*?))/isg)
{
push(#fname,$3);
}
}
while($xml_list1 =~ m/(<p><FONT FACE="(.*?)" SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/isg)
{
push(#hfull,$4);
}
foreach my $a (#fname)
{
foreach my $b (#hfull)
{
#print $b . "\n";
if($b =~ m/$a/isg)
{
my $line = substr($b,index(lc($b),lc($a)),length($a));
print F6 "Matched $a($f)\->$line($f1)\n";
}
}
}
}
}
foreach my $del(#files)
{
unlink $del;
}
Optimized code:
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Copy;
use File::Basename;
my $path1=getcwd;
#print $path1;
#opendir(INP, "$path1\/Input");
#my #out = grep(/\.(xml)$/,readdir(INP));
#closedir INP;
#opendir(INP, "$path1\/Input");
#my #out1 = grep(/\.(html)$/,readdir(INP));
#print #out1;
#closedir INP;
#foreach my $final(#out)
#{
#my $filetobecopied = "Input\/".$final;
# my $newfile = $final;
#copy($filetobecopied, $newfile) or die "File cannot be copied.";
#}
#foreach my $final1(#out1)
#{
# my $filetobecopied1 = "Input\/".$final1;
# my $newfile1 = $final1;
#print $final1;
# copy($filetobecopied1, $newfile1) or die "File cannot be copied.";
#}
opendir DIR, $path1 or die "cant open dir";
my #files = grep /(.*?)\.(xml)$/,(readdir DIR);
closedir DIR;
opendir DIR, $path1 or die "cant open dir";
my #files1 = grep /(.*?)\.(html)$/,(readdir DIR);
closedir DIR;
#print #files1;
open(F6, ">Ref.txt");
print F6 "FileName\tMatchedString\tOutput\n";
# open(F2,"<F2>");
# my $xml_list1="abc.html";
# my $data1=join("",<F1>);
# my $xml_list2=$xml_list1;
foreach my $f (#files)
{
open(F1, "<$path1\/$f") or die "Cannot open file: $f - $!";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
#print $xml_list;
foreach my $f1 (#files1)
{
my #fname=();
my #hfull=();
print $f1 . "\n";
open(F2, "<$path1\/$f1") or die "Cannot open file: $f1 - $!";
my $data1=join("", <F2>);
close F2;
my $xml_list1=$data1;
#print $xml_list1;
while($xml_list =~ m/(<personName>(.*?)<\/personName>)/isg)
{
my $full=$1;
#print $full . "\n";
if($full =~ m/(<givenNames>\s*(\w+)\s*<\/givenNames>\s*<familyName>\s*(\w+)\s*<\/familyName>\s*(.*?))/isg)
{
push(#fname,$3);
}
}
while($xml_list1 =~ m/(<p><FONT FACE="(.*?)" SIZE="(\d+)"><I>(.*?)<\/I><\/FONT><\/p>)/isg)
{
push(#hfull,$4);
}
foreach my $a (#fname)
{
foreach my $b (#hfull)
{
#print $b . "\n";
if($b =~ m/$a/isg)
{
my $line = substr($b,index(lc($b),lc($a)),length($a));
print F6 "Matched $a($f)\->$line($f1)\n";
}
}
}
}
}
#foreach my $del(#files)
#{
# unlink $del;
#}

perl: passing subroutines rexexp replace with search results

i have the following perl subroutine:
sub rep {
defined ($filein = shift) || die ("no filein");
defined ($fileout = shift) || die ("no fileout");
$look = shift;
$replace = shift;
open (infile, "$filein")|| die;
open (outfile, "> $fileout")|| die;
while (<infile>) {
s/$look/$replace/g;
print outfile;
}
(close the files)
}
and the following text:
kuku(fred) foo(3)
kuku(barney) foo(198)
i want to call it with the following structures:
$look = kuku\((\w+)\) foo \((\d+)\),
$replace = gaga\(($1)\) bar\(($2)\).
but when i called the sub with the following (and it's variations), i couldn't make it accept the $1, $2 format:
&rep ($ARGV[0], $ARGV[1],
"kuku\\(\(\\w+\)\\) foo \\(\(\\d+\)\\)" ,
"gaga\\(\(\$1\)\\) bar\\(\(\$2\)\\)");
all i get is:
gaga($1) bar($2)
gaga($1) bar($2)
what am i doing wrong?
how can i make the subroutine identify the $1\ $2 (...) as the search results of the search and replace?
I'm not sure if substitution part in regex can be set in a way you want it without using eval /e, so this is how I would write this.
qr// parameter is real regex, followed by callback in which $_[0] is $1
rep( $ARGV[0], $ARGV[1], qr/kuku\((\w+)\) foo \((\d+)\)/, sub { "gaga($_[0]) bar($_[1])" } );
sub rep {
my ($filein, $fileout, $look, $replace) = #_;
defined $filein or die "no filein";
defined $fileout or die "no fileout";
open (my $infile, "<", $filein) or die $!;
open (my $outfile, ">", $fileout) or die $!;
while (<$infile>) {
s/$look/$replace->($1,$2)/ge;
print $outfile;
}
# (close the files)
}
This could be even more simplified by just passing callback which would change $_.
rep( $ARGV[0], $ARGV[1], sub { s|kuku\((\w+)\) foo \((\d+)\)|gaga($1) bar($2)| } );
sub rep {
my ($filein, $fileout, $replace) = #_;
defined $filein or die "no filein";
defined $fileout or die "no fileout";
open (my $infile, "<", $filein) or die $!;
open (my $outfile, ">", $fileout) or die $!;
while (<$infile>) {
$replace->();
print $outfile;
}
# (close the files)
}

Removing stop words and saving the new file Perl

I have created a Perl file to load in an array of "Stop words".
Then I load in a directory with ".ner" files contained in it.
Each file gets opened and each word is split and compared to the words in the stop file.
If the word matches the word it is changed to "" (nothing-and gets removed)
I then copy the file to another location. So I can differentiate between files with stop words and files without.
But does this change the file to now contain no stop words or will it revert back to the original?
#!/usr/bin/perl
#use strict;
#use warnings;
my #stops;
my #file;
use File::Copy;
open( STOPWORD, "/Users/jen/stopWordList.txt" ) or die "Can't Open: $!\n";
#stops = <STOPWORD>;
while (<STOPWORD>) #read each line into $_
{
chomp #stops; # Remove newline from $_
push #stops, $_; # add the line to #triggers
}
close STOPWORD;
$dirtoget="/Users/jen/temp/";
opendir(IMD, $dirtoget) || die("Cannot open directory");
#thefiles= readdir(IMD);
foreach $f (#thefiles){
if ($f =~ m/\.ner$/){
print $f,"\n";
open (FILE, "/Users/jen/temp/$f")or die"Cannot open FILE";
if ( FILE eq "" ) {
close FILE;
}
else{
while (<FILE>) {
foreach $word(split(/\|/)){
foreach $x (#stops) {
if ($x =~ m/\b\Q$word\E\b/) {
$word = '';
copy("/Users/jen/temp/$f","/Users/jen/correct/$f")or die "Copy failed: $!";
close FILE;
}
}
}
}
}
}
}
closedir(IMD);
exit 0;
The format of the file I am splitting and comparing is as follows:
'<title>|NN|O Woman|NNP|O jumped|VBD|O for|IN|O life|NN|O after|IN|O firebomb|NN|O attack|NN|O -|:|O National|NNP|I-ORG News|NNP|I-ORG ,|,|I-ORG Frontpage|NNP|I-ORG -|:|I-ORG Independent.ie</title>|NNP|'
Should I be outlining where the words should be split ie: split(/|/)?
You should ALWAYS use :
use strict;
use warnings;
use three args open and test opening for failure.
As said codaddict A split with no arguments is equivalent to split(' ', $_).
Here is a proposal to achieve the job (as far as I well understood what you wanted).
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
my #stops = qw(put here your stop words);
my %stops = map{$_ => 1} #stops;
my #thefiles;
my $path = '/Users/jen/temp/';
my $out = $path.'outputfile';
open my $fout, '>', $out or die "can't open '$out' for writing : $!";
foreach my $file(#thefiles) {
next unless $file =~ /\.ner$/;
open my $fh, '<', $path.$file or die "can't open '$file' for reading : $!";
my #lines = <$file>;
close $fh;
foreach my $line(#lines) {
my #words = split/\|/,$line;
foreach my $word(#words) {
$word = '' if exists $stops{$word};
}
print $fout join '|',#words;
}
}
close $out;
A split with no arguments is equivalent to split(' ', $_).
Since you want the lines to be split on | you need to do:
split/\|/
#jenniem001,
open FILE, ("<$fh")||die("cant");undef $/;my $whole_file = <FILE>;foreach my $word (#words){$whole_file=~s/\b\Q$word\E\b//ig;}open FILE (">>$duplicate")||die("cant");print FILE $whole_file;
That will remove stops from your file and create a duplicate. Just call give $duplicate a name :)