How can I extract and save text using Perl? - regex

No extracted data is written to data2.txt. What is wrong with the code?
MyFile.txt
ex1,fx2,xx1
mm1,nn2,gg3
EX1,hh2,ff7
This is my desired output in data2.txt:
ex1,fx2,xx1
EX1,hh2,ff7
#! /DATA/PLUG/pvelasco/Softwares/PERLINUX/bin/perl -w
my $infile ='My1.txt';
my $outfile ='data2.txt';
open IN, '<', $infile or die "Cant open $infile:$!";
open OUT, '>', $outfile or die "Cant open $outfile:$!";
while (<IN>) {
if (m/EX$HF|ex$HF/) {
print OUT $_, "\n";
print $_;
}
}
close IN;
close OUT;

This regex makes no sense:
m/EX$HF|ex$HF/
Is $HF supposed to be a variable? What are you trying to match?
Also, the second line in every Perl script you write should be:
use strict;
It will make Perl catch such mistakes and tell you about them, rather than silently ignoring them.
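To see what use strict changes here, a minimal sketch (the helper name compile_error is made up for this illustration) evaluates the original pattern once without strict and once with it:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: evaluates a snippet of Perl source
# given as a string and returns the error message, or '' if there was none.
sub compile_error {
    my ($source) = @_;
    local $@;
    eval $source;
    my $error = defined $@ ? $@ : '';
    return $error;
}

# Without strict (and with warnings off), $HF is just an undefined global,
# so the pattern quietly degrades to /EX|ex/.
my $lax = compile_error(q{ no strict; no warnings; my $m = "ex1" =~ m/EX$HF|ex$HF/; 1 });

# With strict, the undeclared variable is a compile-time error.
my $strict = compile_error(q{ use strict; my $m = "ex1" =~ m/EX$HF|ex$HF/; 1 });

print "without strict: ", ($lax    ? "error" : "no error"), "\n";
print "with strict:    ", ($strict ? "error" : "no error"), "\n";
print $strict;
```

The message printed for the strict case names $HF, which points straight at the mistake in the regex.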

while (<IN>) {
if (m/^(EX|ex)\d.*/) {
print OUT "$_";
print $_;
}
}

Sorry if this seems like stating the bleeding obvious, but what's wrong with
grep -i ^ex < My1.txt > data2.txt
... or if you really want to do it in perl (and there's nothing wrong with that):
perl -ne '/^ex/i && print' < My1.txt > data2.txt
This assumes the purpose of the request is to find lines that start with EX, with case-insensitivity.

When I run your code, but name the input file My1.txt instead of MyFile.txt I get the desired output - except with empty lines, which you can remove by removing the , "\n" from the print statement.

The filenames don't match.
open(my $inhandle, '<', $infile) or die "Cant open $infile: $!";
open(my $outhandle, '>', $outfile) or die "Cant open $outfile: $!";
while(my $line = <$inhandle>) {
# Assumes that ex, Ex, eX, EX all are valid first characters
if($line =~ m{^ex}i) { # or if(lc(substr $line, 0 => 2) eq 'ex') {
print { $outhandle } $line;
print $line;
}
}
And yes, always always use strict;
You could also chomp $line and (with Perl 5.10 or later and use feature 'say';) say $line instead of print "$line\n".
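A minimal sketch of the chomp-plus-say variant, assuming the sample lines from the question as in-memory input (the helper name matching_lines is hypothetical):

```perl
use strict;
use warnings;
use feature 'say';   # say is available from Perl 5.10 onward

# Hypothetical helper for illustration: returns the lines that start
# with "ex" in any letter case, with their trailing newlines removed.
sub matching_lines {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        chomp $line;                    # strip the trailing newline, if any
        push @found, $line if $line =~ m{^ex}i;
    }
    return @found;
}

my @input = ("ex1,fx2,xx1\n", "mm1,nn2,gg3\n", "EX1,hh2,ff7\n");
say for matching_lines(@input);         # say appends the newline itself
```

Because the newline is removed once and added back by say, there is no risk of printing it twice.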

Related

Search and replace multiple lines from a file

I'm trying to remove a part of a.txt file and replace with contents of b.txt file while also doing modification to other lines in a.txt using a Perl program.
file a.txt
line1
line2
replace from below line
replace from this line
bla bla...
bla bla...
to this line
line3
line4
file b.txt
replacement1
replacement2
replacementn
Below is my code which is not working.
#!/apps/perl/5.8.3/bin/perl -w
open (INPUT, "a.txt") or die $!;
open (REPLACE, "b.txt") or die $!;
open (OUTPUT, ">c.txt") or die $!;
my $replace_text;
{
local $/;
$replace_text = <REPLACE>;
}
close(REPLACE);
while (<INPUT>) {
s/line1/modified_line1/;
s/line2/modified_line2/;
if($_ =~ /replace from below line/){
while(<INPUT>){
{
local undef $/;
s/replace from this line.*to this line/$replace_text/smg;
}
s/line3/modified_line3/;
s/line4/modified_line4/;
print OUTPUT;
}
}
}
close(INPUT);
close(OUTPUT);
Expected output file c.txt
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
Can someone help me understand where I'm going wrong?
I don't think you need nested while loops to read your input file.
One way is to use a variable to control when you print to the output file:
use warnings;
use strict;
open (INPUT, "a.txt") or die $!;
open (REPLACE, "b.txt") or die $!;
open (OUTPUT, ">c.txt") or die $!;
my $replace_text;
{
local $/;
$replace_text = <REPLACE>;
}
close(REPLACE);
my $print = 1;
while (<INPUT>) {
s/line(\d)/modified_line$1/;
$print = 0 if /replace from below line/;
if (/to this line/) {
$print = 1;
$_ = $replace_text;
}
print OUTPUT if $print;
}
close(INPUT);
close(OUTPUT);
Output:
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
I also consolidated your 4 line substitutions into 1 using \d.
As much as I like perl, it's really not necessary here:
sed -e 's/line1/modified_line1/' \
-e 's/line2/modified_line2/' \
-e 's/line3/modified_line3/' \
-e 's/line4/modified_line4/' \
-e '/replace from below/rb.txt' \
-e '/replace from below/,/to this line/d' a.txt
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
If you did want to use perl, I'd just do:
#!/usr/bin/env perl
use strict;
use warnings;
open my $ah, '<', "a.txt" or die "a.txt: $!\n";
while(<$ah>) {
s/line1/modified_line1/;
s/line2/modified_line2/;
s/line3/modified_line3/;
s/line4/modified_line4/;
if( /replace from below/ ){
system "cat b.txt" and exit 1;
}
next if( /replace from below/ .. /to this line/);
print;
}
The problem description does not specify how large a.txt can be. The posted code uses a regular expression with the /smg modifiers, which suggests that the OP is trying to work on multiline text.
Let's assume that the input file is small enough to be read and processed in memory.
For manageability, the substitutions are placed in a __DATA__ block, which is read into the %substitute hash.
The regular expression $re is built from the keys of %substitute and used in the substitution pattern.
The multiline substitution is based on the OP's original code (it is not applicable when the input is read line by line).
Two subroutines are defined, one to read the content of a file into a variable and one to store a variable's data in a file, just to make the code easier to read and understand.
use strict;
use warnings;
use feature 'say';
my($fname_in,$fname_repl,$fname_out) = qw/a.txt b.txt c.txt/;
my %substitute = split(/[,\s]/, do{ local $/; <DATA>} );
my $re = '\b(' . join('|',keys %substitute) . ')\b';
my $data = read_file($fname_in);
my $replace_with = read_file($fname_repl);
$data =~ s/$re/$substitute{$1}/g;
$data =~ s/replace from below line.*?to this line/$replace_with/gsm;
save_file($fname_out,$data);
say $data;
exit 0;
sub read_file {
my $fname = shift;
my $data;
open my $fh, '<', $fname
or die "Couldn't open $fname";
$data = do { local $/; <$fh> };
close $fh;
return $data;
}
sub save_file {
my $fname = shift;
my $data = shift;
open my $fh, '>', $fname
or die "Couldn't open $fname";
say $fh $data;
close $fh;
}
__DATA__
line1,modified_line1
line2,modified_line2
line3,modified_line3
line4,modified_line4
Output
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4

How to grep word from file

I want to grep some words in a file, taking the words from another file. My code is able to grep the word on the last line of the file but not the words before it. I have no idea why and hope to get help here. Below is the Perl script I am using:
open(FILE1,"file1.txt") or die "Error, File1 could not open\n";
open(FILE2,"file2.txt") or die "Error, File2 could not open\n";
open(FILE3, ">file3.txt") or die "Error, File3 could not open\n";
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my @file1=<FILE1>;
my @file2=<FILE2>;
my $j =0;
my $i =0;
my $zone =0;
for ($j=0; $j<=$#file2; $j++){
$zone = $file2[$j];
unless ( $zone =~ m/#(.*?)/ ) {
print "$zone";
my @fid = grep /$zone/ , @file1;
@fid = uniq(@fid);
s{^\s+|\s+$}{}g foreach @fid; #cancel leading space
for ($i=0; $i<=$#fid; $i++){
print FILE3 "$fid[$i]\n";
}
#@fid=();
}
}
close(FILE3);
My file1.txt is something like this:
i am a dog
i am a cat
we are the fish
he is a boy
she is a girl
My file2.txt is like this:
is
am
But my file3 only shows the sentences that contain am, not is. If I put is on the second line and am on the first line, then file3 only contains the sentences with is. I am not sure why my code can only grep the last row of file2. Thanks for the help.
When reading from a file, the final newline is part of each line read. You can remove the newlines from the pattern array by chomping:
chomp( my @file2 = <FILE2> );
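A minimal sketch of the effect, using in-memory strings instead of the real files (the helper name lines_matching is hypothetical, and \Q...\E is added here so the term is matched literally, which the original script does not do):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: returns the lines of @$lines
# that contain the literal text in $term.
sub lines_matching {
    my ($term, $lines) = @_;
    return grep { /\Q$term\E/ } @$lines;
}

my @file1 = ("i am a dog\n", "he is a boy\n");

# Unchomped term: "is\n" only matches where "is" sits right before a newline.
my @raw = lines_matching("is\n", \@file1);

# Chomped term: plain "is" matches anywhere in the line.
chomp(my $term = "is\n");
my @clean = lines_matching($term, \@file1);

print "raw matches:   ", scalar(@raw),   "\n";
print "clean matches: ", scalar(@clean), "\n";
```

This would also explain why only the last search term worked, presumably because the last line of file2.txt has no trailing newline.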
You can already do this with egrep :
egrep -f file2.txt file1.txt
The root of this problem is a missing chomp: you're not removing linefeeds, so the matches aren't working.
But aside from that, there's a few problems with your code that could do with addressing:
When opening files, you should use 3-arg open with lexical filehandles, as it's better style: open (my $file1, '<', 'file1.txt' ) or die $!;
Rather than a loop of loops, you'd probably be better off compiling a 'match regex'.
Instead of reading all of a file into an array, you can iterate line by line, so you don't need to use the memory.
If you're iterating a loop and only using the index to access the current element, you're far better off using foreach my $line ( @things ) { type syntax.
So your code actually could be reduced to something like:
#!/usr/bin/env perl
use strict;
use warnings;
open(my $data, '<',"file1.txt") or die $!;
open(my $search, '<', "file2.txt") or die $!;
open(my $output, '>', "file3.txt" ) or die $!;
chomp ( my @search_terms = <$search> );
#quotemeta is needed to avoid 'special' regex characters doing things.
my $search_regex = join "|", map { quotemeta } @search_terms;
#note - '\b' denotes word boundary, which may not be what you want.
#means 'is' won't match 'fish'
#so get rid of them if that's not what you want.
$search_regex = qr/\b($search_regex)\b/;
print "Using: $search_regex\n";
select $output; #default print destination
while ( <$data> ) {
print if m/$search_regex/;
}
Output (in 'file3.txt'):
i am a dog
i am a cat
he is a boy
she is a girl
Please try this:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
open(FILE1,"file1.txt") or die "Error, File1 could not open\n";
open(FILE2,"file2.txt") or die "Error, File2 could not open\n";
open(FILE3, ">file3.txt") or die "Error, File3 could not open\n";
my @file1=<FILE1>;
my @file2=<FILE2>;
my $j =0;
my $i =0;
foreach my $main_line(@file1){
chomp($main_line);
foreach my $line(@file2){
chomp($line);
if ($main_line =~ /$line/i) {
print FILE3 "$main_line\n";
}
}
}
close(FILE3);
thanks,
praveenzx~

Perl script for reading data within angle brackets < and > not working

My task is to read data within angle brackets and write it to another file. I have developed a script but it is not working. If any modifications are required, please correct me.
The sample data which I want to read is like this
Textbooks written by author1 `{ <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},`
This is my Perl program
#!usr/bin/local/perl
use warnings;
use strict;
my @c_result;
my @c_result_array;
my %hash;
my $file = 'c_template.c';
open CFILE, $file or die "Could not open $file: $!";
my @content;
my $fileoutput="output_c.c";
open OUTFILE,"> $fileoutput" or die $!;
my $i;
while(<CFILE>)
{
for $i (@content)
{
$_ = $i;
if(/<[\w*_*]+>/)
{
@c_result = /<[\w*_*]+>/g;
for my $i (@c_result)
{
my $key=substr($i,1,length($i)-2);
$i=$key;
push @c_result_array,$i;
print OUTFILE $i ."=>@c_result_array";
print OUTFILE "\n";
}
}
}
}
close OUTFILE;
close CFILE;
For starters, you never populate @content but you attempt to iterate it.
In your regular expressions: [] are used to define character classes, () are used to define capture groups.
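A small sketch of the difference, run against the sample text from the question (the variable names are made up for the illustration):

```perl
use strict;
use warnings;

my $line = '{ <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},';

# Character class only: in list context each match returns the whole
# matched text, angle brackets included.
my @with_brackets = $line =~ /<[\w*]+>/g;

# Capture group: each match returns only what the parentheses captured.
my @captured = $line =~ /<(\w+)>/g;

print "@with_brackets\n";
print "@captured\n";
```

With the capture group there is no need to trim the brackets afterwards with substr.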
Your question is very unclear and imprecise, but from your code I think this is what you want
#!/usr/bin/local/perl
use strict;
use warnings;
my ($file, $fileoutput) = qw/ c_template.c output_c.c /;
open my $c_fh, '<', $file or die qq{Could not open "$file" for input: $!};
open my $out_fh, '>', $fileoutput or die qq{Could not open "$fileoutput" for output: $!};
select $out_fh;
while ( <$c_fh> ) {
next unless my @fields = /<([^<>]+)>/g;
chomp;
print "$_ => @fields\n";
}
close $out_fh or die qq{Could not close "$fileoutput": $!};
output
Textbooks written by author1 `{ <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},` => sam january 2015 rga feb 2005

Replace every nth Occurrence in a Perl Script

I have text file with 10 lines. Each line has the word no_access and only that in it. I found a website that has syntax to replace every nth occurrence of some string. When I try to put it into a script, it spits out errors.
Replace every Nth occurrence
This is the script I have so far:
use strict;
use warnings;
while (<>) {
my $count = 0;
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
}
print;
However, I get the error: Use of uninitialized value $_ in print.
I tried the script without the print command, but nothing happens. How do I get this script to run and perform the replacement of every third "no_access"?
Here's another option:
use strict;
use warnings;
my $i = 0;
my $n = 3;
while (<>) {
s/no_access/read/ if !( ++$i % $n );
print;
}
Usage: perl script.pl inFile [>outFile]
The last, optional parameter directs output to a file.
Hope this helps!
Your code is almost correct, just move the variable declaration outside the while loop, and the print inside:
use strict;
use warnings;
my $count = 0;
while (<>) {
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
print;
}
It then reads from stdin and prints to stdout, changing every 3rd occurrence as you want it to.
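The same idea can be sketched as a reusable sub that works on in-memory lines (the name replace_every_nth and its parameters are hypothetical). The counter lives outside the loop, so it keeps counting across lines and across several occurrences on one line:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: replaces every $n-th occurrence
# of the literal text $from with $to, counting across all lines passed in.
sub replace_every_nth {
    my ($n, $from, $to, @lines) = @_;
    my $count = 0;                      # declared once, outside the loop
    for my $line (@lines) {
        # $from is matched literally, so it can stand in for $& here
        $line =~ s/\Q$from\E/(++$count % $n == 0) ? $to : $from/ge;
    }
    return @lines;
}

my @lines = ("no_access\n") x 6;
print replace_every_nth(3, 'no_access', 'read', @lines);
```

If $count were declared inside the loop it would restart at zero on every line, and with one occurrence per line nothing would ever be replaced.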
If you want to read in a file, change its contents, then write it again, your code could look like this:
use strict;
use warnings;
my $file = $ARGV[0];
die "usage: $0 <filename>" unless defined $file;
open(IN, "<$file") or die "Can't read $file: $!";
my $count = 0;
my $out = "";
while (<IN>) {
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
$out .= $_;
}
close(IN);
open(OUT, ">$file") or die "Can't write $file: $!";
print OUT $out;
close(OUT);

How to match exactly two empty lines

I have a question about regular expressions. I have a file and I need to parse it in such a way that I can distinguish some specific blocks of text in it. These blocks of text are separated by two empty lines (there are blocks which are separated by 3 or 1 empty lines but I need exactly 2). So I have a piece of code, and this is the regular expression, /^\s*$^\s*$/, that I think should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
By default, Perl reads files a line at a time, so you won't see multiple new lines. The following code selects text terminated by a double new line.
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
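To see how records come out with this separator, here is a minimal sketch that reads from an in-memory string instead of a file (the helper name read_records and the sample text are made up for the illustration):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: splits $text into records using
# $separator as the input record separator, via an in-memory filehandle.
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$fh>;
    chomp @records;                     # chomp removes the current $/ value
    close $fh;
    return @records;
}

my @records = read_records("a\nb\n\nc\n\n\nd\n", "\n\n");
print scalar(@records), " records\n";
```

Note that a run of three newlines is still split here, and the extra newline ends up at the start of the following record, so this separator alone does not distinguish between different numbers of blank lines.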
New Answer
After having problems excluding >2 empty lines, and a good night's sleep, here is a better method that doesn't even need to slurp.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push @{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out join("\n", @$block);
}
In fact rather than store and write later, you could simply write to one file per block as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
Deprecated in favor of new answer
justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. The regex you are using suggests that there might be whitespace on the "empty" lines, in which case this will not work. Also, his method will split even on more than 2 line breaks, which was not allowed in the OP.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).
I would then probably say to use the split function to split the block of text into an array of chunks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file);
{
local $/; # enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my @blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
\s*\n #first whitespace only line
\s*\n #second "
(?!\n) #check to make sure there isn't another \n after this one
/x, # x flag allows comments and whitespace in regex
$text
);
You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that since you open $out lexically (with my) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $_;
}
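The automatic close of a lexical filehandle described above can be checked with a small sketch. It writes to a temporary file created with File::Temp, never calls close on the output handle, and reads the file back once the handle is out of scope (the helper name write_then_read is hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper for illustration: writes $text to $path without an
# explicit close, then reads the file back after the handle's scope ends.
sub write_then_read {
    my ($path, $text) = @_;
    {
        open my $out, '>', $path or die "Cannot write $path: $!";
        print $out $text;
    }   # $out goes out of scope here, so Perl flushes and closes the file
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;                           # slurp mode
    return <$in>;
}

my (undef, $path) = tempfile(UNLINK => 1);
print write_then_read($path, "block one\n");
```

An explicit close is still worth having when you want to check its return value, for example to notice a full disk.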