Search and replace multiple lines from a file - regex

I'm trying to remove part of a.txt and replace it with the contents of b.txt, while also modifying other lines in a.txt, using a Perl program.
file a.txt
line1
line2
replace from below line
replace from this line
bla bla...
bla bla...
to this line
line3
line4
file b.txt
replacement1
replacement2
replacementn
Below is my code which is not working.
#!/apps/perl/5.8.3/bin/perl -w
open (INPUT, "a.txt") or die $!;
open (REPLACE, "b.txt") or die $!;
open (OUTPUT, ">c.txt") or die $!;
my $replace_text;
{
local $/;
$replace_text = <REPLACE>;
}
close(REPLACE);
while (<INPUT>) {
s/line1/modified_line1/;
s/line2/modified_line2/;
if($_ =~ /replace from below line/){
while(<INPUT>){
{
local undef $/;
s/replace from this line.*to this line/$replace_text/smg;
}
s/line3/modified_line3/;
s/line4/modified_line4/;
print OUTPUT;
}
}
}
close(INPUT);
close(OUTPUT);
Expected output file c.txt
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
Can someone help me understand where I'm going wrong?

I don't think you need nested while loops to read your input file.
One way is to use a variable to control when you print to the output file:
use warnings;
use strict;
open (INPUT, "a.txt") or die $!;
open (REPLACE, "b.txt") or die $!;
open (OUTPUT, ">c.txt") or die $!;
my $replace_text;
{
local $/;
$replace_text = <REPLACE>;
}
close(REPLACE);
my $print = 1;
while (<INPUT>) {
s/line(\d)/modified_line$1/;
$print = 0 if /replace from below line/;
if (/to this line/) {
$print = 1;
$_ = $replace_text;
}
print OUTPUT if $print;
}
close(INPUT);
close(OUTPUT);
Output:
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
I also consolidated your four separate line substitutions into one using \d.

As much as I like perl, it's really not necessary here:
sed -e 's/line1/modified_line1/' \
-e 's/line2/modified_line2/' \
-e 's/line3/modified_line3/' \
-e 's/line4/modified_line4/' \
-e '/replace from below/rb.txt' \
-e '/replace from below/,/to this line/d' a.txt
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4
If you did want to use perl, I'd just do:
#!/usr/bin/env perl
use strict;
use warnings;
open my $ah, '<', "a.txt" or die "a.txt: $!\n";
while(<$ah>) {
s/line1/modified_line1/;
s/line2/modified_line2/;
s/line3/modified_line3/;
s/line4/modified_line4/;
if( /replace from below/ ){
system "cat b.txt" and exit 1;
}
next if( /replace from below/ .. /to this line/);
print;
}
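The /replace from below/ .. /to this line/ test uses Perl's flip-flop (range) operator in scalar context: it is false until the left-hand pattern matches, then stays true until the right-hand pattern matches, so every line between the two markers is skipped. A minimal sketch of that behaviour (illustrative only, not part of the answer above):
use strict;
use warnings;

while (my $line = <DATA>) {
    # false until "BEGIN SKIP" matches, true until "END SKIP" matches, then false again
    next if $line =~ /BEGIN SKIP/ .. $line =~ /END SKIP/;
    print $line;
}

__DATA__
keep 1
BEGIN SKIP
dropped
END SKIP
keep 2
This prints only "keep 1" and "keep 2".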

The problem description does not specify how big a.txt can be. The posted code uses a regular expression with the /smg modifiers, which indicates that the OP is trying to work on multiline text.
Let's assume the input file is small enough to be read and processed in memory.
For code manageability, the substitutions are placed into the __DATA__ block, which is read into the %substitute hash.
A regular expression $re is built from keys %substitute and used in the substitution pattern.
The multiline substitution is based on the OP's original code (it is not applicable to line-by-line reading of the input data).
Two subroutines are defined, one to read the content of a file into a variable and one to store a variable's data into a file -- just to make the code easier to read and understand.
use strict;
use warnings;
use feature 'say';
my($fname_in,$fname_repl,$fname_out) = qw/a.txt b.txt c.txt/;
my %substitute = split(/[,\s]/, do{ local $/; <DATA>} );
my $re = '\b(' . join('|',keys %substitute) . ')\b';
my $data = read_file($fname_in);
my $replace_with = read_file($fname_repl);
$data =~ s/$re/$substitute{$1}/g;
$data =~ s/replace from below line.*?to this line/$replace_with/gsm;
save_file($fname_out,$data);
say $data;
exit 0;
sub read_file {
my $fname = shift;
my $data;
open my $fh, '<', $fname
or die "Couldn't open $fname";
$data = do { local $/; <$fh> };
close $fh;
return $data;
}
sub save_file {
my $fname = shift;
my $data = shift;
open my $fh, '>', $fname
or die "Couldn't open $fname";
say $fh $data;
close $fh;
}
__DATA__
line1,modified_line1
line2,modified_line2
line3,modified_line3
line4,modified_line4
Output
modified_line1
modified_line2
replacement1
replacement2
replacementn
modified_line3
modified_line4

Related

How to grep words from a file

I want to grep for words from one file inside another file. My code is able to grep the word on the last line of the file but not the words before it. I have no idea why and hope I can get help here. Below is the Perl script I am using:
open(FILE1,"file1.txt") or die "Error, File1 could not open\n";
open(FILE2,"file2.txt") or die "Error, File2 could not open\n";
open(FILE3, ">file3.txt") or die "Error, File3 could not open\n";
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my @file1=<FILE1>;
my @file2=<FILE2>;
my $j =0;
my $i =0;
my $zone =0;
for ($j=0; $j<=$#file2; $j++){
$zone = $file2[$j];
unless ( $zone =~ m/#(.*?)/ ) {
print "$zone";
my @fid = grep /$zone/ , @file1;
@fid = uniq(@fid);
s{^\s+|\s+$}{}g foreach @fid; #cancel leading space
for ($i=0; $i<=$#fid; $i++){
print FILE3 "$fid[$i]\n";
}
#@fid=();
}
}
close(FILE3);
My file1.txt is something like this:
i am a dog
i am a cat
we are the fish
he is a boy
she is a girl
My file2.txt is like this:
is
am
But my file3 only shows the sentences containing am, not is. If I put is on the second line and am on the first line, then my file3 only contains the sentences with is. I'm not sure why my code can only grep the last row of my file2. Thanks for the help.
When reading from a file, the final newline is part of each line read. You can remove the newlines from the pattern array by chomping:
chomp( my @file2 = <FILE2> );
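For example, a minimal sketch of the whole script with that one change applied (same file names as the question, simplified to a plain nested loop):
use strict;
use warnings;

open my $patterns_fh, '<', 'file2.txt' or die "file2.txt: $!";
open my $data_fh,     '<', 'file1.txt' or die "file1.txt: $!";
open my $out_fh,      '>', 'file3.txt' or die "file3.txt: $!";

# chomp the whole array at once so "is\n" becomes "is" and actually matches
chomp( my @patterns = <$patterns_fh> );

while ( my $line = <$data_fh> ) {
    for my $pat (@patterns) {
        # note: like the original grep, this matches substrings ("is" also hits "fish")
        if ( $line =~ /\Q$pat\E/ ) {
            print {$out_fh} $line;
            last;                  # one copy per matching line is enough
        }
    }
}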
You can already do this with egrep:
egrep -f file2.txt file1.txt
The root of this problem is chomp - you're not removing linefeeds, so the matches aren't working.
But aside from that, there are a few problems with your code that could do with addressing:
When opening files, you should use the three-argument open with lexical filehandles, as it's better style: open (my $file1, '<', 'file1.txt') or die $!;
Rather than a loop of loops, you'd probably be better off compiling a 'match regex'.
Instead of reading all of a file into an array, you can iterate line by line, so you don't need to hold the whole file in memory.
If you're iterating a loop and only using the index to access the current element, you're far better off using foreach my $line ( @things ) { style syntax.
So your code actually could be reduced to something like:
#!/usr/bin/env perl
use strict;
use warnings;
open(my $data, '<',"file1.txt") or die $!;
open(my $search, '<', "file2.txt") or die $!;
open(my $output, '>', "file3.txt" ) or die $!;
chomp ( my @search_terms = <$search> );
#quotemeta is needed to avoid 'special' regex characters doing things.
my $search_regex = join "|", map { quotemeta } @search_terms;
#note - '\b' denotes word boundary, which may not be what you want.
#means 'is' won't match 'fish'
#so get rid of them if that's not what you want.
$search_regex = qr/\b($search_regex)\b/;
print "Using: $search_regex\n";
select $output; #default print destination
while ( <$data> ) {
print if m/$search_regex/;
}
Output (in 'file3.txt'):
i am a dog
i am a cat
he is a boy
she is a girl
Please try this:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
open(FILE1,"file1.txt") or die "Error, File1 could not open\n";
open(FILE2,"file2.txt") or die "Error, File2 could not open\n";
open(FILE3, ">file3.txt") or die "Error, File3 could not open\n";
my @file1=<FILE1>;
my @file2=<FILE2>;
my $j =0;
my $i =0;
foreach my $main_line(@file1){
chomp($main_line);
foreach my $line(@file2){
chomp($line);
if ($main_line =~ /$line/i) {
print FILE3 "$main_line\n";
}
}
}
close(FILE3);

Perl Script – Look at the first number of each line and alert if 0 exists

Currently the script only looks at the first character of a txt file and emails if that value is 0, using a regex. I'm trying to update the script so it looks at each line until the end of the file and alerts if any of the lines have the number 0. If all lines have 1, do nothing. Any help would be greatly appreciated.
Example of an alert
1
1
1
0 - since there is a 0, an email alert would be generated
1
1
code below:
use warnings;
use strict;
my $file = '/users/tneal01/SPOOL/output.txt';
my $mark = 0;
my $cont = do {
open my $fh, '<', $file or die "Can't open $file -- $!";
local $/;
<$fh>;
};
# Pull the first number
my ($num) = $cont =~ /^(\d+)/;
if ($num == $mark)
{
my $body = "status $num has been recorded ";
my $cmd_email = "echo $body | " .
"mailx -s \"error occurring\" tneal01\#gmail.com";
system($cmd_email) == 0 or die "Error sending email -- $!";
}
I'd probably go with something like:
#!/usr/bin/env perl
use strict;
use warnings;
my $file = '/users/tneal01/SPOOL/output.txt';
my $mark = '0';
my $search = qr/^$mark\b/;
open my $fh, '<', $file or die "Can't open $file -- $!";
while (<$fh>) {
#line starts with 0. Or check other regex.
if (m/$search/) {
my $body = "status $mark has been recorded ";
my $cmd_email =
"echo $body | " . "mailx -s \"error occurring\" tneal01\#gmail.com";
system($cmd_email) == 0 or die "Error sending email -- $!";
#bail out the loop - assume you don't want more than one email per thing.
last;
}
}
close ( $fh );
This solution only reads one line at a time... there are simpler solutions, but they would need to load the entire file into memory...
I'm also assuming you want to know how many occurrences of $mark there are in the file.
#!/usr/bin/perl
use strict;
my $file = 'file.txt';
my $mark = '0';
open my $f, "<$file" or die "Error open file: $!\n";
my $counter=0;
while(my $line = <$f>) {
if($line =~ /$mark/) {
$counter++;
}
}
if($counter) {
my $body = "status $mark has been recorded $counter times";
my $cmd_email = "echo $body | mailx -s \"error occurring\" tneal01\#gmail.com";
system($cmd_email) == 0 or die "Error sending email -- $!";
}
This does a couple of things differently to other solutions here.
It reads from whatever filename is given on the command line
It stops checking once the first error is found
I don't know how useful those improvements are to you.
use warnings;
use strict;
my $mark = 0;
while (<>) {
my ($num) = /^(\d)/;
if ($num == $mark) {
my $body = "status $num has been recorded ";
my $cmd_email = "echo $body | " .
'mailx -s "error occurring" tneal01\#gmail.com';
system($cmd_email) == 0 or die "Error sending email -- $!";
last; # stop checking after the first error
}
}
(Oh, and I switched some double-quotes to single-quotes so you don't have to escape embedded double quotes.)
This version uses minimal changes to your existing script. I've commented the changes.
use warnings;
use strict;
my $file = '/users/tneal01/SPOOL/output.txt';
my $mark = 0;
my $cont = do {
open my $fh, '<', $file or die "Can't open $file -- $!";
local $/;
<$fh>;
};
# Pull the first number <-- comment not needed
# my ($num) = $cont =~ /^(\d+)/; # <-- delete this line
if ($cont =~ /^$mark/m) # <-- change the condition to this regex
{
my $body = "status $mark has been recorded "; # replace $num with $mark
my $cmd_email = "echo $body | " .
"mailx -s \"error occurring\" tneal01\#gmail.com";
system($cmd_email) == 0 or die "Error sending email -- $!";
}
What the change does is use a regular expression to check if any lines begin with the value defined in $mark.
I deleted the $num variable, but its contents were the same as $mark, so we can just use $mark instead.
A breakdown of $cont =~ /^$mark/m:
$cont =~ Apply the following regex to the string contained in $cont
/ Start the regular expression
^ Match the beginning of a line
$mark Match the string specified in the $mark variable
/ End regular expression
m Flag to tell the regex to treat $cont as a multiple-line string (which it is)
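To see the effect of the /m flag in isolation, here is a tiny self-contained example (not part of the script above):
use strict;
use warnings;

my $cont = "1\n1\n0\n1\n";

# Without /m, ^ only matches at the very start of the string,
# so only the first character of the file is examined.
print "first line starts with 0\n" if $cont =~ /^0/;    # no match here

# With /m, ^ matches at the start of every line, so any line
# beginning with 0 triggers the match.
print "some line starts with 0\n" if $cont =~ /^0/m;     # matches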

Multiple pattern match and replace

How can I extract patterns from a file and replace multiple patterns with new patterns from another file?
For example:
Let's say the pattern file is pattern.txt, as follows, with 2,000 lines:
a
b
d
e
f
....
...
...
The file to replace patterns from is replace.txt, containing:
a,1
b,3
c,5
d,10
e,14
....
...
...
The intended final content of file patterns.txt is:
a,1
b,3
d,10
e,14
....
...
...
Perl from the command line:
perl -i -pe'
BEGIN{ local (@ARGV, $/, $^I) =pop; %h = split /[\s,]+/, <> }
s| (\S+)\K |,$h{$1}|x
' pattern.txt replace.txt
It slurps the content of the second file ($/ set to undef) and temporarily disables in-place editing ($^I set to undef), then splits the string on whitespace/commas and populates the %h hash in key/value fashion. Then, for every line of the first file, it appends a comma and the value for the current key.
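For readability, the one-liner corresponds roughly to this standalone script (a sketch only; the output file name pattern.txt.new is made up here, since the real one-liner edits pattern.txt in place via -i):
use strict;
use warnings;

# Build the key => value lookup from replace.txt (the BEGIN block in the one-liner).
open my $rfh, '<', 'replace.txt' or die "replace.txt: $!";
my $repl = do { local $/; <$rfh> };            # slurp, as setting $/ to undef does
close $rfh;
my %h = split /[\s,]+/, $repl;

# Rewrite pattern.txt line by line, appending ",value" after each key.
open my $pfh, '<', 'pattern.txt'     or die "pattern.txt: $!";
open my $out, '>', 'pattern.txt.new' or die "pattern.txt.new: $!";
while ( my $line = <$pfh> ) {
    chomp $line;
    my ($key) = $line =~ /(\S+)/;
    $line .= ",$h{$key}" if defined $key && exists $h{$key};
    print {$out} "$line\n";
}
close $pfh;
close $out;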
With the possibility of arbitrary characters in your input, it might be safest to use Text::CSV. The benefit is that it will handle things like quoted delimiters, multiline strings, etc. The drawback is that it can break on non-csv content, so it sort of relies on your input being proper csv.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
eol => $/,
});
my %s;
my ($input, $replace) = @ARGV;
open my $fh, "<", $replace or die "Cannot open $replace: $!";
while (my $row = $csv->getline($fh)) {
my ($key, $line) = @$row;
$s{$key} = $line;
}
open $fh, "<", $input or die "Cannot open $input: $!";
while (<$fh>) {
chomp;
$csv->print(*STDOUT, [$_, $s{$_}]);
}
Not sure this really needs a regex as you're not really altering your source, as much as 'just' printing based on key fields.
So I would approach it something like this:
#!/usr/bin/env perl
use strict;
use warnings;
open( my $replace, "<", "replace.txt" ) or die $!;
my %replacements;
while (<$replace>) {
chomp;
my ( $key, $value ) = split(/,/);
$replacements{$key} = $value;
}
close($replace);
open( my $input, "<", "input.txt" ) or die $!;
open( my $output, ">", "patterns.txt" ) or die $!;
while ( my $line = <$input> ) {
chomp $line;
if ( $replacements{$line} ) {
print {$output} $replacements{$line}, "\n";
}
}
close($input);
close($output);
It's not as concise as some of the other examples, but hopefully it's clearer what it's actually doing, which I call a good thing. (I could make it much more compact, in the way that Perl is (in)famous for.)
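For what it's worth, a more compact variant along the lines hinted at above might look like this (a sketch with the same file names; it keys the whole replace.txt line on its first field so the output keeps the key,value form shown in the question):
#!/usr/bin/env perl
use strict;
use warnings;

# Map each key to its full "key,value" line from replace.txt.
open my $replace, '<', 'replace.txt' or die $!;
my %replacements = map { chomp; ( ( split /,/, $_, 2 )[0] => $_ ) } <$replace>;
close $replace;

open my $input,  '<', 'input.txt'    or die $!;
open my $output, '>', 'patterns.txt' or die $!;
while ( my $key = <$input> ) {
    chomp $key;
    print {$output} "$replacements{$key}\n" if exists $replacements{$key};
}
close $input;
close $output;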

Perl script for reading data within angle brackets < and > not working

My task is to read data within angle brackets and write it to another file. I have developed a script but it is not working. If any modifications are required then please correct me.
The sample data which I want to read is like this
Textbooks written by author1 `{ <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},`
This is my Perl program
#!usr/bin/local/perl
use warnings;
use strict;
my @c_result;
my @c_result_array;
my %hash;
my $file = 'c_template.c';
open CFILE, $file or die "Could not open $file: $!";
my #content;
my $fileoutput="output_c.c";
open OUTFILE,"> $fileoutput" or die $!;
my $i;
while(<CFILE>)
{
for $i (@content)
{
$_ = $i;
if(/<[\w*_*]+>/)
{
@c_result = /<[\w*_*]+>/g;
for my $i (@c_result)
{
my $key=substr($i,1,length($i)-2);
$i=$key;
push @c_result_array,$i;
print OUTFILE $i ."=>@c_result_array";
print OUTFILE "\n";
}
}
}
}
close OUTFILE;
close CFILE;
For starters, you never populate @content but you attempt to iterate it.
In your regular expressions: [] are used to define character classes, () are used to define capture groups.
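As a quick illustration of the difference, using the sample line from the question (a throwaway sketch, separate from the corrected program below):
use strict;
use warnings;

my $line = 'Textbooks written by author1 { <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},';

# [\w*_*] is a character class: it matches ONE character that is a word
# character or a literal *, which is not what was intended.
# (...) is a capture group: it remembers what matched between the brackets.
my @fields = $line =~ /<(\w+)>/g;
print "@fields\n";    # sam january 2015 rga feb 2005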
Your question is very unclear and imprecise, but from your code I think this is what you want:
#!/usr/bin/local/perl
use strict;
use warnings;
my ($file, $fileoutput) = qw/ c_template.c output_c.c /;
open my $c_fh, '<', $file or die qq{Could not open "$file" for input: $!};
open my $out_fh, '>', $fileoutput or die qq{Could not open "$fileoutput" for output: $!};
select $out_fh;
while ( <$c_fh> ) {
next unless my @fields = /<([^<>]+)>/g;
chomp;
print "$_ => @fields\n";
}
close $out_fh or die qq{Could not close "$fileoutput": $!};
output
Textbooks written by author1 `{ <sam>,<january>,<2015>},{<rga>,<feb>,<2005>},` => sam january 2015 rga feb 2005

How can I extract and save text using Perl?

No extracted data is output to data2.txt. What is wrong with the code?
MyFile.txt
ex1,fx2,xx1
mm1,nn2,gg3
EX1,hh2,ff7
This is my desired output in data2.txt:
ex1,fx2,xx1
EX1,hh2,ff7
#! /DATA/PLUG/pvelasco/Softwares/PERLINUX/bin/perl -w
my $infile ='My1.txt';
my $outfile ='data2.txt';
open IN, '<', $infile or die "Cant open $infile:$!";
open OUT, '>', $outfile or die "Cant open $outfile:$!";
while (<IN>) {
if (m/EX$HF|ex$HF/) {
print OUT $_, "\n";
print $_;
}
}
close IN;
close OUT;
This regex makes no sense:
m/EX$HF|ex$HF/
Is $HF supposed to be a variable? What are you trying to match?
Also, the second line in every Perl script you write should be:
use strict;
It will make Perl catch such mistakes and tell you about them, rather than silently ignoring them.
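For instance, with strict enabled, the undeclared $HF is rejected at compile time instead of interpolating as an empty string (which effectively turns the pattern into /EX|ex/). A small sketch of the safer pattern (the ^ex prefix match is an assumption about what was meant):
use strict;
use warnings;

# Under strict, a line like  m/EX$HF|ex$HF/  dies at compile time with
# 'Global symbol "$HF" requires explicit package name', so the typo is caught.
my $prefix = qr/^ex/i;    # hypothetical: match lines starting with "ex", any case

my $line = 'EX1,hh2,ff7';
print "$line\n" if $line =~ $prefix;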
while (<IN>) {
if (m/^(EX|ex)\d.*/) {
print OUT "$_";
print $_;
}
}
Sorry if this seems like stating the bleeding obvious, but what's wrong with
grep -i ^ex < My1.txt > data2.txt
... or if you really want to do it in perl (and there's nothing wrong with that):
perl -ne '/^ex/i && print' < My1.txt > data2.txt
This assumes the purpose of the request is to find lines that start with EX, with case-insensitivity.
When I run your code but name the input file My1.txt instead of MyFile.txt, I get the desired output - except with empty lines, which you can remove by removing the , "\n" from the print statement.
The filenames don't match.
open(my $inhandle, '<', $infile) or die "Cant open $infile: $!";
open(my $outhandle, '>', $outfile) or die "Cant open $outfile: $!";
while(my $line = <$inhandle>) {
# Assumes that ex, Ex, eX, EX all are valid first characters
if($line =~ m{^ex}i) { # or if(lc(substr $line, 0 => 2) eq 'ex') {
print { $outhandle } $line;
print $line;
}
}
And yes, always always use strict;
You could also chomp $line and (if using Perl 5.10 or later) say $line instead of print "$line\n".
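Put together, that variant might look like this (a small sketch of the same loop using chomp and say):
use strict;
use warnings;
use feature 'say';    # say is available from Perl 5.10 onwards

open my $inhandle,  '<', 'My1.txt'   or die "Cant open My1.txt: $!";
open my $outhandle, '>', 'data2.txt' or die "Cant open data2.txt: $!";

while ( my $line = <$inhandle> ) {
    chomp $line;                  # drop the trailing newline...
    if ( $line =~ m{^ex}i ) {
        say {$outhandle} $line;   # ...and let say add it back
        say $line;
    }
}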