Replace every nth Occurrence in a Perl Script - regex

I have a text file with 10 lines. Each line contains only the word no_access. I found a website with syntax for replacing every nth occurrence of a string, but when I put it into a script, it produces errors.
Replace every Nth occurrence
This is the script I have so far:
use strict;
use warnings;
while (<>) {
my $count = 0;
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
}
print;
However, I get the error: Use of uninitialized value $_ in print.
I tried the script without the print command, but nothing happens. How do I get this script to run and replace every third "no_access"?

Here's another option:
use strict;
use warnings;
my $i = 0;
my $n = 3;
while (<>) {
s/no_access/read/ if !( ++$i % $n );
print;
}
Usage: perl script.pl inFile [>outFile]
The last, optional parameter directs output to a file.
Hope this helps!

Your code is almost correct. Just move the variable declaration outside the while loop, and the print inside:
use strict;
use warnings;
my $count = 0;
while (<>) {
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
print;
}
It then reads from stdin and prints to stdout, changing every 3rd occurrence as you want it to.
If you want to read in a file, change its contents, then write it again, your code could look like this:
use strict;
use warnings;
my $file = $ARGV[0];
die "usage: $0 <filename>" unless defined $file;
open(IN, "<$file") or die "Can't read $file: $!";
my $count = 0;
my $out = "";
while (<IN>) {
s/no_access/(++$count % 3 == 0)?"read":$&/ge;
$out .= $_;
}
close(IN);
open(OUT, ">$file") or die "Can't write $file: $!";
print OUT $out;
close(OUT);
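As a minimal sketch of the fix above, the counting logic can be wrapped in a sub so it is easy to try on in-memory data. The name replace_every_nth and its parameter order are illustrative assumptions, not part of the original answers. The key point is that $count is declared once, outside the per-line loop, so it keeps counting across lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): replace every $n-th
# occurrence of $word with $replacement, counting across all lines.
sub replace_every_nth {
    my ($word, $replacement, $n, @lines) = @_;
    my $count = 0;    # declared once, so the count survives from line to line
    for my $line (@lines) {
        # /e evaluates the replacement as Perl code; /g visits every match
        $line =~ s/(\Q$word\E)/ ++$count % $n == 0 ? $replacement : $1 /ge;
    }
    return @lines;
}

# Ten lines, each containing only "no_access", as in the question
my @out = replace_every_nth('no_access', 'read', 3, map { "no_access\n" } 1 .. 10);
print @out;
```

With this input, lines 3, 6 and 9 should come out as "read". Because the count is per occurrence rather than per line, the same sub also handles several matches on one line.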

Related

How do I store cumulative lengths of strings in an array with Perl

I have a file made up of a series of lines containing A's, C's, G's and T's. I want to find the length of each line, build a list of the cumulative lengths (adding the lengths together sequentially), and put that into an array. So far I have:
#! /usr/bin/perl -w
use strict;
my $input = $ARGV[0];
my %idSeq;
my (@ID, @Seq);
open (my $INPUT, "<$input") or die "unable to open $input";
while (<$INPUT>) {
if (my $culm_length = /^([AGCT]\w+)\n$/) {
length($culm_length) = $_;
push (@Seq, $1);
}
}
bla bla bla....
So far I think what I have written gives me an array of the lengths of the individual lines. I want the cumulative lengths. Any ideas?
With reference to your previous question, "How do I read strings into a hash in Perl", which was put on hold, I think perhaps you want a running total of the lengths of the lines.
I would write it this way. It keeps the running total in $total and pushes its value onto the @lengths array every time it changes:
use strict;
use warnings 'all';
my ( $input ) = @ARGV;
open my $fh, '<', $input or die qq{unable to open "$input" for input: $!};
my @lengths;
my $total = 0;
while ( <$fh> ) {
push @lengths, $total += length $1 if /^([ACGT]+)/;
}
#!/usr/bin/perl -w
use strict;
my $length = 0;
while (<>) {
$length += length($1) if /^([AGCT]\w+)$/;
}
my @length = $length; # But why would you want to do this?!
...
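To make the running-total idea concrete, here is a small sketch that works on a list of lines instead of a file. The sub name cumulative_lengths and the sample sequences are assumptions for illustration only. Each matching line adds its length to the total, and the total so far is pushed onto the result.

```perl
use strict;
use warnings;

# Hypothetical helper: return the running total of sequence lengths,
# one entry per line that starts with A, C, G or T characters.
sub cumulative_lengths {
    my @lines = @_;
    my @totals;
    my $total = 0;
    for my $line (@lines) {
        next unless $line =~ /^([ACGT]+)/;    # skip lines without a sequence
        $total += length $1;                  # the newline is not captured
        push @totals, $total;
    }
    return @totals;
}

my @totals = cumulative_lengths("ACGT\n", "AC\n", "GGGTTT\n");
print "@totals\n";    # prints "4 6 12"
```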

Search for a pattern in a file

I have a file eg.txt with contents of this sort :
....text...
....text...
COMP1 = ../../path1/path2/path3
COMP2 = ../../path4/path5/path6
and so on, for a large number of application names (the "COMP"s). I need to get the path -- the stuff including and after the second slash -- for a user-specified application.
This is the code I've been trying :
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while (<$fh>) {
$line = <$fh>;
if ( $line && $line =~ /($app)( = )(..\/)(..)(.*)/ ) {
print $5;
}
}
This prints the name of the user-input application, and does nothing else. Any help would be greatly appreciated!
There are two main problems with your program:
The $app variable contains a newline at the end from the Enter key you pressed when you typed it in. That will prevent the pattern from matching, so you need to use chomp to remove it. The same applies to lines read from your file.
The <$fh> in your while statement reads a line from your file into the default variable $_, and then $line = <$fh> reads another, so you are ignoring alternate lines from the file.
Here is a version of your program that I think should work although I am unable to test it at present. I have dropped your $line variable altogether and hope that doesn't confuse you. $_ is the default variable for the pattern match so it isn't mentioned explicitly anywhere
use strict;
use warnings;
print "Enter the app: ";
my $app = <STDIN>;
chomp $app;
open my $fh, '<', 'eg.txt' or die "Cannot open: $!";
while ( <$fh> ) {
if ( /$app\s*=\s*(.+)/ ) {
my $path = $1;
$path =~ s/.*\.\.//;
print $path, "\n";
}
}
The input did not match the regex because it carried a trailing newline, so use chomp to remove it. In the while loop you also read from the file handle twice per iteration, which skips every other line. After these corrections this should work:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
chomp($app = <STDIN>);
print "$app: ";
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while($line = <$fh>)
{
chomp $line;
if($line && $line =~ /($app)( = )(..\/)(..)(.*)/)
{
print "$5 \n";
}
}
close($fh);
Try this code:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
my @line = <$fh>;
my @fetch = map { /COMP\d+\s\=\s(\..\/\..\/.*)/g } @line ;
$, = "\n";
print @fetch;
and then please send your response.
You are accessing <$fh> twice in your loop. This will have the effect of interpreting only every other line. You might want to change the top of the loop to something like this:
while (defined(my $line = <$fh>)) {
and remove the my $line ... at the top of the program.
Also, you might want to consider chomping your input line so that you don't have to think about the trailing newline character:
while (defined(my $line = <$fh>)) {
chomp $line;
Your regular expression is also a bit dicey. You probably want to bind it to the beginning and end of the search space and escape the literal dots. You may also want $app to be interpreted as a string rather than a regexp, which can be done by wrapping it with \Q...\E. Also unless your file format specifies single spaces around the equals, I'd be tempted to make those flexible to zero or more occurrences. Also, if you aren't going to use the earlier captures, I would say don't do them, so:
if ($line && $line =~ /^\Q$app\E *= *\.\.\/\.\.(.*)$/)
{
print $1;
(Some may say you should use \A and \z rather than ^ and $. That choice is left as an exercise to the reader.)
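The suggestions above (read one line per iteration, chomp, quote the user input with \Q...\E, anchor the pattern, allow flexible whitespace) can be combined into one small sketch. The sub name find_path and the in-memory sample lines are assumptions for illustration; a real script would read the lines from eg.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of the path after the leading "../.."
# for the given application name, or undef if the name is not found.
sub find_path {
    my ($app, @lines) = @_;
    for my $line (@lines) {
        chomp(my $copy = $line);    # work on a copy without the trailing newline
        # \Q...\E makes $app match literally; the dots and slashes are literal too
        return $1 if $copy =~ m{^\Q$app\E\s*=\s*\.\./\.\.(.*)$};
    }
    return;
}

my @lines = (
    "....text...\n",
    "COMP1 = ../../path1/path2/path3\n",
    "COMP2 = ../../path4/path5/path6\n",
);
my $path = find_path('COMP2', @lines);
print defined $path ? "$path\n" : "not found\n";
```

Because the name is anchored at the start of the line and must be followed by the equals sign, asking for COMP1 would not accidentally match a line for COMP10.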

Recurrent replacement using Perl

I have a word (MODEL 1) that appears 20 times in my file, interspersed with lines of text. I want to replace each occurrence with its running count: the first stays MODEL 1, the next becomes MODEL 2, then MODEL 3, and so on.
However, my loop gets stuck at the first round and does not keep counting until it has replaced all of the words.
Can anyone tell me what I am missing? Any help would be much appreciated.
The code is listed below:
#!/usr/bin/perl -w
my $file = 'test.text';
open (my $fh, $file);
while (my $row = <$fh>) {
chomp $row;
if (($row) =~ /^MODEL 1/){
$i = 1;
$row =~ s/^MODEL 1/MODEL $i/g;
$i++;
}
print "$row\n";
}
You need to move your counter variable outside of the loop.
As a simplification, use s///e to match and replace in a single step:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $file = 'test.text';
open my $fh, '<', $file;
my $counter = 0;
while (<$fh>) {
chomp;
s/^MODEL \K1/++$counter/e;
print "$_\n";
}
Move $i = 1 initialization to above while loop
$i = 1;
while (my $row = <$fh>) { chomp $row;
...
}
You're resetting it back to 1 for every line, so there won't be any change at all.
You can increment a counter in the replacement pattern itself with the e modifier:
my $i=1;
while (my $row = <$fh>) {
chomp $row;
$row =~ s/MODEL \K1/$i++/ge;
print "$row\n";
}
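The counter-in-the-replacement idea can be tried on a few in-memory lines with the sketch below. The sub name renumber_models and the sample data are assumptions for illustration. Note that \K needs Perl 5.10 or later, and that the counter lives outside the loop so it is never reset.

```perl
use strict;
use warnings;

# Hypothetical helper: renumber each "MODEL 1" line as MODEL 1, MODEL 2, ...
sub renumber_models {
    my @lines = @_;
    my $counter = 0;    # initialised once, before the loop
    for my $line (@lines) {
        # \K keeps "MODEL " and replaces only the digit that follows it
        $line =~ s/^MODEL \K1\b/++$counter/e;
    }
    return @lines;
}

my @result = renumber_models("MODEL 1\n", "ATOM a\n", "MODEL 1\n", "MODEL 1\n");
print @result;
```

With this input the MODEL lines should come out numbered 1, 2 and 3, and the ATOM line is left alone.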

How to match exactly two empty lines

I have a question about regular expressions. I have a file and I need to parse it so that I can distinguish specific blocks of text in it. These blocks are separated by two empty lines (some blocks are separated by 3 or 1 empty lines, but I need exactly 2). I have a piece of code, and I think the regular expression /^\s*$^\s*$/ should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
By default, Perl reads files a line at a time, so you won't see multiple new lines. The following code selects text terminated by a double new line.
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
New Answer
After having problems excluding >2 empty lines, and a good night's sleep, here is a better method that doesn't even need to slurp.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push @{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out join("\n", @$block);
}
In fact rather than store and write later, you could simply write to one file per block as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
Deprecated in favor of new answer
justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. The regex you are using suggests there might be whitespace on the "empty" lines, in which case this will not work. Also, his method will split even on more than 2 linebreaks, which was not allowed in the OP.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).
I would then probably say to use the split function to split the block of text into an array of chunks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file);
{
local $/; # enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my @blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
\s*\n #first whitespace only line
\s*\n #second "
(?!\n) #check to make sure there isn't another \n after this one
/x, # x flag allows comments and whitespace in regex
$text
);
You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that since you open $out lexically (with my) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $_;
}
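For a quick check of the slurp-and-split approach, the sketch below splits an in-memory string only where blocks are separated by exactly two empty lines. The sub name split_on_two_empty_lines and the sample text are assumptions for illustration, and the pattern assumes the empty lines contain no whitespace at all. Two empty lines between blocks means three consecutive newlines, and the lookarounds reject runs of four or more.

```perl
use strict;
use warnings;

# Hypothetical helper: split text on exactly two empty lines,
# i.e. exactly three consecutive newlines, no more and no fewer.
sub split_on_two_empty_lines {
    my ($text) = @_;
    return split /(?<!\n)\n\n\n(?!\n)/, $text;
}

# "a" and "b" are separated by two empty lines, "b" and "c" by one,
# and "c" and "d" by three, so only the first gap is a split point.
my @blocks = split_on_two_empty_lines("a\n\n\nb\n\nc\n\n\n\nd\n");
print scalar(@blocks), " blocks\n";    # prints "2 blocks"
```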

How can I extract and save text using Perl?

No extracted data is written to data2.txt. What is wrong with the code?
MyFile.txt
ex1,fx2,xx1
mm1,nn2,gg3
EX1,hh2,ff7
This is my desired output in data2.txt:
ex1,fx2,xx1
EX1,hh2,ff7
#! /DATA/PLUG/pvelasco/Softwares/PERLINUX/bin/perl -w
my $infile ='My1.txt';
my $outfile ='data2.txt';
open IN, '<', $infile or die "Cant open $infile:$!";
open OUT, '>', $outfile or die "Cant open $outfile:$!";
while (<IN>) {
if (m/EX$HF|ex$HF/) {
print OUT $_, "\n";
print $_;
}
}
close IN;
close OUT;
This regex makes no sense:
m/EX$HF|ex$HF/
Is $HF supposed to be a variable? What are you trying to match?
Also, the second line in every Perl script you write should be:
use strict;
It will make Perl catch such mistakes and tell you about them, rather than silently ignoring them.
while (<IN>) {
if (m/^(EX|ex)\d.*/) {
print OUT "$_";
print $_;
}
}
Sorry if this seems like stating the bleeding obvious, but what's wrong with
grep -i ^ex < My1.txt > data2.txt
... or if you really want to do it in perl (and there's nothing wrong with that):
perl -ne '/^ex/i && print' < My1.txt > data2.txt
This assumes the purpose of the request is to find lines that start with EX, with case-insensitivity.
When I run your code, but name the input file My1.txt instead of MyFile.txt I get the desired output - except with empty lines, which you can remove by removing the , "\n" from the print statement.
The filenames don't match.
open(my $inhandle, '<', $infile) or die "Cant open $infile: $!";
open(my $outhandle, '>', $outfile) or die "Cant open $outfile: $!";
while(my $line = <$inhandle>) {
# Assumes that ex, Ex, eX, EX all are valid first characters
if($line =~ m{^ex}i) { # or if(lc(substr $line, 0 => 2) eq 'ex') {
print { $outhandle } $line;
print $line;
}
}
And yes, always always use strict;
You could also chomp $line and (if using perl 5.10) say $line instead of print "$line\n".
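The filtering step on its own can be sketched with grep on a list of lines, which makes it easy to check the pattern before wiring up file handles. The sub name lines_starting_with_ex is an assumption for illustration, and the sample data is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only lines that start with "ex", ignoring case.
sub lines_starting_with_ex {
    return grep { /^ex/i } @_;
}

my @kept = lines_starting_with_ex(
    "ex1,fx2,xx1\n",
    "mm1,nn2,gg3\n",
    "EX1,hh2,ff7\n",
);
print @kept;    # each line already ends in "\n", so no extra newline is added
```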