I am opening files in a directory, each of which contains two lines of sequence. The top sequence is longer than the bottom one but includes it. Once the bottom sequence is found in the top sequence, I would like to extend it by the two flanking letters on each side. I am trying to do this with a regex match, but I get an uninitialized-value warning for the $newsequence variable.
Here is what a typical file looks like:
>CCCCNNNNNCCCC
NNNNN
I would like to print to one file all the sequences in the following format:
>CCCCNNNNNCCCC
CCNNNNNCC
Here is my code so far:
use strict;
use warnings;
my ($directory) = @ARGV;
my @array = glob "$directory/*";
my $header;
my $sequence;
my $newsequence;
open(OUT, ">", "/path/to/out.txt") or die $!;
foreach my $file (@array){
open (my $fh, $file) or die $!;
while (my $line = <$fh>){
chomp $line;
if ($line =~ /^>/) {
$header = $line;
} elsif ($line =~ /^[CN]/) {
$sequence = $line;
}
my ($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/;
}
print OUT $header, "\n", $newsequence, "\n";
}
How can I improve my regex assignment to $newsequence to get adequate output? Thanks.
This line is wrong:
my ($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/;
The my keyword is creating a new variable $newsequence local to the while loop, not assigning the variable in the main script. So when you try to write $newsequence after the loop is done, the variable is still uninitialized.
Either put the print statement inside the while loop, or remove the my keyword in this assignment.
Also, you should put that assignment statement inside the elsif block. Otherwise, you'll try to use $sequence before you've assigned it. So the whole thing should look like:
foreach my $file (@array){
open (my $fh, $file) or die $!;
while (my $line = <$fh>){
chomp $line;
if ($line =~ /^>/) {
$header = $line;
} elsif ($line =~ /^[CN]/) {
$sequence = $line;
($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/;
print OUT $header, "\n", $newsequence, "\n";
}
}
}
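The scoping problem with my can be seen in isolation in a minimal sketch like the following (the variable names are made up for illustration). A my inside the loop creates a fresh variable on every iteration, so the file-scoped variable is never touched; dropping my assigns to the outer one instead.

```perl
use strict;
use warnings;

my $outer;                      # declared at file scope, like $newsequence
for my $n (1 .. 3) {
    my ($outer) = ($n * 10);    # 'my' declares a NEW variable that shadows the outer one
}
my $after_shadow = defined $outer ? $outer : 'undef';

for my $n (1 .. 3) {
    ($outer) = ($n * 10);       # no 'my': assigns to the file-scoped variable
}
print "after shadowing loop: $after_shadow\n";   # undef
print "after assigning loop: $outer\n";          # 30
```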
If your conditions are accurate (each file contains only 2 lines, and the sequence is always found in the header), then you can make your code a lot simpler, including the regex:
for my $file (@array) {
open (my $fh, $file) or die $!;
chomp ((my $header, my $sequence) = <$fh>);
if ($header =~ /(..)\Q$sequence\E(..)/) {
print OUT "$header\n$1$sequence$2\n";
}
}
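Along the same lines, here is a sketch that wraps the matching in a small subroutine and reads from an in-memory string instead of a real file, so it can run on its own. The names extend_flanks and read_record are hypothetical, and it assumes each record is exactly one header line followed by one sequence line. \Q...\E makes the sequence match literally, and the sub returns undef instead of reusing a stale $1 when the sequence is not found.

```perl
use strict;
use warnings;

# Hypothetical helper: return $seq extended by $n characters on each side
# as found in $header, or undef if $seq (plus flanks) is not present.
sub extend_flanks {
    my ($header, $seq, $n) = @_;
    # \Q...\E makes $seq match literally instead of as a regex
    return $header =~ /(.{$n}\Q$seq\E.{$n})/ ? $1 : undef;
}

# Hypothetical helper: read a two-line record (header, sequence) from an open handle.
sub read_record {
    my ($fh) = @_;
    chomp(my ($header, $seq) = <$fh>);
    return ($header, $seq);
}

# In-memory stand-in for one of the input files
my $data = ">CCCCNNNNNCCCC\nNNNNN\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($header, $seq) = read_record($fh);
my $extended = extend_flanks($header, $seq, 2);
print "$header\n", (defined $extended ? $extended : "no match"), "\n";
```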
Related
I have a file eg.txt with contents like this:
....text...
....text...
COMP1 = ../../path1/path2/path3
COMP2 = ../../path4/path5/path6
and so on, for a large number of application names (the "COMP"s). I need to get the path -- the stuff including and after the second slash -- for a user-specified application.
This is the code I've been trying:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while (<$fh>) {
$line = <$fh>;
if ( $line && $line =~ /($app)( = )(..\/)(..)(.*)/ ) {
print $5;
}
}
This prints the name of the user-input application, and does nothing else. Any help would be greatly appreciated!
There are two main problems with your program:
The $app variable contains a newline at the end from the Enter key you pressed when you typed it in. That prevents the pattern from matching, so you need to use chomp to remove it. The same applies to lines read from your file.
The <$fh> in your while statement reads a line from your file into the default variable $_, and then $line = <$fh> reads another, so you are ignoring alternate lines from the file.
Here is a version of your program that I think should work, although I am unable to test it at present. I have dropped your $line variable altogether and hope that doesn't confuse you. $_ is the default variable for the pattern match, so it isn't mentioned explicitly anywhere.
use strict;
use warnings;
print "Enter the app: ";
my $app = <STDIN>;
chomp $app;
open my $fh, '<', 'eg.txt' or die "Cannot open: $!";
while ( <$fh> ) {
if ( /$app\s*=\s*(.+)/ ) {
my $path = $1;
$path =~ s/.*\.\.//;
print $path, "\n";
}
}
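To see the every-other-line effect of reading <$fh> twice, this sketch runs both loop shapes over the same four lines held in an in-memory string (the data is made up for illustration):

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\nline 4\n";

# Buggy pattern: the while condition reads one line into $_, then the body reads another
open my $fh, '<', \$data or die "Cannot open: $!";
my @seen_buggy;
while (<$fh>) {
    my $line = <$fh>;
    chomp $line;
    push @seen_buggy, $line;
}
close $fh;

# Fixed pattern: read exactly one line per iteration
open $fh, '<', \$data or die "Cannot open: $!";
my @seen_fixed;
while (defined(my $line = <$fh>)) {
    chomp $line;
    push @seen_fixed, $line;
}
close $fh;

print "buggy: @seen_buggy\n";   # line 2 line 4
print "fixed: @seen_fixed\n";   # line 1 line 2 line 3 line 4
```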
The input did not match the regex because the newlines were still attached, so use chomp to remove them. In the while loop you are reading from the filehandle twice per iteration, which skips every other line. After these corrections this should work:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
chomp($app = <STDIN>);
print "$app: ";
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while($line = <$fh>)
{
chomp $line;
if($line && $line =~ /($app)( = )(..\/)(..)(.*)/)
{
print "$5 \n";
}
}
close($fh);
Try this code:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
my @line = <$fh>;
my @fetch = map { /COMP\d+\s\=\s(\..\/\..\/.*)/g } @line ;
$, = "\n";
print @fetch;
Then let me know how it goes.
You are reading from <$fh> twice in your loop. This has the effect of processing only every other line. You might want to change the top of the loop to something like this:
while (defined(my $line = <$fh>)) {
and remove the my $line ... at the top of the program.
Also, you might want to consider chomping your input line so that you don't have to think about the trailing newline character:
while (defined(my $line = <$fh>)) {
chomp $line;
Your regular expression is also a bit dicey. You probably want to bind it to the beginning and end of the search space and escape the literal dots. You may also want $app to be interpreted as a string rather than a regexp, which can be done by wrapping it with \Q...\E. Also unless your file format specifies single spaces around the equals, I'd be tempted to make those flexible to zero or more occurrences. Also, if you aren't going to use the earlier captures, I would say don't do them, so:
if ($line && $line =~ /^\Q$app\E *= *\.\.\/\.\.(.*)$/)
{
print $1;
}
(Some may say you should use \A and \z rather than ^ and $. That choice is left as an exercise to the reader.)
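Putting those suggestions together, a sketch of the lookup as a reusable subroutine might look like this. The name find_app_path is hypothetical, and the sample data comes from an in-memory string rather than eg.txt; in the real program you would pass in the handle from open my $fh, '<', 'eg.txt' and the chomped $app. Thanks to the ^ anchor and \Q...\E, looking up COMP1 will not accidentally match a COMP10 line.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle for "APP = ../../rest" and
# return "rest" (everything after "../.."), or undef if APP is not found.
sub find_app_path {
    my ($fh, $app) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        # \Q...\E treats $app as a literal string; ^ and $ anchor the match
        return $1 if $line =~ /^\Q$app\E *= *\.\.\/\.\.(.*)$/;
    }
    return;    # not found: undef in scalar context
}

# In-memory stand-in for eg.txt
my $data = "...text...\nCOMP1 = ../../path1/path2/path3\nCOMP2 = ../../path4/path5/path6\n";
open my $fh, '<', \$data or die "Cannot open: $!";
my $path = find_app_path($fh, 'COMP2');
print defined $path ? "$path\n" : "not found\n";
```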
The string MODEL 1 appears 20 times in my file, interspersed with lines of text. I want to replace each occurrence with its running count: the first stays MODEL 1, the next becomes MODEL 2, then MODEL 3, and so on.
However, my loop gets stuck at the first round and doesn't keep going until it has replaced all the occurrences.
Can anyone tell me what I'm missing? Any help would be much appreciated.
The code is listed below:
#!/usr/bin/perl -w
my $file = 'test.text';
open (my $fh, $file);
while (my $row = <$fh>) {
chomp $row;
if (($row) =~ /^MODEL 1/){
$i = 1;
$row =~ s/^MODEL 1/MODEL $i/g;
$i++;
}
print "$row\n";
}
You need to move your counter variable outside of the loop.
As a simplification, use s///e to match and replace in a single step:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $file = 'test.text';
open my $fh, '<', $file;
my $counter = 0;
while (<$fh>) {
chomp;
s/^MODEL \K1/++$counter/e;
print "$_\n";
}
Move the $i = 1 initialization above the while loop:
$i = 1;
while (my $row = <$fh>) { chomp $row;
...
}
You're resetting it back to 1 for every line, so every match is replaced with MODEL 1 and nothing changes at all.
You can increment a counter in the replacement pattern itself with the e modifier:
my $i=1;
while (my $row = <$fh>) {
chomp $row;
$row =~ s/MODEL \K1/$i++/ge;
print "$row\n";
}
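A testable sketch of the same idea, assuming each MODEL marker sits at the start of its own line. The subroutine name renumber_models is hypothetical. Compared with the version above, it adds a ^ anchor and a \b after the 1 (both are my additions), so a line such as MODEL 10 or a mid-line mention is left alone, and it drops /g since each line holds at most one marker.

```perl
use strict;
use warnings;

# Hypothetical helper: renumber "MODEL 1" lines as MODEL 1, MODEL 2, ...
sub renumber_models {
    my @lines   = @_;
    my $counter = 0;    # declared once, outside the loop, so it keeps counting
    for my $line (@lines) {
        # ^ anchors to line start; \K keeps "MODEL " and replaces only the 1
        $line =~ s/^MODEL \K1\b/++$counter/e;
    }
    return @lines;
}

print "$_\n" for renumber_models('MODEL 1', 'atom data', 'MODEL 1', 'MODEL 1');
```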
I am struggling with this part of my college exercise.
I need to read strings from a file and put them into different variables.
Team, kindly review and please reply when you have a free moment.
Input File: (test_ts.txt)
Test1--12:45
Test2--1:30
Script:
use strict;
use warnings;
my $filename = "test_ts.txt";
my @name = ();
my @hrs=();
my @mins=();
open(my $fh, $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
chomp $row;
push(@name, $row);
print "$row\n";
}
Output:
Test1--12:45
Test2--1:30
Expected output:
Test1
Test2
(The arrays should hold these values:
name[0]=Test1
name[1]=Test2
hrs[0]=12
hrs[1]=1
mins[0]=45
mins[1]=30)
I tried using split:
while (my $row = <$fh>) {
chomp $row;
$row=split('--',$row);
print $row;
$row=split(':',$row);
print $row;
push(@name, $row);
print "$row\n";
}
Output I got after trying split:
211
211
split returns a list; when you use it in a scalar context like $row = split(..., $row); then:
You only get the number of elements assigned, not the elements themselves.
You overwrite $row, destroying the input line.
You need something more like:
while (my $row = <$fh>)
{
chomp $row;
my @bits = split /[-:]+/, $row;
print "@bits\n";
push(@name, $bits[0]);
# …other pushes…
print "$row\n";
}
You will need to learn about scalar and list context sooner or later. In the meantime, assign the result of split to an array.
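The 211 output in the question comes straight from scalar context: 2 fields from the first split, then 1 field from splitting the string "2", then 1 again from the final print. A small sketch contrasting the two contexts on one sample row (on current Perls, split in scalar context simply returns the field count; older versions also overwrote @_ and warned about it):

```perl
use strict;
use warnings;

my $row = 'Test1--12:45';

# List context: split returns the fields themselves
my ($name, $time) = split /--/, $row;    # 'Test1', '12:45'
my ($hrs, $mins)  = split /:/,  $time;   # '12', '45'

# Scalar context: split returns only the number of fields
my $count = split /--/, $row;            # 2

print "name=$name hrs=$hrs mins=$mins count=$count\n";
```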
Here is a simple method: split the row on "--", then split the time on ":". Hope this helps.
use strict;
use warnings;
my $filename = "test_ts.txt";
my @name = ();
my @hrs=();
my @mins=();
open(my $fh, $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
chomp $row;
# $a and $b are special variables used by sort, so use descriptive names instead
my ($name, $time) = split("--", $row);
my ($hrs, $mins) = split(":", $time);
push(@name, $name);
push(@hrs, $hrs);
push(@mins, $mins);
}
print "$name[0]\n";
print "$name[1]\n";
print "$hrs[0]\n";
print "$hrs[1]\n";
print "$mins[0]\n";
print "$mins[1]\n";
It is sometimes simpler to use a global regular expression than split. This short program works by finding all alphanumeric fields in the target string.
use strict;
use warnings;
use autodie;
open my $fh, '<', 'test_ts.txt';
my (@name, @hrs, @mins);
while (<$fh>) {
my ($name, $hrs, $mins) = /\w+/g;
push @name, $name;
push @hrs, $hrs;
push @mins, $mins;
print "$name\n";
}
print "\n";
print "Names: @name\n";
print "Hours: @hrs\n";
print "Minutes: @mins\n";
Output:
Test1
Test2
Names: Test1 Test2
Hours: 12 1
Minutes: 45 30
I have an array containing words that I want to remove from each line of a file. The code I am using is as follows:
my $INFILE;
my $OUTFILE;
my $STOPLIST;
open($INFILE, '<', $ARGV[0]);
open($STOPLIST, '<', "stop.txt");
open($OUTFILE, '>', $ARGV[1]);
my @stoplist = <$STOPLIST>;
my $line;
my $stopword;
while (<$INFILE>) {
$line = $_;
$line =~ s/\[[0-9]*\] //g;
$line =~ s/i\/.*\/; //g;
foreach (@stoplist) {
$stopword = $_;
$line =~ s/${stopword}//g;
}
print $OUTFILE lc($line);
}
However, the words in the stoplist still appear in the output file, which suggests that the $line =~ s/${stopword}//g; line isn't doing its job as I expected.
How can I make sure that all words in the stop list that appear in the input text are replaced with 0 characters in the output?
You need to remove newlines from your stoplist using chomp:
my @stoplist = <$STOPLIST>;
chomp @stoplist;
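Beyond chomp, the loop above also treats each stop word as a regex and removes it even inside longer words. A sketch of one alternative, assuming whole-word, case-insensitive removal is what you want (both are my assumptions, as is the sample stop list): build a single pattern from the list with quotemeta and \b word boundaries. Removing words leaves their surrounding spaces behind, so you may want to collapse them afterwards with s/ {2,}/ /g.

```perl
use strict;
use warnings;

# Hypothetical stop list; in practice it would come from <$STOPLIST>
my @stoplist = ("the\n", "a\n", "of\n");
chomp @stoplist;    # without this, each word keeps its "\n" and never matches mid-line

# One alternation; quotemeta escapes regex metacharacters, \b keeps whole words only
my $alt     = join '|', map { quotemeta($_) } @stoplist;
my $stop_re = qr/\b(?:$alt)\b/i;

my $line = "The cat sat on a mat of the house\n";
(my $clean = $line) =~ s/$stop_re//g;
print lc $clean;
```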
I have a question about regular expressions. I need to parse a file so that I can pick out specific blocks of text in it. These blocks are separated by two empty lines (some blocks are separated by 1 or 3 empty lines, but I need exactly 2). Here is my code; I think the regular expression /^\s*$^\s*$/ should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
By default, Perl reads files a line at a time, so a single read never contains multiple newlines. The following code changes the input record separator $/ so that each read returns text terminated by a double newline.
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
New Answer
After having problems excluding runs of more than 2 empty lines, and a good night's sleep, here is a better method that doesn't even need to slurp the file.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push @{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out join("\n", @$block);
}
In fact rather than store and write later, you could simply write to one file per block as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
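A variant of this slurp-and-split approach that also tolerates whitespace on the "empty" lines: first normalize whitespace-only lines to truly empty ones, then split on exactly three consecutive newlines. This is a sketch assuming the whole file fits in memory; split_blocks is a hypothetical name and the sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into blocks separated by exactly two blank lines.
sub split_blocks {
    my ($text) = @_;
    # Treat whitespace-only lines as empty so " \t\n" counts as a blank line
    $text =~ s/^[ \t]+$//mg;
    # Exactly three consecutive newlines = exactly two empty lines between blocks;
    # the lookbehind/lookahead reject longer runs of blank lines
    return split /(?<!\n)\n\n\n(?!\n)/, $text;
}

my $text = "block one\n\n\nblock two\nstill two\n\nstill two\n \t\n\nblock three\n\n\n\nstill three\n";
my @blocks = split_blocks($text);
print "found:\n$_\n" for @blocks;
```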
Deprecated in favor of new answer
justintime's answer works by telling Perl that you want to treat "\n\n" as the end of a line, which is clever and will work well. One caveat is that the separator must match exactly. The regex you are using suggests there might be whitespace on the "empty" lines, in which case this will not work. Also, his method will split on runs of more than 2 line breaks too, which the OP ruled out.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (fine in most cases, as long as the file is not so large that it uses up all your memory).
I would then use the split function to break the text into an array of chunks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file);
{
local $/; # enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my @blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
\s*\n #first whitespace only line
\s*\n #second "
(?!\n) #check to make sure there isn't another \n after this one
/x, # x flag allows comments and whitespace in regex
$text
);
You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that since you open $out lexically (with my) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $_;
}