perl regex to read contents between double quotes


I have a file which contains information like this:
TAG1 "file1.txt"
some additional lines
TAG2 "file2.txt"
some more lines
TAG3 "file3.txt".
Now, I want to read what is inside the double quotes and assign it to a variable (something like $var1 = file1.txt, $var2 = file2.txt, $var3 = file3.txt). Can anyone guide me on how to do this?

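You could achieve your goal in either of two ways.
Using a regular expression:
use strict;
use warnings;
my @files;
while (my $line = <>) {
    # Capture the text between the first pair of double quotes on the line
    if ($line =~ m/"([^"]+)"/) {
        push @files, $1;
    }
}
Using split():
use strict;
use warnings;
my @files;
while (my $line = <>) {
    # Field 1 of a split on '"' is the text between the first two quotes
    my (undef, $file, undef) = split /"/, $line, 3;
    push @files, $file if defined $file;
}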
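If you really want one "variable" per file, a hash keyed by the tag is usually cleaner than separate $var1, $var2 and $var3. The following is a minimal sketch. It assumes the tag is always the first word on the line and the file name is the first double-quoted string after it. The sub name quoted_by_tag is hypothetical, and the sample lines are read through an in-memory filehandle so the example runs on its own.

```perl
use strict;
use warnings;

# ASSUMPTION: the tag is the first word on the line and the file name is the
# first double-quoted string after it. Lines without that shape are skipped.
sub quoted_by_tag {
    my ($fh) = @_;
    my %file_for;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^(\S+)\s+"([^"]*)"/;
        $file_for{$1} = $2;
    }
    return \%file_for;
}

# Sample lines from the question, read through an in-memory filehandle.
my $sample = <<'END';
TAG1 "file1.txt"
some additional lines
TAG2 "file2.txt"
some more lines
TAG3 "file3.txt".
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $file_for = quoted_by_tag($fh);
close $fh;

print "$_ => $file_for->{$_}\n" for sort keys %$file_for;
```

With a real file you would pass a handle opened on that file instead. $file_for->{TAG1} then plays the role of $var1.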

Except for the period after "file3.txt". (which I suspect is an artifact from posting the question), your data appears to be a tab-separated (CSV-style) file.
If that's the case, I advise you to parse the file with Text::CSV:
use strict;
use warnings;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new( { sep_char => "\t" } )
    or die "Cannot use CSV: " . Text::CSV->error_diag();
my @files;
open my $fh, '<', 'file.csv';
while ( my $row = $csv->getline($fh) ) {
    # The file name is the second tab-separated field; skip lines without one
    push @files, $row->[1] if defined $row->[1];
}
$csv->eof or $csv->error_diag();
close $fh;
print "@files\n";
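If Text::CSV is not installed, the core module Text::ParseWords can also split a line on whitespace while respecting double quotes. The following is a minimal sketch; file_from_line is a hypothetical helper. Note that parse_line treats single quotes as quote characters too, so an unbalanced apostrophe on a line can make it return an empty list.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split a line on whitespace, honouring double quotes (which are removed),
# and return the second field for lines that contain a double quote.
sub file_from_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /"/;
    my @fields = parse_line( '\s+', 0, $line );
    return $fields[1];
}

for my $line ( qq{TAG1 "file1.txt"\n}, qq{some additional lines\n}, qq{TAG2 "my file.txt"\n} ) {
    my $file = file_from_line($line);
    print defined $file ? "$file\n" : "(no file)\n";
}
```

Unlike the plain regex, this approach keeps quoted names that contain spaces (such as "my file.txt") intact as one field.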

Related

How to match 2 arrays?

I have 2 files I need to match.
File1.txt contains:
-----------------------------------------------
Words | Keyword | Sentence
-----------------------------------------------
Lunch >WORDS> when do you want to have lunch?.
Hate >WORDS> I hate you.
Other >WORDS> Other than that?
File2.txt contains:
I love you.
Other than that?.
I like you.
when do you want to have lunch?.
File1 should be matched against File2 on the text after the keyword >WORDS>. That is, only sentences such as "Other than that?" and "when do you want to have lunch?" are compared. The result should be the full File1 line whose text after >WORDS> also appears in File2. I use arrays to do this.
The expected output is:
Other >WORDS> Other than that?.
Lunch >WORDS> when do you want to have lunch?.
CODE:
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
use 5.010;
my $new = 'File1.txt'; #read File1
my $old = 'File2.txt'; #read File2
my $string1;
my $string2;
my @new_array;
my @old_array;
my $string11;
my @array1;
#---------------------------------------------------------------
# Main
#---------------------------------------------------------------
open(NEW_FILE, "<", $new) || die "Cannot open file $new to Read! - $!";
open(OLD_FILE, "<", $old) || die "Cannot open file $old to Read! - $!";
while (<NEW_FILE>) {
    my $string1 = $_;
    my $string11 = $_;
    if ($string1 =~ m/WORDS/) {          #matching the Keyword >WORDS>
        $string1 = $';                    #string1 will take after >WORDS>
        $string11 = $_;                   #string11 will take the full.
        push(@new_array, ($string1));     #string1 = @new_array
        push(@array1, ($string11));       #string11 = @array1
    }
}
while (<OLD_FILE>) {
    my $string2 = $_;
    if ($string2 =~ m/WORDS/) {          #matching the Keyword >WORDS>
        $string2 = $';                    #string2 will take after >WORDS>
        push(@old_array, ($string2));     #string2 = @old_array
    }
}
#------Do comparison between new file and old file. (only after WORDS)
my @intersection = ();
my @unintersection = ();
my %hash1 = map { $_ => 1 } @old_array;
foreach (@new_array) {
    if (defined $hash1{$_}) {
        push @intersection, $_;     #this one will take the same array between new and old
    }
    else {
        push @unintersection, $_;   #this one will take the new array only. So, will read this one.
    }
}
Up to this point, if I print @unintersection, it produces:
other than that?
when do you want to have lunch?.
Then I compare @unintersection (the text after WORDS) with @array1:
my @same = ();
my @not_same = ();
my %hash2 = map { $_ => 1 } @unintersection;
foreach (@array1) {
    if (@array1 = m/WORDS/) {
        @array1 = $';
        if (defined $hash2{$_}) {
            @array1 = $_;
            push @same, $_;
        }
        else {
            push @not_same, $_;
        }
    }
}
print @same;
print @not_same;
close(NEW_FILE);
close(OLD_FILE);
close(NEW_OUTPUT_FILE);
The result I get is only one line:
Other >WORDS> Other than that?
There should be 2 lines of output: "Other >WORDS> Other than that?" and "Lunch >WORDS> when do you want to have lunch?".

The problem can be solved with a lookup table (implemented as a hashref) built from the information provided in File1.txt (words_lookup.dat).
Once the lookup table is available, read File2.txt (words_data.dat) and compare each line against it. If an input line contains a stored sentence, print the corresponding full File1 line ($lookup->{$key}{line}) to the console.
use strict;
use warnings;
use feature 'say';
my($fh, $lookup);
my $fname_lookup = 'words_lookup.dat';  # File1.txt
my $fname_data   = 'words_data.dat';    # File2.txt
my $re_lookup = qr/(\S+)\s+>WORDS>\s+(.*)/;
open $fh, '<', $fname_lookup
    or die "Couldn't open $fname_lookup: $!";
while( <$fh> ) {
    chomp;
    next unless /$re_lookup/;
    $lookup->{$1}{sentence} = $2;
    $lookup->{$1}{line}     = $_;
}
close $fh;
open $fh, '<', $fname_data
    or die "Couldn't open $fname_data: $!";
while( my $line = <$fh> ) {
    for my $key ( keys %$lookup ) {
        # index() is a literal substring test, so ? and . in the sentence
        # are not treated as regex metacharacters
        say $lookup->{$key}{line}
            if index( $line, $lookup->{$key}{sentence} ) >= 0;
    }
}
close $fh;
exit 0;
Output
Other >WORDS> Other than that?
Lunch >WORDS> when do you want to have lunch?.
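The lookup above loops over every stored sentence for every File2 line. If an exact comparison is enough, the hash can be keyed by the sentence itself, so each File2 line needs only one lookup. The following is a minimal sketch; normalise and matching_lines are hypothetical names. Treating trailing periods as insignificant is an assumption, made because the two sample files disagree on them ("Other than that?" versus "Other than that?.").

```perl
use strict;
use warnings;

# ASSUMPTION: trailing periods/whitespace are not significant when comparing.
sub normalise {
    my ($s) = @_;
    $s =~ s/[.\s]+\z//;
    return $s;
}

# Return the full File1 lines whose sentence (the text after ">WORDS>")
# also appears as a line of File2, in File2 order.
sub matching_lines {
    my ( $lookup_fh, $data_fh ) = @_;
    my %line_for;    # normalised sentence => full File1 line
    while ( my $line = <$lookup_fh> ) {
        chomp $line;
        next unless $line =~ /^\S+\s+>WORDS>\s+(.*)/;
        $line_for{ normalise($1) } = $line;
    }
    my @matches;
    while ( my $line = <$data_fh> ) {
        chomp $line;
        my $key = normalise($line);
        push @matches, $line_for{$key} if exists $line_for{$key};
    }
    return @matches;
}

my $file1 = <<'END';
-----------------------------------------------
Words | Keyword | Sentence
-----------------------------------------------
Lunch >WORDS> when do you want to have lunch?.
Hate >WORDS> I hate you.
Other >WORDS> Other than that?
END

my $file2 = <<'END';
I love you.
Other than that?.
I like you.
when do you want to have lunch?.
END

open my $f1, '<', \$file1 or die $!;
open my $f2, '<', \$file2 or die $!;
my @found = matching_lines( $f1, $f2 );
print "$_\n" for @found;
```

The header and separator lines of File1 are skipped because they do not match the >WORDS> pattern.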

How to read data from files whose names are listed inside another file in Perl?

I have a file list2.txt, and inside that file are the names of several files, like 02x5.txt or 03x5.txt. How can I read the data inside 02x5.txt in one go? Inside 02x5.txt there are height:20, length:5, colour:blue, etc.
inside list2.txt:
02x5.txt
03x5.txt
inside 02x5.txt:
height:20,
length:5,
colour:blue
inside 03x5.txt:
height:25,
length:10,
colour:green
#!/usr/bin/perl
use strict;
use warnings;
# Reading a line from a file (or rather from a filehandle)
my $filename = "list2.txt";
if (open my $data, "<", $filename) {
    while (my $row = <$data>) {
        chomp $row;
        if ($row =~ m/02x5.txt$/) {
            my $m = $row;
            print "$m\n";
        }
    }
}
How can I read the height and length data from a particular txt file?
Thank you

Please see the following piece of code, which performs the task you described. The data it reads is stored in the hash %data, which you can use any way you like.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $filename = 'list2.txt';
my %data;
open my $fh, '<', $filename
    or die "Couldn't open $filename: $!";
my @filenames = <$fh>;
close $fh;
chomp @filenames;
foreach my $file (@filenames) {
    open $fh, '<', $file
        or die "Couldn't open $file: $!";
    while( <$fh> ) {
        chomp;
        # Splits "height:20," into ('height', '20,'); the trailing comma
        # (and any trailing space) is kept, as the output below shows
        my($k,$v) = split ':';
        $data{$file}{$k} = $v;
    }
    close $fh;
}
say Dumper(\%data);
Output
$VAR1 = {
'02x5.txt' => {
'colour' => 'blue',
'height' => '20, ',
'length' => '5, '
},
'03x5.txt' => {
'length' => '10, ',
'colour' => 'green',
'height' => '25, '
}
};

The answer is always to break the problem down into smaller chunks.
#!/usr/bin/perl
use strict;
use warnings;
my @files = get_list_of_files();
my %data;
foreach my $file (@files) {
    my $file_data = get_data_from_file($file);
    $data{$file} = $file_data;
}
# You now have your data in a hash called %data.
# Do whatever you want with it.
sub get_list_of_files {
    open my $list_fh, '<', 'list2.txt'
        or die "Cannot open list2.txt: $!";
    my @files = <$list_fh>;
    chomp(@files);
    return @files;
}
sub get_data_from_file {
    my ($filename) = @_;
    my $record;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    while (<$fh>) {
        chomp;
        next unless /:/;    # skip blank or malformed lines
        # Remove trailing comma (and any whitespace after it)
        s/,\s*$//;
        my ($key, $value) = split /:/;
        $record->{$key} = $value;
    }
    return $record;
}
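The question also asks how to get the height and length for a particular file. Once the nested hash exists, that is a two-level lookup, e.g. $data{'02x5.txt'}{height}. The self-contained sketch below writes throwaway copies of the sample files to a temporary directory so it can run anywhere. read_record and read_listed_files are hypothetical helpers, and the assumption that the listed names are relative to that directory applies to this demo only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Parse "key:value," lines into a hash ref, dropping a trailing comma.
sub read_record {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %record;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/,\s*\z//;
        my ( $key, $value ) = split /\s*:\s*/, $line, 2;
        next unless defined $value;    # skip blank or odd lines
        $record{$key} = $value;
    }
    close $fh;
    return \%record;
}

# Read every file named in $list_path (one name per line), relative to $dir.
sub read_listed_files {
    my ( $dir, $list_path ) = @_;
    open my $list, '<', $list_path or die "Cannot open $list_path: $!";
    my %data;
    while ( my $name = <$list> ) {
        chomp $name;
        next unless length $name;
        $data{$name} = read_record( File::Spec->catfile( $dir, $name ) );
    }
    close $list;
    return \%data;
}

# Demo: throwaway copies of the question's files in a temporary directory.
my $dir = tempdir( CLEANUP => 1 );
my %content = (
    'list2.txt' => "02x5.txt\n03x5.txt\n",
    '02x5.txt'  => "height:20,\nlength:5,\ncolour:blue\n",
    '03x5.txt'  => "height:25,\nlength:10,\ncolour:green\n",
);
for my $name ( keys %content ) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $content{$name};
    close $out;
}

my $data = read_listed_files( $dir, File::Spec->catfile( $dir, 'list2.txt' ) );
printf "%s: height=%s length=%s\n", $_, $data->{$_}{height}, $data->{$_}{length}
    for sort keys %$data;
```

In a real script, you would call read_listed_files on your actual directory and list file. You would then look up, for example, $data->{'03x5.txt'}{length}.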

Multiple pattern match and replace

How to extract patterns from a file and replace the multiple patterns with a new pattern from a file?
For example:
Let's say the pattern file is pattern.txt, as follows, with 2,000 lines:
a
b
d
e
f
....
...
...
The file with the replacement values is replace.txt, containing:
a,1
b,3
c,5
d,10
e,14
....
...
...
The intended final content of patterns.txt is:
a,1
b,3
d,10
e,14
....
...
...

Perl from the command line:
perl -i -pe'
  BEGIN{ local (@ARGV, $/, $^I) = pop; %h = split /[\s,]+/, <> }
  s| (\S+)\K |,$h{$1}|x
' pattern.txt replace.txt
It slurps the content of the second file ($/ set to undef), temporarily disables in-place editing ($^I set to undef), splits the string on whitespace/commas, and populates the %h hash with key/value pairs. Then, for every line of the first file, it appends a comma and the value for the current key.
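The same two steps, written out with strict and warnings enabled, look roughly like the following sketch. It assumes the replace file fits in memory. build_map and annotate are hypothetical names. The // '' fallback is an addition the one-liner does not have; it avoids an uninitialized-value warning for keys missing from replace.txt.

```perl
use strict;
use warnings;

# Build the key => value map the way the one-liner's BEGIN block does:
# split the whole replace file on runs of whitespace/commas into pairs.
sub build_map {
    my ($text) = @_;
    my %h = split /[\s,]+/, $text;
    return \%h;
}

# Append ",value" after the first non-space run of each line, like
# s|(\S+)\K|,$h{$1}|; keys missing from the map get an empty value.
sub annotate {
    my ( $map, @lines ) = @_;
    my @out;
    for my $line (@lines) {
        ( my $copy = $line ) =~ s{(\S+)\K}{ ',' . ( $map->{$1} // '' ) }e;
        push @out, $copy;
    }
    return @out;
}

my $map = build_map("a,1\nb,3\nc,5\nd,10\ne,14\n");
print "$_\n" for annotate( $map, qw(a b d e) );
```

The \K tells the substitution to keep everything matched so far, so the comma and value are inserted right after the key rather than replacing it.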

With the possibility of arbitrary characters in your input, it might be safest to use Text::CSV. The benefit is that it handles things like quoted delimiters, multiline strings, etc. The drawback is that it can break on non-CSV content, so it relies on your input being valid CSV.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({
    binary => 1,
    eol    => $/,
});
my %s;
my ($input, $replace) = @ARGV;
open my $fh, "<", $replace or die "Cannot open $replace: $!";
while (my $row = $csv->getline($fh)) {
    my ($key, $line) = @$row;
    $s{$key} = $line;
}
open $fh, "<", $input or die "Cannot open $input: $!";
while (<$fh>) {
    chomp;
    $csv->print(*STDOUT, [$_, $s{$_}]);
}

I'm not sure this really needs a regex, as you're not really altering your source so much as 'just' printing based on key fields.
So I would approach it something like this:
#!/usr/bin/env perl
use strict;
use warnings;
open( my $replace, "<", "replace.txt" ) or die $!;
my %replacements;
while (<$replace>) {
    chomp;
    my ( $key, $value ) = split(/,/);
    $replacements{$key} = $value;
}
close($replace);
open( my $input, "<", "input.txt" ) or die $!;
open( my $output, ">", "patterns.txt" ) or die $!;
while ( my $line = <$input> ) {
    chomp $line;
    # exists() so that a value of 0 is still printed
    if ( exists $replacements{$line} ) {
        # print "key,value", as in the intended output
        print {$output} "$line,$replacements{$line}\n";
    }
}
close($input);
close($output);
It's not as concise as some of the other examples, but hopefully it is clearer what it's actually doing, which I consider a good thing. (It could be made much more compact, in the way Perl is (in)famous for.)

Extract the matched pattern (Numbers +Name) in perl

I have used the pattern below to search for and extract a substring from a big string.
An example input string:
loadStringCombo('1',10,1,10,MaxCallApprComboBxId,quatstyle='width:50px;'quat)
Expected Output
(10,1,10,MaxCallApprComboBxId,)
But this way, I only get combobox1 as output.
while ( my $st = $str =~ /[0-9]+[\,][0-9]+[\,][0-9]+[\,][0-9a-zA-Z]+[\,]/g ) {
    my $str3 = "combobox" . $st;
    push @arry1, $str3 . "\n";
    print @arry1, "\n";
    open FILE, ">test.txt" or die $!;
    print FILE @arry1, "\n";
}
Please guide me to extract the value 10,1,10,MaxCallApprComboBxId,.

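In scalar context, $str =~ /.../g returns only true (1), not the matched text, which is why you get combobox1. Put the part you want in a capture group and use $1.
Replace this line:
while ( my $st = $str =~ /[0-9]+[\,][0-9]+[\,][0-9]+[\,][0-9a-zA-Z]+[\,]/g ) {
with:
while ( $str =~ /(\d+,\d+,\d+,[0-9a-zA-Z]+,)/g ) {
and use $1 for the matched text inside the loop. The whole loop: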
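use strict;
use warnings;
use Data::Dumper;
my $str = q{loadStringCombo('1',10,1,10,MaxCallApprComboBxId,quatstyle='width:50px;'quat)};
my @arry1;
while ($str =~ /(\d+,\d+,\d+,[0-9a-zA-Z]+,)/g ) {
    push @arry1, "combobox$1";
}
print Dumper(\@arry1);
open my $FILE, '>', 'test.txt' or die $!;
print $FILE "@arry1\n";
close $FILE;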
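To see why the original loop produced combobox1, compare what the same match returns in scalar and list context. The following small demonstration uses the sample string from the question:

```perl
use strict;
use warnings;

my $str = q{loadStringCombo('1',10,1,10,MaxCallApprComboBxId,quatstyle='width:50px;'quat)};
my $re  = qr/(\d+,\d+,\d+,[0-9a-zA-Z]+,)/;

# Scalar context: the match only reports success, so $flag is 1.
my $flag = $str =~ $re;

# List context: the match returns the captured text.
my ($text) = $str =~ $re;

# List context with /g: every capture from every match, all at once.
my @all = $str =~ /$re/g;

print "flag=$flag\ntext=$text\nall=@all\n";
```

This is why `while ( my $st = $str =~ /.../g )` gives "combobox1": $st receives the success flag, not the text.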

Perl: How to extract sequences based on gene number and nucleotide length?

I have 2 files, as follows:
file1.txt:
0 117nt, >gene_73|GeneMark.hm... *
0 237nt, >gene_3097|GeneMark.... *
0 237nt, >gene_579|GeneMark.h... *
0 237nt, >gene_988|GeneMark.h... *
0 189nt, >gene_97|GeneMark.hm... *
0 183nt, >gene_97|GeneMark.hm... *
file2.fasta:
>gene_735|GeneMark.hmm|237_nt|+|798985|799221
TTGTGGTTCGTGCCGCGCGACGCGTTGCGTCTGCAAACGCCCGACGAAGACATCGCGACCTATCTGTTCAACAAGCATGTGATTCGGCATCGGTTCTGTCCGACCTGCGGGATTCATCCGTTCGCGGAAGGCACGGACCCGAAGGGCAACGCGATGGCGGCCGTCAATCTTCGCTGCGTCGACGGCGTCGATCTCGACGCGTTGAGCGTCCGCCATTTCGACGGGCGCGCGCTCTGA
>gene_579|GeneMark.hmm|237_nt|+|667187|667423
ATGTACCACGGCGCCGAATTTGCCGCTGCCAAGGGCATGCGCTGGCTGCGAGATGCCGCCAACGGCTCTGCCTTCATCGCACCGGGCAGTCCGTGGCAAAACGGTTTCGTCGAGCGTTTCAACGGCAAGCTGCATGACGAATTGCTGAACCGGGAATGGTTCCGCGGCCGTGCCGAGACCAAGATGCTCATCGAACGCTCCGGCTACGGTCCGTCGAGTCTGACCGGATTCCGATGA
>gene_1876|GeneMark.hmm|234_nt|-|2168498|2168731
ATGCTGTTCTTTTCGCGCGCGGGCGTGTCGCGTGCGGCCGGCGGCCAATCATGCGGCGAGTCGTTTTGTCGCGGCTCGCGGCGCTTGCCGACGTTGGAATCGCGCGCGCCGATGCGCGGATCGGGGCGGCAACGTTTGCGTATGAGGAATGATGCGTTTGCGCATCGGGAATGGGCGCCTCGCCCCGGTTTCGCCGCGATTCCGCCCGACTCGAGGCAGTCGTTTTTCCGCTAA
>gene_3097|GeneMark.hmm|237_nt|-|3467022|3467258
GTGTCGAACGAACGTCGCGGCGAACGGCCGCTGCGGGCATCGCCGCAGGACGTCACACGGCGAACGTCGCGCGCGATCCTCGGCGGCCGCGAACGTGGGCCGTCCCGTGGCACGTTCGGCTCGCTCGGCATGGCGAACGACCGCCGCATCGCGCATCGCCGTCGCGCGGCCTCCAAAAAAACGGCGGTCAGCGACCGCCGGCTTTGGCCGAAACCGATGCGTCGTACGAATCAGTGA
>gene_988|GeneMark.hmm|237_nt|+|1121027|1121263
ATGACCTTGTCAGGCAACATCAAGGACGGCGACTGGACGGTCGAGGTGACGACATCGCCGGTGCAGGGCGGTTACGTGTGCGACATCGAGGTGATGCACGGCGCGCCGGGCGGCGCGTTCCGGCACGCGTTCCGGCACGGCGGCACTTATCCGGCCGAGCGCGACGCGATGATCGAGGGGCTGCGCGCGGGCATGACCTGGATCGAGCTGAAGATGTCGAAAGCATTCAATCTGTAA
>gene_97|GeneMark.hmm|105_nt|+|90122|90226
GTGACGCGTTTCGCGACGCGCGTCGATGGGGCGGGCGCGAAACCCGTTCGCCGCGATGCGGCGGACGGGGTATGGCCGAGCGCCGTCCGTCGCGGCGAGAGTTGA
>gene_97|GeneMark.hmm|183_nt|-|107002|107184
ATGGAGGCAATCGTGATCGAGCAAGTGATACTGGGCGTCTTTCTCGTACTGCCGCTTCTCATCGTCGCGGTGCTGTACTCCGACGAACTCTGGCAAGAACACCGCCTGCAGCATCCGCGCGACGAGCACACGCCACATATCGACTGGCGTCATCCGTGGCGGATCCTGCGGCGAGGGCACTAA
>gene_97|GeneMark.hmm|189_nt|-|98624|98812
GTGAAATACACGAGCGACCATTACGCGGGCGTCAAATTTGGCGCGCTGTACGGGTTCTCGAACGCGGCGAACTTCGCCGACAACCGCGCTCGCCGGCGCATGCGCGGCGTTCGCATACGCGATCGGCAAAAGCGGCGTGATGTGCGGTTGCCTGCCGCGCTCGCGCTATGCGCGGCACGCCATCGATGA
>gene_97|GeneMark.hmm|234_nt|+|105494|105727
ATGAAGATTCAAATCGCCATTGTTTATTTTGTCGCCCGTCACGCAAACGAGCAGGCGCGAAGCGGATCGGCGCGCATTGGCGAAGAGCCGGCGCGCATCGGCATCGCGCTCGCGCGACACATGCGCGCCGCGCGCGGCCGGTCGACGCCGGATTCGCCTGTCGATCGATCCGGTGCGCCCCGAGCCGATGAGCGGTACGCTTCGGCGCGCGCGCGACACGCGCGACACGCGTGA
>gene_979|GeneMark.hmm|225_nt|-|1115442|1115666
TTGATCGACGCGCGGGGCCGGCCGGGCCGCGGGGTATCGAAGGCGATCGACGCGCAACACGAATCGCCGCCGCGCGCCGAAACCTCGCTATGCGCGTCGCGCGCACGCGCGGCCGGCGGCGCACGCGCGGGTGTGCGCGGGCCGGCGGCGCGGCCGCTCGCACTGCGCGACCGCTCGCGCGCACGCCTTCCTCGGCACGCGCCGGGAATCCCGGCCCTTCAATGA
The output that I expect is:
>gene_579|GeneMark.hmm|237_nt|+|667187|667423
ATGTACCACGGCGCCGAATTTGCCGCTGCCAAGGGCATGCGCTGGCTGCGAGATGCCGCCAACGGCTCTGCCTTCATCGCACCGGGCAGTCCGTGGCAAAACGGTTTCGTCGAGCGTTTCAACGGCAAGCTGCATGACGAATTGCTGAACCGGGAATGGTTCCGCGGCCGTGCCGAGACCAAGATGCTCATCGAACGCTCCGGCTACGGTCCGTCGAGTCTGACCGGATTCCGATGA
>gene_3097|GeneMark.hmm|237_nt|-|3467022|3467258
GTGTCGAACGAACGTCGCGGCGAACGGCCGCTGCGGGCATCGCCGCAGGACGTCACACGGCGAACGTCGCGCGCGATCCTCGGCGGCCGCGAACGTGGGCCGTCCCGTGGCACGTTCGGCTCGCTCGGCATGGCGAACGACCGCCGCATCGCGCATCGCCGTCGCGCGGCCTCCAAAAAAACGGCGGTCAGCGACCGCCGGCTTTGGCCGAAACCGATGCGTCGTACGAATCAGTGA
>gene_988|GeneMark.hmm|237_nt|+|1121027|1121263
ATGACCTTGTCAGGCAACATCAAGGACGGCGACTGGACGGTCGAGGTGACGACATCGCCGGTGCAGGGCGGTTACGTGTGCGACATCGAGGTGATGCACGGCGCGCCGGGCGGCGCGTTCCGGCACGCGTTCCGGCACGGCGGCACTTATCCGGCCGAGCGCGACGCGATGATCGAGGGGCTGCGCGCGGGCATGACCTGGATCGAGCTGAAGATGTCGAAAGCATTCAATCTGTAA
>gene_97|GeneMark.hmm|183_nt|-|107002|107184
ATGGAGGCAATCGTGATCGAGCAAGTGATACTGGGCGTCTTTCTCGTACTGCCGCTTCTCATCGTCGCGGTGCTGTACTCCGACGAACTCTGGCAAGAACACCGCCTGCAGCATCCGCGCGACGAGCACACGCCACATATCGACTGGCGTCATCCGTGGCGGATCCTGCGGCGAGGGCACTAA
>gene_97|GeneMark.hmm|189_nt|-|98624|98812
GTGAAATACACGAGCGACCATTACGCGGGCGTCAAATTTGGCGCGCTGTACGGGTTCTCGAACGCGGCGAACTTCGCCGACAACCGCGCTCGCCGGCGCATGCGCGGCGTTCGCATACGCGATCGGCAAAAGCGGCGTGATGTGCGGTTGCCTGCCGCGCTCGCGCTATGCGCGGCACGCCATCGATGA
There are 4 sequences with gene number 97, all of different lengths. I want only the sequences whose gene number and length are both listed in file1.txt to be written to output.fasta. What I've done so far is as follows (but it fails and has some errors):
#!/usr/bin/perl
use strict;
use warnings;
my @genes;
open my $list, '<file1.txt';
while (my $line = <$list>) {
    push (@genes, $1) if $line =~/\>(.*?)\|/gs;
}
my $tag1 = "0\t";
my $tag2 = "nt";
while (my $line = <$list>) {
    if ($line =~ /$tag1(.*?)$tag2/) {
        my $match1 = $1;
    }
}
my $input;
{
    local $/ = undef;
    open my $fasta, '<file2.fasta';
    my $tag3 = "GeneMark.hmm";
    my $tag4 = "_nt";
    while (my $input = <$fasta>) {
        if ($input =~ /$tag3(.*?)$tag4/) {
            my $match2 = $1;
        }
    }
    close $fasta;
}
my @lines = split(/>/,$input);
foreach my $l (@lines) {
    if ($l =~ /(.+?)\|/) {
        my $real_name = $1;
        if ($real_name ~~ @genes) {
            if ($match2 = $match1) {
                open (OUTFILE, '>>output.fasta');
                print OUTFILE ">$l";
            }
        }
    }
}
Can anyone give me some guidance on correcting the code? Or is there a better way to do this? Any help will be very much appreciated! Thanks! :)

Here's an option that uses Bio::SeqIO:
use strict;
use warnings;
use Bio::SeqIO;
my %hash;
open my $fh, '<', $ARGV[0] or die $!;
while (<$fh>) {
    # gene name => list of wanted lengths, e.g. gene_97 => [189, 183]
    push @{ $hash{$2} }, $1 if /\s+(\d+)nt,.+?>(gene_\d+)\|/;
}
close $fh;
my $in  = Bio::SeqIO->new( -file => $ARGV[1], -format => 'Fasta' );
my $out = Bio::SeqIO->new( -fh => \*STDOUT, -format => 'Fasta' );
while ( my $seq = $in->next_seq() ) {
    next unless $seq->id =~ /(gene_\d+)\|.+?\|(\d+)_nt\|/;
    my ( $gene, $len ) = ( $1, $2 );
    # Numeric comparison, so a length of 237 cannot also match e.g. 1237
    $out->write_seq($seq) if grep { $_ == $len } @{ $hash{$gene} || [] };
}
Usage: perl script.pl file1.txt file2.fasta [>outFile.fasta]
The optional redirect (>outFile.fasta) sends the output to a file instead of the terminal.
Output from your data:
>gene_579|GeneMark.hmm|237_nt|+|667187|667423
ATGTACCACGGCGCCGAATTTGCCGCTGCCAAGGGCATGCGCTGGCTGCGAGATGCCGCC
AACGGCTCTGCCTTCATCGCACCGGGCAGTCCGTGGCAAAACGGTTTCGTCGAGCGTTTC
AACGGCAAGCTGCATGACGAATTGCTGAACCGGGAATGGTTCCGCGGCCGTGCCGAGACC
AAGATGCTCATCGAACGCTCCGGCTACGGTCCGTCGAGTCTGACCGGATTCCGATGA
>gene_3097|GeneMark.hmm|237_nt|-|3467022|3467258
GTGTCGAACGAACGTCGCGGCGAACGGCCGCTGCGGGCATCGCCGCAGGACGTCACACGG
CGAACGTCGCGCGCGATCCTCGGCGGCCGCGAACGTGGGCCGTCCCGTGGCACGTTCGGC
TCGCTCGGCATGGCGAACGACCGCCGCATCGCGCATCGCCGTCGCGCGGCCTCCAAAAAA
ACGGCGGTCAGCGACCGCCGGCTTTGGCCGAAACCGATGCGTCGTACGAATCAGTGA
>gene_988|GeneMark.hmm|237_nt|+|1121027|1121263
ATGACCTTGTCAGGCAACATCAAGGACGGCGACTGGACGGTCGAGGTGACGACATCGCCG
GTGCAGGGCGGTTACGTGTGCGACATCGAGGTGATGCACGGCGCGCCGGGCGGCGCGTTC
CGGCACGCGTTCCGGCACGGCGGCACTTATCCGGCCGAGCGCGACGCGATGATCGAGGGG
CTGCGCGCGGGCATGACCTGGATCGAGCTGAAGATGTCGAAAGCATTCAATCTGTAA
>gene_97|GeneMark.hmm|183_nt|-|107002|107184
ATGGAGGCAATCGTGATCGAGCAAGTGATACTGGGCGTCTTTCTCGTACTGCCGCTTCTC
ATCGTCGCGGTGCTGTACTCCGACGAACTCTGGCAAGAACACCGCCTGCAGCATCCGCGC
GACGAGCACACGCCACATATCGACTGGCGTCATCCGTGGCGGATCCTGCGGCGAGGGCAC
TAA
>gene_97|GeneMark.hmm|189_nt|-|98624|98812
GTGAAATACACGAGCGACCATTACGCGGGCGTCAAATTTGGCGCGCTGTACGGGTTCTCG
AACGCGGCGAACTTCGCCGACAACCGCGCTCGCCGGCGCATGCGCGGCGTTCGCATACGC
GATCGGCAAAAGCGGCGTGATGTGCGGTTGCCTGCCGCGCTCGCGCTATGCGCGGCACGC
CATCGATGA
Bio::SeqIO is built to parse FASTA (and similar) files, so the code above leverages that capability. After creating a hash of arrays (HoA) from file1.txt, it processes the FASTA file and prints only the matching records.
Hope this helps!
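If BioPerl is not available, the same gene/length filter can be done in core Perl by streaming the FASTA file line by line. The following is a minimal sketch. It assumes every header sits on a single line in the >gene_N|source|LEN_nt|... layout shown in the question. wanted_pairs and select_fasta are hypothetical names, and the sequences in the demo are shortened, made-up placeholders, not the real data.

```perl
use strict;
use warnings;

# Collect wanted (gene, length) pairs from list lines such as
# "0   237nt, >gene_579|GeneMark.h... *".
sub wanted_pairs {
    my ($fh) = @_;
    my %want;
    while ( my $line = <$fh> ) {
        $want{$2}{$1} = 1 if $line =~ /(\d+)nt,\s*>(gene_\d+)\|/;
    }
    return \%want;
}

# Stream FASTA records and keep those whose header gene/length pair is wanted.
# ASSUMPTION: headers look like ">gene_N|source|LEN_nt|...".
sub select_fasta {
    my ( $want, $fh ) = @_;
    my ( @kept, $keep );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^>/ ) {
            $keep = $line =~ /^>(gene_\d+)\|[^|]*\|(\d+)_nt\|/
                 && $want->{$1}
                 && $want->{$1}{$2};
            push @kept, '' if $keep;    # start a new record
        }
        $kept[-1] .= $line if $keep;    # header and sequence lines
    }
    return @kept;
}

my $list = <<'END';
0   117nt, >gene_73|GeneMark.hm... *
0   237nt, >gene_579|GeneMark.h... *
0   183nt, >gene_97|GeneMark.hm... *
END

my $fasta = <<'END';
>gene_579|GeneMark.hmm|237_nt|+|667187|667423
ACGTACGT
>gene_97|GeneMark.hmm|105_nt|+|90122|90226
GGGGCCCC
>gene_97|GeneMark.hmm|183_nt|-|107002|107184
TTTTAAAA
END

open my $lfh, '<', \$list  or die $!;
open my $ffh, '<', \$fasta or die $!;
my @records = select_fasta( wanted_pairs($lfh), $ffh );
print @records;
```

Unlike Bio::SeqIO, this keeps each record's original line wrapping, because it copies lines through unchanged.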
