I am building a hash from a regex match. I run the program below, which has a check at the end to see whether the hash was built correctly. But I keep getting the wrong value: I get ARRAY(0x1a1c740) when it should be 437768. The keys display fine. I didn't use split because I need the key to be the first part of a species name. This is what I am matching:
# "aaaaaaaaaa","aaaaaaaaaa","437768","Cryptophyta sp. CR-MAL06",0
Thanks very much for any help you can give.
use strict;
use warnings;
open (my $in_fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">genus.txt");
my %hash;
while ( my $line = <$in_fh> ) {
#
# "aaaaaaaaaa","aaaaaaaaaa","437768","Cryptophyta sp. CR-MAL06",0
#
if ($line =~ m/\"+\w+\"+\,+\"+\w+\"+\,+\"+(\d+)\"+\,+\"+(\w+)+.+/) {
my $v = $1;
my $k = $2;
$hash{$k} = [$v];
}
}
if (exists $hash{'Cryptophyta'}) {
print $out_fh $hash{'Cryptophyta'};
}
else {
print $out_fh "NO\n";
}
close $in_fh;
close $out_fh;
Change this line:
$hash{$k} = [$v];
to
$hash{$k} = $v;
[$v] is a reference to an array but you want to store a scalar.
[ ] creates an array, assigns the result of the enclosed expression to that array, and returns a reference to the array. It is that reference you are printing.
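A minimal sketch of the difference, assuming a hard-coded sample value in place of the captured $1 (the variable names here are illustrative, not from the original program):

```perl
use strict;
use warnings;

my $v = '437768';            # stand-in for the captured value (sample data)

my $as_ref    = [$v];        # reference to a new anonymous array holding $v
my $as_scalar = $v;          # plain scalar copy of the value

print ref($as_ref), "\n";    # reports the kind of reference: ARRAY
print "$as_ref->[0]\n";      # dereference to reach the stored value
print "$as_scalar\n";        # the scalar prints as the value itself
```

Printing $as_ref directly would give the ARRAY(0x...) form seen in the question.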
You were probably trying to support multiple matches. Two problems:
You continually create a new array with one element. Replace
$hash{$k} = [ $v ];
with
push @{ $hash{$k} }, $v;
You print the reference to the array rather than the contents of the array. Replace
print $out_fh $hash{'Cryptophyta'};
with
print $out_fh join(', ', @{ $hash{'Cryptophyta'} });
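Put together, the multiple-match approach could look roughly like this. It is a sketch under assumptions: the second input line is invented for illustration, the simplified regex assumes the fields are always quoted exactly as in the sample, and %ids_by_genus is a made-up name.

```perl
use strict;
use warnings;

# Records in the question's format; the second line is made-up sample data.
my @lines = (
    '"aaaaaaaaaa","aaaaaaaaaa","437768","Cryptophyta sp. CR-MAL06",0',
    '"bbbbbbbbbb","bbbbbbbbbb","123456","Cryptophyta sp. XYZ",0',
);

my %ids_by_genus;
for my $line (@lines) {
    # Capture the numeric third field and the first word of the fourth field.
    if ( $line =~ /^"\w+","\w+","(\d+)","(\w+)/ ) {
        push @{ $ids_by_genus{$2} }, $1;    # the array is created on first use
    }
}

# Dereference the stored array before printing its contents.
print join( ', ', @{ $ids_by_genus{Cryptophyta} } ), "\n";
```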
I have a text file abc.txt that looks like this:
dQdC(sA1B2C3,sC5) = A lot of stuff
a = b = c
Baseball
dQdC(sC2V3X1,sD5) = A lot of stuff again
Now I want to create two arrays in Perl, one of which will contain A1B2C3 and C2V3X1, while the other will contain C5 and D5. I don't care about the other intermediate lines. To achieve this goal, I am trying this Perl script:
for (my $in=0;$in<=$#lines;$in++){
if ($lines[$in]=~/dQdC\(s([A-Z0-9]+?),s([A-Z0-9]+?)\)/) {
print "1111"; #this line is just to check if it is at all going inside the loop
@A = $1;
@B = $2;
}
However, it is not even going inside the loop. So I guess I did something wrong with the regex. Will someone please tell me what I am doing wrong here?
my (@a, @b);
while ($file =~ /^dQdC\(s(\w+),s(\w+)\)/mg) {
    push @a, $1;
    push @b, $2;
}
or
my (@a, @b);
while (<$fh>) {
    if (/^dQdC\(s(\w+),s(\w+)\)/) {
        push @a, $1;
        push @b, $2;
    }
}
Working with parallel arrays isn't nice.
Alternative 1: Hash
my %hash = $file =~ /^dQdC\(s(\w+),s(\w+)\)/mg;
or
my %hash;
while (<$fh>) {
if (/^dQdC\(s(\w+),s(\w+)\)/) {
$hash{$1} = $2;
}
}
Alternative 2: AoA
use List::Util qw( pairs ); # 1.29+
my @pairs = pairs( $file =~ /^dQdC\(s(\w+),s(\w+)\)/mg );
or
my @pairs;
while (<$fh>) {
    if (/^dQdC\(s(\w+),s(\w+)\)/) {
        push @pairs, [ $1, $2 ];
    }
}
If the format of your target lines is always as shown
use warnings;
use strict;
my $file = ...
my (@ary_1, @ary_2);
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
    my ($v1, $v2) = /dQdC\(s([^,]+),s([^\)]+)/ or next;
    push @ary_1, $v1;
    push @ary_2, $v2;
}
which captures between ( and a , and then between a , and ). The first pattern might as well be s(.*?), since the negated character class brings no benefit when the following , still needs to be matched (but I left it as [^...] for consistency with the other one).
Comments
In general it is better to process a file line by line, unless there are specific reasons to read it all first
A C-style loop is rarely needed. To iterate over array indices, use for my $i (0..$#ary)
Please always use warnings; and use strict;
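For instance, the index loop mentioned in the comments might be written like this (a small sketch; @ary holds made-up sample data):

```perl
use strict;
use warnings;

my @ary = qw(alpha beta gamma);    # hypothetical sample data

# Iterate over the indices with a range instead of a C-style loop.
my @labelled;
for my $i ( 0 .. $#ary ) {
    push @labelled, "$i: $ary[$i]";
}

print "$_\n" for @labelled;
```

When the index itself is not needed, for my $item (@ary) is simpler still.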
Try this:
(?<=\(s)([A-Z0-9]+)(?=,)
It matches substrings that come between (s and , using lookbehind and lookahead.
Similarly, use (?<=,s)([A-Z0-9]+)(?=\)) to capture the substrings between ,s and ).
Putting them together, you can create two capturing groups, each containing the different kind of substrings: (A1B2C3, C2V3X1), (C5, D5)
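A short sketch of the two lookaround patterns applied to the sample data from the question. Note the assumption: these patterns do not check for the dQdC prefix, so they would also match any other (s...,s...) text in the input. The array names are illustrative.

```perl
use strict;
use warnings;

# Sample file content from the question, held in one string.
my $text = join "\n",
    'dQdC(sA1B2C3,sC5) = A lot of stuff',
    'a = b = c',
    'Baseball',
    'dQdC(sC2V3X1,sD5) = A lot of stuff again';

# In list context with /g, a match returns every captured substring.
my @first  = $text =~ /(?<=\(s)([A-Z0-9]+)(?=,)/g;     # between "(s" and ","
my @second = $text =~ /(?<=,s)([A-Z0-9]+)(?=\))/g;     # between ",s" and ")"

print "@first\n";
print "@second\n";
```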
use strict;
use warnings;
my $tmp = join "\n", <DATA>;
my @biblables = ();
The list items are fetched and stored in @biblables in a while loop:
while($tmp=~m/\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/g)
{
    push(@biblables, "\\cite{$2}, ");
}
print @biblables;
When we print this, we get output like this:
\cite{BuI2001},\cite{BuI2002},\cite{BuI2003},\cite{BuI2004},\cite{BuI2005},\cite{BuI2006},
However, we need the output to look like this:
\cite{BuI2001},\cite{BuI2002},\cite{BuI2003},\cite{BuI2004},\cite{BuI2005},\cite{BuI2006}.
Hence we use the post-match variable $' to put a dot at the end of the last list item in the array:
while($tmp=~m/\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/g)
{
    my $post = $';
    if($post!~m/\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/)
    { push(@biblables, "\\cite{$2}."); }
    else { push(@biblables, "\\cite{$2}, "); }
}
print @biblables;
Could you please advise whether there is a shorter way to get this output?
#
__DATA__
\bibitem[{BuI (2001)}]{BuI2001}
\bibitem[{BuII (2002)}]{BuI2002}
\bibitem[{BuIII (2003)}]{BuI2003}
\bibitem[{BuIV (2004)}]{BuI2004}
\bibitem[{BuV (2005)}]{BuI2005}
\bibitem[{BuVI (2006)}]{BuI2006}
You can add the comma and period after the fact:
while($tmp=~m/\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/g){
    push(@biblables, "\\cite{$2}");
}
print join ', ', @biblables;
print ".\n";
If you read from a filehandle you can use eof to determine that you are on the last line, at which point you replace the comma by the dot in the last element. This allows you to build the array completely in the loop, as required.
use warnings;
use strict;
open my $fh, '<', 'bibitems.txt' or die "Can't open bibitems.txt: $!";
my @biblabels;
while (<$fh>) {
    push @biblabels, "\\cite{$2}," if /\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/;
    $biblabels[-1] =~ tr/,/./ if eof;
}
print "$_ " for @biblabels;
print "\n";
This prints your desired output.
eof returns true if the next read will return end-of-file. This means that you've just read the last line, which got put on the array if it matched. This function is rarely needed, but here it seems to find a fitting purpose. Note that eof and eof() behave a little differently. Please see the eof page.
If the other capture in the regex is meant to be used, change the above to if (...) { ... }. Note that in LaTeX what is in {} is called a citation key, while the (optional) labels are the things inside []. I'd name the array @citkeys for clarity.
If you're determined to add the commas and dots to the elements while matching in the regex while loop, it can be done like this.
Since you don't know the total number of matches yet, just keep a reference to the most recently pushed element, then append the , or . as needed.
Code
use strict;
use warnings;
$/ = undef;
my $tmp = <DATA>;
my @biblables = ();
my $ref = undef;
while( $tmp =~ /\\bibitem\[([^\[\]]*)\]{([^\{\}]*)}/g )
{
    $$ref .= ", " if defined $ref;
    # push returns the new length, so the element just pushed is at length - 1
    $ref = \$biblables[ push(@biblables,"\\cite{$2}") - 1 ];
}
$$ref .= "." if defined $ref;
print @biblables;
__DATA__
\bibitem[{BuI (2001)}]{BuI2001}
\bibitem[{BuII (2002)}]{BuI2002}
\bibitem[{BuIII (2003)}]{BuI2003}
\bibitem[{BuIV (2004)}]{BuI2004}
\bibitem[{BuV (2005)}]{BuI2005}
\bibitem[{BuVI (2006)}]{BuI2006}
Output
\cite{BuI2001}, \cite{BuI2002}, \cite{BuI2003}, \cite{BuI2004}, \cite{BuI2005}, \cite{BuI2006}.
I need to sort lowercase strings based on the vowels appearing in the given strings, which are read from a file. I need to print the sorted list of strings to the command prompt.
That is, vowelA is the substring of vowels appearing in stringA, and vowelB is the corresponding substring of stringB. stringA appears before stringB in the output if vowelA comes before vowelB in ascending ASCII order.
What I have currently:
#!/usr/bin/perl -w
use warnings;
use strict;
open my $INFILE, '<', $ARGV[0] or die $!;
while( my $line = <$INFILE> ) {
sub sort_vowels {
my $vowels_a = $a;
my $vowels_b = $b;
$vowels_a =~ s/[^aeiou]//g; # only vowels
$vowels_b =~ s/[^aeiou]//g;
return $vowels_a cmp $vowels_b; # compare the substrings
}
}
print sort { sort_vowels }; # print the sorted strings
close $INFILE;
Sample Input:
albacore
albatross
vermeil
panacea
apparate
parmesan
candelabra
fanfare
false
beans
This should output:
apparate
fanfare
panacea
albatross
albacore
false
parmesan
candelabra
beans
vermeil
The error I'm getting:
syntax error at sort_strings.pl line 22, near "};"
Execution of sort_strings.pl aborted due to compilation errors.
Not sure where I went wrong. Any help would be greatly appreciated!
Perhaps print sort { sort_vowels } <$INFILE>; is what you're looking for.
while and foreach loops allow you to work with a single element at a time, but sort requires an entire list as its input.
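A sketch of how the whole script might look with the comparison sub moved out of the loop and sort given the full list. To stay self-contained it uses the sample words directly rather than reading a file, and the sub is renamed by_vowels for readability; both are choices made for this example.

```perl
use strict;
use warnings;

# Comparison routine: sort supplies the two items in the package variables $a and $b.
sub by_vowels {
    ( my $vowels_a = $a ) =~ s/[^aeiou]//g;    # keep only the vowels
    ( my $vowels_b = $b ) =~ s/[^aeiou]//g;
    return $vowels_a cmp $vowels_b;            # compare the vowel substrings
}

# Sample input from the question; a real script would read these from a filehandle.
my @words = qw(albacore albatross vermeil panacea apparate
               parmesan candelabra fanfare false beans);

my @sorted = sort by_vowels @words;
print "$_\n" for @sorted;
```

When reading from a file, chomp the lines first (or print them without adding a newline) so the output does not gain blank lines.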
Well, if you consider a vowels-only version of the string a key to the sort order of the words, then you can process each word like so:
push @{ $hash{ lc ( $word =~ s/[^aeiou]//igr ) } }, $word;
Starting with Perl 5.14 the /r flag returns the result. Same could be done this way, pre-5.14:
push @{ $hash{ lc( join( '', $word =~ m/([aeiou]+)/ig )) } }, $word;
Then outputting the order is only a matter of getting a sorted set of keys and dereffing the list of words stored within those keys:
say foreach map { @{ $hash{ $_ } } } sort keys %hash;
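Assembled into one runnable piece, the hash-of-lists idea could look like this. It is a sketch using a subset of the question's sample words, and %by_vowels is an illustrative name.

```perl
use strict;
use warnings;
use feature 'say';    # say needs Perl 5.10+; the /r flag needs 5.14+

my @words = qw(albacore albatross vermeil panacea apparate);    # subset of the sample input

my %by_vowels;
for my $word (@words) {
    my $key = lc( $word =~ s/[^aeiou]//igr );    # vowels-only sort key
    push @{ $by_vowels{$key} }, $word;           # words sharing a key are grouped
}

# Sort the keys, then flatten each key's list of words back into one list.
my @sorted = map { @{ $by_vowels{$_} } } sort keys %by_vowels;
say for @sorted;
```

Words that share the same vowel key stay in their input order within the group.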
I am writing a simple program which capitalizes each word in a sentence. It gets a multi-line input. I then loop through the input lines, split each word in the line, capitalize it and then join the line again. This works fine if the input is one sentence, but as soon as I input two lines my program crashes (and if I wait too long my computer freezes.)
Here is my code
@input = <STDIN>;
foreach(@input)
{
    #reset @words
    @words= ();
    #readability
    $lines =$_;
    #split sentence
    @words = split( / /, $lines );
    #capitalize each word
    foreach(@words){
        $words[$k] = ucfirst;
        $k++;
    }
    #join sentences again
    $lines = join(' ', @words);
    #create output line
    $output[$i]=$lines;
    $i++;
}
#print the result
print "\nResult:\n";
foreach(@output){
    print $output[$j],"\n";
    $j++;
}
Could someone please tell me why it crashes?
use strict (and be told about improperly handled variables like your indices)
use for my $var (@array) to get a usable item without an index (Perl isn't JavaScript)
What isn't there can't be wrong (e.g. use push instead of an index)
In code:
use strict; # always!
my @input = <STDIN>; # the loop needs input and output
my @output = ();
for my $line (@input) # for makes readability *and* terseness easy
{
    chomp $line; # get rid of eol
    #split sentence
    my @words = split( / /, $line );
    #capitalize each word
    for my $word (@words){ # no danger of mishandling indices
        $word = ucfirst($word);
    }
    #join sentences again
    $line = join(' ', @words);
    #create output line
    push @output, $line;
}
#print the result
print "\nResult:\n";
for my $line (@output){
    print $line, "\n";
}
The problem is that you are using global variables throughout, so they are keeping their values across iterations of the loop. You have reset @words to an empty list even though you didn't need to - it is overwritten when you assign the result of split to it - but $k is increasing endlessly.
$k is initially set to undef which evaluates as zero, so for the first sentence everything is fine. But you leave $k set to the number of elements in @words so it starts from there instead of from zero for the next sentence. Your loop over @words becomes endless because you are assigning to (and so creating) $words[$k] so the array is getting longer as fast as you are looping through it.
The same problem applies to $i and $j, but execution never gets as far as reusing those.
Although this was the only way of working in Perl 4, over twenty years ago, Perl 5 has made programs very much nicer to write and debug. You can now declare variables with my, and you can use strict which (among other things) insists that every variable you use must be declared, otherwise your program won't compile. There is also use warnings which is just as invaluable. In this case it would have warned you that you were using an undefined variable $k etc. to index the arrays.
If I apply use strict and use warnings, declare all of your variables and initialise the counters to zero then I get a working program. It's still not very elegant, and there are much better ways of doing it, but the error has gone away.
use strict;
use warnings;
my @input = <STDIN>;
my @output;
my $i = 0;
foreach (@input) {
    # readability
    my $lines = $_;
    # split sentence
    my @words = split ' ', $lines;
    # capitalize each word
    my $k = 0;
    foreach (@words) {
        $words[$k] = ucfirst;
        $k++;
    }
    # join sentences again
    $lines = join ' ', @words;
    #create output line
    $output[$i] = $lines;
    $i++;
}
print "\nResult:\n";
my $j = 0;
foreach (@output) {
    print $output[$j], "\n";
    $j++;
}
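As one possible shorter alternative (not the code above, but a different technique), a single substitution can capitalize every word without any index variables. This is a sketch; capitalize_words is a hypothetical helper name, and it treats any run of non-whitespace characters as a word.

```perl
use strict;
use warnings;

# Capitalize the first character of every whitespace-separated word in a line.
sub capitalize_words {
    my ($line) = @_;
    $line =~ s/(\S+)/\u$1/g;    # \u uppercases the next character of the replacement
    return $line;
}

print capitalize_words('the quick brown fox'), "\n";
```

Because the substitution leaves whitespace untouched, spacing and the trailing newline of each input line are preserved, so it could be applied directly to lines read from STDIN.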
I read all the questions that looked similar and am not gleaning an answer.
I saw a lot of "remove this or add that" but not a "move to another array..."
This question is beneath all of you, but I am a Perl Newblet and could really use help with an elegant solution.
I have an array with an unknown # of elements, each element containing a string similar to {img_names_will_change.jpg}some unknown text.
I need a subroutine that will strip the {yadayada.jpg} from each element and add the yadayada.jpg portion to a second array.
However, I still need each element in the original array to survive but without the {....}.
I looked into using substr or regex but got lost in the syntax.
I'll be RTFM on regex as well.
If I understand you correctly, this could be a solution:
my @names = (
    '{img_names_will_change.jpg}some unknown text',
    '{img_names_will_change.jpg}some unknown text',
    '{img_names_will_change.jpg}some unknown text'
);
my @extract;
foreach my $name ( @names ) {
    if ( $name =~ m/{(\w+\.\w+)}/ ) {
        push @extract, $1;
    }
}
use Data::Dumper;
print Dumper @extract;
Output
$VAR1 = 'img_names_will_change.jpg';
$VAR2 = 'img_names_will_change.jpg';
$VAR3 = 'img_names_will_change.jpg';
This extracts the image name with {(\w+\.\w+)} and pushes it onto another array.
I got it. Just added the rest of the string into $2 and applied it to $original. Thanks Paulchenkiller!
foreach my $original ( @original ) {
    #Extracts the text from within "{}" and pushes it into @images
    if ( $original =~ m/{(\w+\.\w+)}(.*)/ ) {
        push @images, $1;
        #Strips "{..}" out of @original
        $original = $2;
    }
}
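Since the question asked for a subroutine, the same idea could be wrapped up roughly as follows. This is a sketch under assumptions: extract_images is a hypothetical name, the sample strings are made up, and the {...} part is assumed to sit at the start of each element.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to the original array, strips a leading
# "{name}" from each element in place, and returns the list of stripped names.
sub extract_images {
    my ($texts) = @_;
    my @images;
    for my $text (@$texts) {                    # $text aliases the element, so edits persist
        if ( $text =~ s/^\{([^{}]+)\}// ) {     # remove "{...}" and capture its contents
            push @images, $1;
        }
    }
    return @images;
}

# Made-up sample data in the format described in the question.
my @original = ( '{a.jpg}first caption', '{b_2.jpg}second caption', 'no image here' );
my @images   = extract_images( \@original );

print "@images\n";
print "$_\n" for @original;
```

Elements without a leading {...} are left unchanged and contribute nothing to the returned list.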