Perl read file and extract price - regex

A file.txt file contains the string "hello the apple cost 10.99 today"
How can I extract the 10.99 only?
This is what I have so far:
open READFILE, ("<file.txt");
while (<READFILE>)
{
if ($a = $_ =~ m/\d\.\d/)
{
print "$a\n";
}
}
However, my output shows 1 instead of 10.99. Can you please tell me what's wrong?

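The 1 you were getting is the return value of the match in scalar context: it only indicates that there was a match. To get the matched text, put capturing parentheses around that part of the pattern and read $1.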
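open(my $fh, '<', 'file.txt') or die "Cannot open file.txt: $!";
while (<$fh>)
{
# the parentheses capture digits, a decimal point, and more digits into $1
if (m/(\d+\.\d+)/)
{
my $price = $1;
print "price = $price\n";
}
}
close $fh;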
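As a minimal sketch of the difference described above, the same match can be evaluated in scalar and in list context. The variable names $matched and $price are only illustrative, and the sample line is the one from the question.

```perl
use strict;
use warnings;

my $line = "hello the apple cost 10.99 today";

# Scalar context: the match returns 1 on success (empty string on failure).
my $matched = $line =~ m/\d+\.\d+/;

# List context: the match returns the contents of the capture groups.
my ($price) = $line =~ m/(\d+\.\d+)/;

print "matched = $matched\n";   # prints "matched = 1"
print "price   = $price\n";     # prints "price   = 10.99"
```

The parentheses around $price on the left-hand side are what force list context; without them the assignment would store 1 again.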

open READFILE, "file.txt";
while (<READFILE>)
{
if ($_ =~ /(\d+\.\d+)/)
{
print "$1\n";
}
}
You are good to go

Change your code to this,
while (my $a = <DATA>) {
if ($a =~ s/.*?(\d+\.\d+).*/$1/)
{
print "$a\n";
}
}
The general form is s/regex/replacement/modifiers.
In s/.*?(\d+\.\d+).*/$1/, the .*? and .* match every character around the decimal number, and the capturing group captures the number itself. Replacing the whole match with the contents of group 1 leaves only the decimal number, which is then the value of the variable $a.
OR
while (my $a = <DATA>) {
if ($a =~ m/.*?(\d+\.\d+).*/)
{
print "$1\n";
}
}
Here the regex .*?(\d+\.\d+).* is matched against the input line, and on a successful match the decimal number is captured in group 1. Printing $1 therefore gives you the captured decimal number.

Related

perl regular expression digit on string

I was trying to get the number from a string. The number can be pure digits e.g. 12334 or can be separated with underscore 12_345
I was trying with the below code but was unable to get anything from it.
my $string = "this is a 141_153_923 number : $_123_456";
if ($string =~ /\b\d*(?:\d+\_?\d+)*\d*\b/) {
print "$&\n";
}
expected output is 141_153_923
I have also tried the string 141_153_923 and it still does not return anything, even with
$string =~ /\b\d\b/
on the string 141_153_923
I hope the variable $_123_456 is declared in your Perl code, because the double-quoted string interpolates it. Otherwise you'll get a warning (or, under use strict, a compile-time error).
Now the regex. Try with this one:
if ($string =~ /\b(\d+(?:_\d+)*)\b/) {
Try this regex: /((?:\d+\_?)+)/.
...
my $string = "this is a 141_153_923 number : \$_123_456";
my $num;
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "first: $num\n";
}
$string = "this is a 141153923 number : \$_123_456";
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "second: $num\n";
}
...
output:
first: 141_153_923
second: 141153923
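To see why the pattern from the question printed nothing, it may help to look at where it matches. The sketch below is only illustrative (the names $empty and $num are not from the original posts) and uses a single-quoted string so that $_123_456 is not interpolated.

```perl
use strict;
use warnings;

my $string = 'this is a 141_153_923 number : $_123_456';

# Every part of the original pattern is optional, so it succeeds at offset 0
# with a zero-length match, at the word boundary just before "this".
my $empty;
if ($string =~ /\b\d*(?:\d+\_?\d+)*\d*\b/) {
    $empty = $&;
    print "original pattern matched '$empty' at offset $-[0]\n";
}

# Requiring at least one digit forces the match onto the number itself.
my $num;
if ($string =~ /\b(\d+(?:_\d+)*)\b/) {
    $num = $1;
    print "stricter pattern matched '$num'\n";
}
```

The /\b\d\b/ attempt fails on 141_153_923 for a different reason: the underscore counts as a word character, so no single digit in that string has a word boundary on both sides.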

Perl conditional regex extraction

This conditional must match either telco_imac_city or telco_hier_city. When it succeeds I need to extract up to the second underscore of the value that was matched.
I can make it work with this code
if ( ($value =~ /(telco_imac_)city/) || ($value =~ /(telco_hier_)city/) ) {
print "value is: \"$1\"\n";
}
But if possible I would rather use a single regex like this
$value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$1\"\n";
}
But if I pass the value telco_hier_city I get this output on testing the second value
Use of uninitialized value $1 in concatenation (.) or string at ./test.pl line 19.
value is: ""
What am I doing wrong?
while (<$input>){
chomp;
print "$1\n" if /(telco_hier|telco_imac)_city/;
}
Perl capture groups are numbered by the position of their opening parenthesis in the pattern, counting from the left, no matter which alternative ends up matching. Your input, telco_hier_city, matches the second capture group of that single regex (/(telco_imac_)city|(telco_hier_)city/), meaning you'd need to use $2:
my $value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$2\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
Because there was no match in your first capture group ((telco_imac_)), $1 is uninitialized, as expected.
To fix your original code, use FlyingFrog's regex:
my $value = $ARGV[0];
if ( $value =~ /(telco_hier_|telco_imac_)city/ ) {
print "value is: \"$1\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
$> ./conditionalIfRegex.pl telco_imac_city
value is: "telco_imac_"
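If you want to keep the two alternatives written out separately, another option (not mentioned in the answers above) is the branch-reset group (?|...), available since Perl 5.10, which restarts group numbering in each alternative. A minimal sketch, with the test values and the @seen array being assumptions for illustration:

```perl
use strict;
use warnings;

my @seen;
for my $value (qw(telco_imac_city telco_hier_city telco_other_city)) {
    # Inside (?| ... ) each alternative starts numbering its groups at 1,
    # so whichever branch matches fills $1.
    if ($value =~ /(?|(telco_imac_)city|(telco_hier_)city)/) {
        push @seen, $1;
        print "value is: \"$1\"\n";
    }
    else {
        print "no match for $value\n";
    }
}
```

For a case as simple as this one, the single group with an inner alternation shown above is probably easier to read; the branch reset is mainly useful when the alternatives differ in structure.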

Using iterated variables with regex

The point of the overall script is to:
step 1) open a single column file and read off first entry.
step 2) open a second file containing lots of rows and columns, read off EACH line one at a time, and find anything in that line that matches the first entry from the first file.
step 3) if a match is found, then "do something constructive", and if not, go to the first file and take the second entry and repeat step 2 and step 3, and so on...
here is the script:
#!/usr/bin/perl
use strict; #use warnings;
unless(@ARGV) {
print "\usage: $0 filename\n\n"; # $0 name of the program being executed
exit;
}
my $list = $ARGV[0];
chomp( $list );
unless (open(LIST, "<$list")) {
print "\n I can't open your list of genes!!! \n";
exit;
}
my( @list ) = (<LIST>);
close LIST;
open (CHR1, "<acembly_chr_sorted_by_exon_count.txt") or die;
my(@spreadsheet) = (<CHR1>);
close CHR1;
for (my $i = 0; $i < scalar @list; $i++ ) {
print "$i in list is $list[$i]\n";
for (my $j = 1; $j < scalar @spreadsheet; $j++ ) {
#print "$spreadsheet[$j]\n";
if ( $spreadsheet[$j] ) {
print "will $list[$i] match with $spreadsheet[$j]?\n";
}
else { print "no match\n" };
} #for
} #for
I plan to use a regex in the line if ( $spreadsheet[$j] ) { but I am already having a problem at this step as it is now. On the first iteration, the line print "will $list[$i] match with $spreadsheet[$j]?\n"; prints $list[$i] OK but does not print $spreadsheet[$j]. On the second and following iterations the line prints both variables correctly. I do not see why.
At first glance nothing looks overtly incorrect. As mentioned in the comments the $j = 1 looks questionable but perhaps you are skipping the first row on purpose.
Here is a more perlish starting point that is tested. If it does not work then you have something going on with your input files.
Note the extended trailing whitespace removal. Sometimes if you open a WINDOWS file on a UNIX machine and use chomp, you can have embedded \r in your text that causes weird things to happen to printed output.
#!/usr/bin/perl
use strict; #use warnings;
unless(@ARGV) {
print "\usage: $0 filename\n\n"; # $0 name of the program being executed
exit;
}
my $list = shift;
unless (open(LIST, "<$list")) {
print "\n I can't open your list of genes!!! \n";
exit;
}
open(CHR1, "<acembly_chr_sorted_by_exon_count.txt") or die;
my @spreadsheet = map { s/\s+$//; $_ } <CHR1>;
close CHR1;
# s/\s+$//; is like chomp but trims all trailing whitespace even
# WINDOWS files opened on a UNIX system.
for my $item (<LIST>) {
$item =~ s/\s+$//; # trim all trailing whitespace
print "==> processing '$item'\n";
for my $row (@spreadsheet) {
if ($row =~ /\Q$item\E/) { # see perlre for \Q \E
print "match '$row'\n";
}
else {
print "no match '$row'\n";
}
}
}
close LIST;
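The two details this script relies on, trimming trailing whitespace instead of using chomp and quoting the search term with \Q...\E, can be checked in isolation. This is a small sketch with made-up sample data ($dos_line and $item are assumptions, not values from the question's files):

```perl
use strict;
use warnings;

# A line as it looks when a Windows (CRLF) file is read on a Unix system.
my $dos_line = "geneA\r\n";

my $chomped = $dos_line;
chomp $chomped;              # removes "\n" only; the "\r" stays behind

my $trimmed = $dos_line;
$trimmed =~ s/\s+$//;        # removes "\r\n" and any other trailing whitespace

print "after chomp:    ", length($chomped), " characters\n";
print "after s/\\s+\$//: ", length($trimmed), " characters\n";

# Without \Q...\E the "." in the search term is a regex wildcard.
my $item     = "a.b";
my $unquoted = "axb" =~ /$item/     ? 1 : 0;
my $quoted   = "axb" =~ /\Q$item\E/ ? 1 : 0;
print "unquoted pattern matches 'axb': $unquoted\n";
print "quoted pattern matches 'axb':   $quoted\n";
```

A leftover "\r" moves the cursor back to the start of the line when printed to a terminal, which is one way printed output can appear to be missing.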

How can I know which portion of a Perl regex is matched by a string?

I want to search the lines of a file to see if any of them match one of a set of regexs.
something like this:
my @regs = (qr/a/, qr/b/, qr/c/);
foreach my $line (<ARGV>) {
foreach my $reg (@regs) {
if ($line =~ /$reg/) {
printf("matched %s\n", $reg);
}
}
}
but this can be slow.
it seems like the regex compiler could help. Is there an optimization like this:
my $master_reg = join("|", @regs); # this is wrong syntax. what's the right way?
foreach my $line (<ARGV>) {
$line =~ /$master_reg/;
my $matched = special_function();
printf("matched the %sth reg: %s\n", $matched, $regs[$matched]);
}
where 'special_function' is the special sauce telling me which portion of the regex was matched.
Use capturing parentheses. Basic idea looks like this:
my @matches = $foo =~ /(one)|(two)|(three)/;
defined $matches[0]
and print "Matched 'one'\n";
defined $matches[1]
and print "Matched 'two'\n";
defined $matches[2]
and print "Matched 'three'\n";
Add capturing groups:
"pear" =~ /(a)|(b)|(c)/;
if (defined $1) {
print "Matched a\n";
} elsif (defined $2) {
print "Matched b\n";
} elsif (defined $3) {
print "Matched c\n";
} else {
print "No match\n";
}
Obviously in this simple example you could have used /(a|b|c)/ just as well and just printed $1, but when 'a', 'b', and 'c' can be arbitrarily complex expressions this is a win.
If you're building up the regex programmatically you might find it painful to have to use the numbered variables, so instead of breaking strictness, look in the @- or @+ arrays instead, which contain offsets for each match position. $-[0] is always set as long as the pattern matched at all, but higher $-[$n] will only contain defined values if the nth capturing group matched.
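Putting the pieces together, one possible version of the "special_function" from the question could look like the sketch below. The helper name which_matched and the sample lines are hypothetical, and the sketch assumes that none of the individual patterns contains capturing groups of its own (otherwise the group numbers no longer line up with the indexes of @regs).

```perl
use strict;
use warnings;

my @regs = (qr/a/, qr/b/, qr/c/);

# Wrap each pattern in its own capturing group and join them with "|".
my $alternation = join '|', map { "($_)" } @regs;
my $master_reg  = qr/$alternation/;

# Hypothetical helper: returns the index into @regs of the alternative
# that matched, or undef if the line does not match at all.
sub which_matched {
    my ($line) = @_;
    return unless $line =~ $master_reg;
    # $-[$n] is defined only for groups that took part in the match.
    my ($idx) = grep { defined $-[ $_ + 1 ] } 0 .. $#regs;
    return $idx;
}

for my $line ("xbz", "cat", "zzz") {
    my $idx = which_matched($line);
    if (defined $idx) {
        print "'$line' matched reg number $idx\n";
    }
    else {
        print "'$line' matched nothing\n";
    }
}
```

Note that this reports only the one alternative that produced the leftmost match, whereas the nested loop in the question reports every pattern that matches the line, so the two are not exactly equivalent.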

How to determine number of times a word appears in text?

How can I find the number of times a word is in a block of text in Perl?
For example my text file is this:
#! /usr/bin/perl -w
# The 'terrible' program - a poorly formatted 'oddeven'.
use constant HOWMANY => 4; $count = 0;
while ( $count < HOWMANY ) {
$count++;
if ( $count == 1 ) {
print "odd\n";
} elsif ( $count == 2 ) {
print "even\n";
} elsif ( $count == 3 ) {
print "odd\n";
} else { # at this point $count is four.
print "even\n";
}
}
I want to find how many times the word "count" appears in that text file. The file is named terrible.pl
Ideally it should use a regex and a minimum number of lines of code.
EDIT: This is what I have tried:
use IO::File;
my $fh = IO::File->new('terrible.pl', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word ($text =~ /count/g) {
print "x";
$words{$word}++;
}
}
print $words{$word};
Here's a complete solution. If this is homework, you learn more by explaining this to your teacher than by rolling your own:
perl -0777ne "print+(@@=/count/g)+0" terrible.pl
If you are trying to count how many times the word "count" appears, this will work:
my $count=0;
open(INPUT,"<terrible.pl") or die "Cannot open terrible.pl: $!";
while (<INPUT>) {
$count++ while ($_ =~ /count/g);
}
close(INPUT);
print "$count times\n";
I'm not actually sure what your example code is doing, but you're almost there:
perl -e '$text = "lol wut foo wut bar wut"; $count = 0; $count++ while $text =~ /wut/g; print "$count\n";'
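You can use the /g modifier to continue searching the string for matches. In scalar context, each evaluation of a /g match resumes where the previous one stopped, so in the example above the while loop runs once for every instance of the word 'wut' in the $text var.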
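The same count can also be taken without an explicit loop by evaluating the /g match in list context and counting the results. This is a sketch on a made-up sample string, not on terrible.pl itself; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Single quotes, so the $count inside the text is not interpolated.
my $text = 'my $count = 0; $count++ while $count < 4; # recount';

# /g in list context returns every match; assigning that list to the empty
# list () and reading the assignment in scalar context gives the number of matches.
my $substring_hits = () = $text =~ /count/g;

# \b restricts the count to "count" as a whole word, so "recount" is skipped.
my $word_hits = () = $text =~ /\bcount\b/g;

print "substring: $substring_hits, whole word: $word_hits\n";
# prints "substring: 4, whole word: 3"
```

Whether "recount" should be counted depends on what you mean by "word", which is why both variants are shown.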
You can probably use something like so:
use IO::File;
my $fh = IO::File->new('test.txt', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word (split / /) {
$words{$word}++;
}
}
That will give you a count of every "word" (defined here as a group of characters separated by a single space), stored in a hash that is keyed by the word, where each value is the number of times that word was seen.
perldoc perlrequick has an answer. The term you want in that document is "scalar context".
Given that this appears to be a homework question, I'll point you at the documentation instead.
So, what are you trying to do? You want the number of times something appears in a block of text. You can use the Perl grep function. That will go through a block of text without needing to loop.
If you want an odd/even return value, you can use the modulo arithmetic function. You can do something like this:
if ($number % 2) {
print "$number is odd\n"; #Returns a "1" or true
}
else {
print "$number is even\n"; #Returns a "0" or false
}