change number to english Perl - regex

Hi, can you check my script and tell me where the problem is? Sorry, I'm new to Perl. I want to convert numbers to English words, for example 1400 -> one thousand four hundred. I already used
Lingua::EN::Numbers qw(num2en num2en_ordinal);
this is my input file.txt
I have us dollar 1200
and the output should be. "I have us dollar one thousand two hundred"
this is my script
#!/usr/bin/perl
use utf8;
use Lingua::EN::Numbers qw(num2en num2en_ordinal);
if(! open(INPUT, '< snuker.txt'))
{
die "cannot open input file: $!";
}
select OUTPUT;
while($lines = <INPUT>){
$lines =~ s/usd|USD|Usd|uSd|UsD/us dollar/g;
$lines =~ s/\$/dollar /g;
$lines =~ s/rm|RM|Rm|rM/ringgit malaysia /g;
$lines =~ s/\n/ /g;
$lines =~ s/[[:punct:]]//g;
$lines =~ s/(\d+)/num2en($lines)/g; #this is where it should convert to english words
print lc($lines); #print lower case
}
close INPUT;
close OUTPUT;
close STDOUT;
the output i got is "i have us dollar num2en(i have us dollar 1200 )"
thank you

You need to refer to the capture with $1 instead of passing $lines in your last substitution, and you also need the e flag at the end so that the replacement is evaluated as an expression. You can use the i flag to avoid writing out every case combination of usd and rm:
while($lines = <INPUT>){
$lines =~ s/usd/us dollar/ig;
$lines =~ s/\$/dollar /g;
$lines =~ s/rm/ringgit malaysia /ig;
$lines =~ s/\n/ /g;
$lines =~ s/[[:punct:]]//g;
$lines =~ s/(\d+)/num2en($1)/ge; #this is where it should convert to english words
print lc($lines), "\n"; #print lower case
}

You’re missing the e modifier on the regex substitution:
$ echo foo 42 | perl -pe "s/(\d+)/\$1+1/g"
foo 42+1
$ echo foo 42 | perl -pe "s/(\d+)/\$1+1/ge"
foo 43
See man perlop:
Options are as with m// with the addition of the following replacement
specific options:
        e    Evaluate the right side as an expression.
Plus you have to refer to the captured number ($1), not the whole string ($lines), but I guess you have already caught that.
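To see both fixes together (the e flag and $1), here is a minimal sketch. Note that spell_digits and convert_line are hypothetical names, and spell_digits is only a stand-in for num2en so the snippet runs without Lingua::EN::Numbers installed; it spells a number digit by digit instead of producing proper English.

```perl
use strict;
use warnings;

# Hypothetical stand-in for num2en: spells a number digit by digit.
my @DIGIT_NAMES = qw(zero one two three four five six seven eight nine);

sub spell_digits {
    my ($number) = @_;
    return join ' ', map { $DIGIT_NAMES[$_] } split //, $number;
}

# Replace every run of digits with the result of calling spell_digits on it.
sub convert_line {
    my ($line) = @_;
    # /e evaluates the replacement as Perl code; $1 holds the captured digits.
    $line =~ s/(\d+)/spell_digits($1)/ge;
    return lc $line;
}

print convert_line('I have us dollar 1200'), "\n";
```

Swapping spell_digits($1) for num2en($1) should give the output asked for in the question, assuming the module is installed.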

The problem here is that you are confusing regexps with functions. In the line where you try to do the conversion, you're not calling the function num2en; instead, you're replacing the number with the literal text num2en(...), with the contents of $lines interpolated between the parentheses. Here's a suggestion for you (it assumes one number per line):
($text, $number) = $lines =~ /^(.*?)(\d+)/; # split the line into the text before the number and the number itself
print lc($text . num2en($number)); # print first the text, then the converted number

Related

Perl: Empty $1 regex value when matching?

Readers,
I have the following regex problem:
code
#!/usr/bin/perl -w
use 5.010;
use warnings;
my $filename = 'input.txt';
open my $FILE, "<", $filename or die $!;
while (my $row = <$FILE>)
{ # take one input line at a time
chomp $row;
if ($row =~ /\b\w*a\b/)
{
print "Matched: |$`<$&>$'|\n"; # the special match vars
print "\$1 contains '$1' \n";
}
else
{
#print "No match: |$row|\n";
}
}
input.txt
I like wilma.
this line does not match
output
Matched: |I like <wilma>|
Use of uninitialized value $1 in concatenation (.) or string at ./derp.pl line 14, <$FILE> line 22.
$1 contains ''
I am totally confused. If it is matching and I am checking things in a conditional, why am I getting an empty result for $1? This isn't supposed to be happening. What am I doing wrong? How can I get 'wilma' to be in $1?
I looked here but this didn't help because I am getting a "match".
You don't have any parentheses in your regex. No parentheses, no $1.
I'm guessing you want the "word" value that ends in -a, so that would be /\b(\w*a)\b/.
Alternatively, since your whole regex only matches the bit you want, you can just use $& instead of $1, like you did in your debug output.
Another example:
my $row = 'I like wilma.';
$row =~ /\b(\w+)\b\s*\b(\w+)\b\s*(\w+)\b/;
print join "\n", "\$&='$&'", "\$1='$1'", "\$2='$2'", "\$3='$3'\n";
The above code produces this output:
$&='I like wilma'
$1='I'
$2='like'
$3='wilma'
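Applied to the original question, the fix might look like the sketch below. The helper name word_ending_in_a is made up for illustration; the two input strings are the lines from input.txt in the question.

```perl
use strict;
use warnings;

# Return the first word ending in "a", or undef when there is none.
sub word_ending_in_a {
    my ($row) = @_;
    # The parentheses form a capture group, so $1 is set on a successful match.
    return $row =~ /\b(\w*a)\b/ ? $1 : undef;
}

for my $row ('I like wilma.', 'this line does not match') {
    my $word = word_ending_in_a($row);
    print defined $word ? "Matched: $word\n" : "No match: $row\n";
}
```

Reading $1 only inside the true branch of the conditional avoids the uninitialized-value warning shown in the question.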

Printing first instance of match in each line of file (Perl)

I have the following in an executable .pl file:
#!/usr/bin/env perl
$file = 'TfbG_peaks.txt';
open(INFO, $file) or die("Could not open file.");
foreach $line (<INFO>) {
if ($line =~ m/[^_]*(?=_)/){
#print $line; #this prints lines, which means there are matches
print $1; #but this prints nothing
}
}
Based on my reading at http://goo.gl/YlEN7 and http://goo.gl/VlwKe, print $1; should print the first match in each line, but it doesn't. Help!
No, $1 should print the string saved by so-called capture groups (created by the bracketing construct - ( ... )). For example:
if ($line =~ m/([^_]*)(?=_)/){
print $1;
# now this will print something,
# unless string begins from an underscore
# (which still matches the pattern, as * is read as 'zero or more instances')
# are you sure you don't need `+` here?
}
The pattern in your original code didn't have any capture groups, that's why $1 was empty (undef, to be precise) there. The (?=...) construct doesn't count: it is a look-ahead assertion, not a capture group.
$1 prints what the first capture ((...)) in the pattern captured.
Maybe you were thinking of
print $& if $line =~ /[^_]*(?=_)/; # BAD: $& can slow down every match on older Perls
or
print ${^MATCH} if $line =~ /[^_]*(?=_)/p; # 5.10+
But the following would be simpler (and work before 5.10):
print $1 if $line =~ /([^_]*)_/;
Note: You'll get a performance boost when the pattern doesn't match if you add a leading ^ or (?:^|_) (whichever is appropriate).
print $1 if $line =~ /^([^_]*)_/;
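As a rough sketch of the anchored version wrapped in a reusable sub (the name prefix_before_underscore is hypothetical, and the sample file name is the one from the question):

```perl
use strict;
use warnings;

# Return the text before the first underscore, or undef if the line has none.
sub prefix_before_underscore {
    my ($line) = @_;
    # In list context the match returns its captures; ^ anchors it to the start.
    my ($prefix) = $line =~ /^([^_]*)_/;
    return $prefix;
}

my $prefix = prefix_before_underscore('TfbG_peaks.txt');
print "$prefix\n" if defined $prefix;
```

Because the pattern uses * rather than +, a line that starts with an underscore still matches and yields an empty string, not undef.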

How to save a matching regex's value to a variable in one line of perl?

I'm sure there is a very simple way to do this, but whenever I search for examples, I get the two step method. Here is what I typically do:
$data =~ m/(my_query)/;
$result = $1;
I want to set $result in the same line as the regex and never use $1. Thanks!
my($result) = ($data =~ m/(my_query)/);
As noted in a comment, the my($result) needs the parentheses to provide an array context (what the Perl documentation calls list context) for the result of the match. In that context, the captures ($1 etc.) are returned as a list and assigned to the variables on the left. You could use @result = ($data =~ m/(my_query)/); you could omit the my but you would need to keep the parentheses; you could subscript the list using $result = ($data =~ m/(my_query)/)[0]; (thanks ysth). The key words here are 'array context'.
Examples:
$ perl -e '$data="abcdef";my($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; ($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; @result =($data =~ m/(cde)/); print "$result[0]\n"'
cde
$ perl -e '$data="abcdef"; $result =($data =~ m/(cde)/)[0]; print "$result\n"'
cde
$
You didn't specify what problem you want to avoid, but there is definitely one to avoid. When the pattern doesn't match, the following code assigns to $result whatever $1 was left holding by an earlier successful match:
$data =~ /(my_query)/;
my $result = $1;
You could use a conditional to assign something useful to $result when the pattern doesn't match
my $result = $data =~ /(my_query)/ ? $1 : undef;
Or you could take advantage of the fact that m// in list context returns what it captured.
my ($result) = $data =~ /(my_query)/;
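The difference between the two contexts can be sketched like this. The sub names matched_flag and first_capture are invented for the example, and the pattern is the cde one used in the one-liners earlier in this thread.

```perl
use strict;
use warnings;

# Scalar context: the match returns true or false, not the captured text.
sub matched_flag {
    my ($data) = @_;
    my $flag = $data =~ /(cde)/;
    return $flag;
}

# List context: the match returns its captures; no match gives an empty list,
# so $result is left undefined.
sub first_capture {
    my ($data) = @_;
    my ($result) = $data =~ /(cde)/;
    return $result;
}

my $capture = first_capture('abcdef');
print "flag=", matched_flag('abcdef'), " capture=$capture\n";
```

The parentheses around $result are the only difference between the two subs, which is why dropping them silently changes the result.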
$data="abcde";
$data =~ s/(cde)/$result=$1/e;

Perl : How to replace a _[0-9] with a comma in perl or any language

I have a file with the following pattern
'21pro_ABCD_EDG_10800_48052_2 0.0'
How do i replace the _[0-9] with a ,(comma)
so that i can get the output as
21pro_ABCD_EDG,10800,48052,2, 0.0
To replace the _[0-9] with a , you can do this:
$s =~ s/_([0-9])/,$1/g;
#the same without capturing groups
$s =~ s/_(?=[0-9])/,/g;
Edit:
To get the extra comma after the 2 you can do this:
#This puts a , before all whitespace.
$s =~ s/_(?=[0-9])|(?=\s)/,/g;
#This one puts a , between [0-9] and any whitespace
$s =~ s/_(?=[0-9])|(?<=[0-9])(?=\s)/,/g;
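Here is a small sketch of the last substitution wrapped in a sub, run against the sample line from the question. The sub name underscores_to_commas is made up for this example.

```perl
use strict;
use warnings;

sub underscores_to_commas {
    my ($s) = @_;
    # First alternative: an underscore that is followed by a digit.
    # Second alternative: the empty position between a digit and whitespace.
    $s =~ s/_(?=[0-9])|(?<=[0-9])(?=\s)/,/g;
    return $s;
}

print underscores_to_commas('21pro_ABCD_EDG_10800_48052_2 0.0'), "\n";
```

Both look-arounds are zero-width, so the digits and the whitespace themselves are left in place and only the comma is inserted.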
The sed approach would be something like the following:
rupert@hake:~ echo '21pro_ABCD_EDG_10800_48052_2 0.0' | sed 's/_\([0-9]\)/,\1/g'
21pro_ABCD_EDG,10800,48052,2 0.0
Using the expression mentioned by jacob, here is the code snippet to perform the substitution for a large file
#!/usr/local/bin/perl
open (MYFILE, 'test') or die "Cannot open test: $!";
while (<MYFILE>) {
chomp;
$s=$_;
$s =~ s/_(?=[0-9])|(?<=[0-9])(?=\s)/,/g;
$s =~ s/\s//g;
print "$s\n";
}
close (MYFILE);

How do I substitute with an evaluated expression in Perl?

There's a file dummy.txt
The contents are:
9/0/2010
9/2/2010
10/11/2010
I have to increment the month portion (0, 2, 11) by 1, i.e. to (1, 3, 12)
I wrote the substitution regex as follows
$line =~ s/\/(\d+)\//\/\1+1\//;
It is printing
9/0+1/2010
9/2+1/2010
10/11+1/2010
How do I make it add numerically (giving 3) rather than perform string concatenation (giving 2+1)?
Three changes:
Use the e modifier to allow an expression in the replacement part.
To make the replacement global, use the g modifier. This is not needed if you have one date per line.
Use $1 on the replacement side, not the backreference \1.
This should work:
$line =~ s{/(\d+)/}{'/'.($1+1).'/'}eg;
Also, if your regex contains the delimiter you're using (/ in your case), it's better to choose a different delimiter ({} above); this way you don't have to escape the delimiter, which keeps the regex clean.
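Put together as a sub and run over the three dates from dummy.txt, a sketch could look like this (bump_month is a hypothetical name; g is left out because each line holds one date):

```perl
use strict;
use warnings;

# Increment the middle (month) field of a d/m/yyyy style date by one.
sub bump_month {
    my ($line) = @_;
    # /e evaluates the replacement as code, so $1 + 1 is real arithmetic.
    $line =~ s{/(\d+)/}{'/' . ($1 + 1) . '/'}e;
    return $line;
}

print bump_month($_), "\n" for '9/0/2010', '9/2/2010', '10/11/2010';
```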
This works (e evaluates the replacement as an expression; see the perlrequick documentation):
$line = '8/10/2010';
$line =~ s!/(\d+)/!('/'.($1+1).'/')!e;
print $line;
It helps to use ! or some other character as the delimiter if your regular expression has / itself.
You can also use the interpolation trick from the question "Can Perl string interpolation perform any expression evaluation?":
$line = '8/10/2010';
$line =~ s!/(\d+)/!("/@{[$1+1]}/")!e;
print $line;
but if this is a homework question, be ready to explain when the teacher asks you how you reach this solution.
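For what it's worth, the replacement side of s/// is already interpolated like a double-quoted string, so as far as I can tell the @{[ ... ]} construct also works without the e flag. A minimal sketch, with bump_month_interpolated as a made-up name:

```perl
use strict;
use warnings;

# Same increment, but relying on string interpolation instead of /e.
sub bump_month_interpolated {
    my ($line) = @_;
    # @{[ ... ]} builds an anonymous array from the expression and
    # interpolates its single element into the replacement string.
    $line =~ s!/(\d+)/!/@{[ $1 + 1 ]}/!;
    return $line;
}

print bump_month_interpolated('8/10/2010'), "\n";
```

Whether this reads better than the e version is a matter of taste; the e form is probably easier to explain.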
How about this?
$ cat date.txt
9/0/2010
9/2/2010
10/11/2010
$ perl chdate.pl
9/1/2010
9/3/2010
10/12/2010
$ cat chdate.pl
use strict;
use warnings;
open my $fp, '<', "date.txt" or die $!;
while (<$fp>) {
chomp;
my @arr = split (/\//, $_);
my $temp = $arr[1]+1;
print "$arr[0]/$temp/$arr[2]\n";
}
close $fp;
$