Regex for Evalue substitution in Perl

What I am trying to achieve is to convert an Evalue such as 1e-2 to 0.01.
my $cutoff = "1e-12";
if ($cutoff =~ m/^\de-{1}\d+?$/){
$cutoff = s/e-/*10^(-/;
$cutoff .= ")";
}
print "$cutoff\n";
This is part of a bigger script and running it under use warnings; always gives me Use of uninitialized value $_ in substitution (s///) at test.pl line 4, <STDIN> line 1.
Does anyone spot the mistake here? I cannot seem to be able to do so.

The warning you get is because you used = rather than =~ in front of the substitution operator. You need:
$cutoff =~ s/e-/*10^(-/;
But that isn't the only problem here. You would also have to eval the statement to get what you wanted, which would not only be a bad design, but completely unnecessary. Perl natively treats values like "1e-12" as numbers, so you should not be doing this with a regex at all. You can simply format the output:
printf '%.2f', $val;
That prints 1e-2 as 0.01. Pick the precision to match the smallest value you expect; 1e-12, for example, needs '%.12f'. If you need to create very long numbers like this, look into an appropriate module.

Do you realise that "1e-2" is already a valid format for a number in Perl? You just need to persuade Perl to treat it as a number.
$ perl -E'$x= "1e-2"; say $x'
1e-2
$ perl -E'$x= "1e-2"; $x+=0; say $x'
0.01
Adding zero to it ensures that Perl knows it is a number.
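To make the difference between numifying a value and formatting it concrete, here is a minimal sketch. The helper name to_fixed and the choice of 12 decimal places are assumptions made for illustration; they are not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: format a value given in scientific notation
# as fixed-point text with a chosen number of decimal places.
sub to_fixed {
    my ($value, $places) = @_;
    return sprintf "%.${places}f", $value;
}

my $cutoff = "1e-12";

# Adding zero numifies the string; Perl then decides how to display it,
# and very small values are still shown in exponent form.
my $number = $cutoff + 0;
print "$number\n";

# sprintf gives explicit control over the fixed-point representation.
print to_fixed($cutoff, 12), "\n";
```

The precision has to be chosen by the caller: with too few decimal places a small cutoff is simply rounded to zero.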

Related

How do I know if there is a number in my list?

My code looks like this:
#!/usr/bin/perl
$counter=0;
@list=<STDIN>;
chomp(@list);
if (@list==~ /^[+-]?\d+$/ )
{
$counter++;
}
print $counter;
So I enter data like: a b c d 1 2
And then it should print 2
because of the 1 and the 2.
But no matter what data I put into the list, I get back 0.
So what is the problem with my if?
Always use strict; use warnings;. If your goal is to count the number of integer elements in your list, you can use grep to filter the list elements, then apply scalar to get its length (or assign the result of grep directly to a scalar instead of storing the filtered list in @matches):
#!/usr/bin/perl
use strict;
use warnings;
my @list = <STDIN>;
chomp(@list);
my @matches = grep /^[+-]?\d+$/, @list;
print scalar @matches . "\n";
Sample run:
$ ./count.pl
-62
a
b
4
c
+91023
d
3
Honestly, it looks like you just guessed at this syntax. That's not really a great way to write a program :-)
Your main problems are on this line:
if (@list==~ /^[+-]?\d+$/ )
There are two pretty big problems here. Firstly, the @list. The match operator (=~) works on a single string at a time, so you need to use it on a scalar value. If you give it an array (as you have done here), Perl will silently evaluate the array as a scalar, which means that rather than getting the contents of the array you get the number of elements in it. That is an integer, so your regex would always match.
But, you say, it doesn't match. Yes, I realise that. And that's down to your second error: you have the match operator wrong. The operator is =~ and you're using ==~ (see the extra =). You would hope that an error like that would cause a syntax error, but you've accidentally used a version that is syntactically valid but just doesn't do what you want. Perl interprets your code as:
if (@list = =~ /^[+-]?\d+$/ )
Note the spaces I've added. The one between the two = characters is important. This is saying "perform the match operation and assign the results to @list". But what is the match operator matching against? Well, it hasn't been given an explicit variable to match against, and in that case it matches against the default variable, $_. You haven't put anything into $_ so the match fails.
At this point, I should point out that if you had use warnings in your code, then you would be getting all sorts of useful warnings about what you're doing wrong. All Perl programmers (including the most experienced ones) should always have use strict and use warnings in their code.
There's another point of confusion here. It's the way you read your input.
@list = <STDIN>;
You haven't made it clear, but I suspect that you type in your list all in one line. In that case, you don't want to store your input in an array; you should store it in a scalar.
chomp($list = <STDIN>);
You can then convert it to a list (to be stored in an array) using split().
@list = split /\s+/, $list;
You can then get the count of numbers in your array using grep.
my $count = grep { /^[-+]?\d+$/ } @list;
When evaluated as a scalar, grep returns the number of times the block of code was true.
Putting those together (and adding strict and warnings) we get the following:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $list = <STDIN>);
my $count = grep { /^[-+]?\d+$/ } split /\s+/, $list;
print "$count\n";
Which, to my mind at least, looks simpler than your version.
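The same idea can be wrapped in a small function so it is easy to try on different inputs. This is only a sketch: the name count_integers is made up for illustration, and it assumes the whole list arrives as one whitespace-separated line.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated tokens that are
# optionally signed integers.
sub count_integers {
    my ($line) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace;
    # grep in scalar context returns the number of matching tokens.
    my $count = grep { /^[-+]?\d+$/ } split ' ', $line;
    return $count;
}

print count_integers("a b c d 1 2"), "\n";
```

Using split ' ' rather than split /\s+/ avoids an empty leading field when the line starts with a space.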

\x not working inside the substitution

I'm trying to decode Unicode escape sequences, so I simply tried the hexadecimal escape \x{} inside a substitution with the /e modifier:
use LWP::Simple;
my $k = get("url");
my ($kv) =map{/js_call\(\\"(.+?)\\"\)/} $k;
#now $kv data is https://someurl/call.pl?id=15967737\u0026locale=en-GB\u0026mkhun=ccce
$kv=~s/\\u(.{4})/"\x{$1}"/eg;
I'm trying to substitute all of the Unicode escapes.
My expected output is:
https://someurl/call.pl?id=15967737&locale=en-GB&mkhun=ccce
The print statement below gives the expected output, but the substitution doesn't seem to work properly.
print "\x{0026}";
The problem with s/\\u(.{4})/"\x{$1}"/e is that the backslash escape \x{$1} is evaluated at compile time, which gives a NULL byte:
$ perl -E 'printf "%vX\n", "\x{$1}"'
0
If we escape the backslash in front of x ( s/\\u(.{4})/"\\x{$1}"/ge ) we get a string with literal escape sequences, but still not the desired unicode character:
use feature qw(say);
$kv = '\u0026';
$kv =~ s/\\u(.{4})/"\\x{$1}"/ge;
say $kv;
The output is now:
\x{0026}
With a small modification, you can produce "\x{0026}" instead, which is Perl code you can compile and execute to produce the desired value. To do this, you need to involve eval(EXPR).
$kv =~ s/\\u(.{4})/ my $s = eval(qq{"\\x{$1}"}); die $@ if $@; $s /ge;
This can be shortened to
$kv =~ s/\\u(.{4})/ qq{"\\x{$1}"} /gee;
However, a far better solution is to use the following:
$kv =~ s/\\u(.{4})/chr hex $1/ge;
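As a self-contained sketch of the chr hex approach: the function name decode_u_escapes is invented for illustration, and the pattern is tightened to exactly four hex digits, which is an assumption about the input rather than something the question states. Escapes that encode surrogate pairs are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn JavaScript-style \uXXXX escapes into characters.
sub decode_u_escapes {
    my ($text) = @_;
    # Require exactly four hex digits so hex() never sees stray characters;
    # hex converts the digits to a code point and chr turns that into a character.
    $text =~ s/\\u([0-9A-Fa-f]{4})/chr hex $1/ge;
    return $text;
}

my $kv = 'https://someurl/call.pl?id=15967737\u0026locale=en-GB\u0026mkhun=ccce';
print decode_u_escapes($kv), "\n";
```

No string eval is involved, so nothing in the fetched page can end up being executed as Perl code.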
If you enable use warnings you'll see that the $1 gets evaluated literally before the backreference gets interpolated.
$kv =~ s/\\u(.{4})/ sprintf("\"\\x{%s}\"", $1) /eeg;
sort of works, but it is hideously ugly. I've been trying to simplify it, but the various ideas I tried always got me back to "Illegal hexadecimal digit '$' ignored" warnings.
You could also try this:
$kv=~s/\\u([[:xdigit:]]{1,5})/chr(eval("0x$1"))/egis;
Thanks.

Manipulating backreferences for substitution in perl

As a part of an attempt to replace scientific numbers with decimal numbers I want to save a backreference into a string variable, but it doesn't work.
My input file is:
,8E-6,
,-11.78E-16,
,-17e+7,
I then run the following:
open FILE, "+<C:/Perl/input.txt" or die $!;
open(OUTPUT, "+>C:/Perl/output.txt") or die;
while (my $lines = <FILE>){
$find = "(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
$noofzeroesbeforecomma = eval("$7-length($4)");
$replace = '"foo $noofzeroesbeforecomma bar"';
$lines =~ s/$find/$replace/eeg;
print (OUTPUT $lines);
}
close(FILE);
I get
foo bar
foo bar
foo bar
where I would have expected
foo 6 bar
foo 14 bar
foo 7 bar
$noofzeroesbeforecomma seems to be empty or non-existent.
Even with the following adjustment I get an empty result
$noofzeroesbeforecomma = $2;
Only inserting $2 directly in the replace string gives me something (which is then, unfortunately, not what I want).
Can anyone help?
I'm running Strawberry Perl (5.16.1.1-64bit) on a 64-bit Windows 7 machine, and quite inexperienced with Perl
Your main problem is not using
use strict;
use warnings;
warnings would have told you
Use of uninitialized value $7 in concatenation (.) or string at ...
Use of uninitialized value $4 in concatenation (.) or string at ...
I would recommend you try and find a module that can handle scientific notation, rather than trying to hack your own.
Your code, in working order, might look something like this. As you can see, I have put a q() around your eval string to keep it from being evaluated before $7 and $4 exist. I also removed the eval itself, since using /ee on top of an eval is somewhat excessive.
use strict;
use warnings;
while (my $lines = <DATA>) {
# qr// keeps \. as a literal dot; inside a double-quoted string it would turn into a bare .
my $find = qr/(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)/;
my $noof = q|$7-length($4)|;
$lines =~ s/$find/$noof/eeg;
print $lines;
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
6
14
7
As a side note, not using strict is asking for trouble. Doing it while using a variable name such as $noofzeroesbeforecomma is asking for twice the trouble, as it is rather easy to make typos.
This is not about backreferences but the original problem, transforming numbers from scientific notation. I'm sure there are some cases in which this fails:
#!/usr/bin/env perl
use strict;
use warnings;
use bignum;
for (<DATA>) {
next unless /([+-]?\d+(?:\.\d+)?)[Ee]([+-]\d+)/;
print $1 * 10 ** $2 . "\n";
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
0.000008
-0.000000000000001178
-170000000
I suggest you use the Regexp::Common::number plugin for the Regexp::Common module, which will find all real numbers for you and allow you to replace those that have an exponent marker.
This code shows the idea. Using the -keep option makes the module put each component into one of the $N variables. The exponent marker (e or E) is in $7, so the number can be transformed depending on whether it was present.
use strict;
use warnings;
use Regexp::Common;
my $real_re = $RE{num}{real}{-keep};
while (<>) {
s/$real_re/ $7 ? sprintf '%.20f', $1 : $1 /eg;
print;
}
output
Given your example input, this code produces the following. The values can be tidied up further using additional code in the substitution
,0.00000800000000000000,
,-0.00000000000000117800,
,-170000000.00000000000000000000,
The problem is that Perl can handle all those types of expressions. And since the standard item of data in Perl is the string, you would only need to capture the expression to use it. So, take this expression:
/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/
to extract it from the surrounding text and use sprintf to format it, like Borodin showed.
However, if it helps you to see a better case of what you tried to do, this works better
my ( $whole, $frac, $expon )
= $line =~ m/(?:,)-?(0|[1-9]\d*)(?:\.(\d*))?[eE]([+\-]?\d+)(?:,)/
;
my $num = $expon - length( $frac // '' );  # $frac is undef when there is no fractional part
Why not capture the sign with the exponent anyway, if you're going to do arithmetic with it?
It's better to name your captures and eschew eval when it's not necessary.
The substitution--as is--doesn't make much sense.
Really, since neither the symbols nor the digits can be case sensitive, just put a (?i) at the beginning and avoid the "character class" [Ee]:
/((?i)-?\d+(?:\.\d+)?e[+-]?\d+)/
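Capturing the whole number and handing it to sprintf could be sketched as follows. The function name expand_sci and its $places parameter are assumptions for illustration. The caller has to choose enough decimal places for the smallest exponent, and ordinary floating-point precision limits apply.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite numbers written in scientific notation
# as fixed-point text with $places digits after the decimal point.
sub expand_sci {
    my ($text, $places) = @_;
    # Capture mantissa and exponent as one token and let Perl numify it;
    # /e runs sprintf on each match, /g handles every number in the text.
    $text =~ s/(-?\d+(?:\.\d+)?[eE][+-]?\d+)/sprintf "%.${places}f", $1/ge;
    return $text;
}

print expand_sci(",8E-6,", 6), "\n";
print expand_sci(",-17e+7,", 0), "\n";
```

The surrounding commas are left in place because they are not part of the captured number.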

What regular expression do I use for replacing numbers with leading zeros?

I have a bunch of strings like this:
my $string1 = "xg0000";
my $string2 = "fx0015";
What do I do to increase the number in the string by 1 while also keeping the leading zeros, so that the length of the string stays the same?
I tried this:
$string =~ s/(\d+)/0 x length(int($1)) . ($1+1)/e;
It doesn't seem to work on all numbers. Is a regex what I'm supposed to use for this, or is there a better way?
How about a little perl magic? The ++ operator will work even on strings, and 0000 will magically turn into 0001.
Now, we can't modify $1 since it is readonly, but we can use an intermediate variable.
use strict;
use warnings;
my $string = "xg0000";
$string =~ s/(\d+)/my $x=$1; ++$x/e;
Update:
I didn't think of this before, but it actually works without a regex:
C:\perl>perl -we "$str = 'xg0000'; print ++$str;"
xg0001
Still does not solve the problem DavidO pointed out, with 9999. You would have to decide what to do with those numbers. Perl has a rather interesting solution for it:
C:\perl>perl -we "$str = 'xg9999'; print ++$str;"
xh0000
You can do it with sprintf too, and use the length you compute from the number of digits that you capture:
use strict;
use warnings;
my $string = "xg00000";
foreach ( 0 .. 9 ) {
$string =~ s/([0-9]+)\z/
my $l = length $1;
sprintf "%0${l}d", $1 + 1;
/e;
print "$string\n";
}
This is a really bad task to solve with a regexp. Increasing a number can change an unlimited number of digits, and can in fact also change the number of non-zero digits! Unless you have sworn an oath to use only regexes for a year, use a regex to extract the number and then sprintf "%06d", $x+1 to regenerate the new number with the desired width.
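The extract-then-sprintf approach can be sketched like this. The helper name increment_padded is hypothetical, and it assumes the digits sit at the end of the string, as in the examples from the question. When the number overflows its width (9999 becoming 10000) this version lets the string grow by one character, which is one possible policy among several.

```perl
use strict;
use warnings;

# Hypothetical helper: add one to the trailing digits of a string,
# padding with zeros to at least the original number of digits.
sub increment_padded {
    my ($string) = @_;
    # %0*d takes the field width from the argument list, so the width
    # follows however many digits were captured.
    $string =~ s/(\d+)\z/sprintf('%0*d', length($1), $1 + 1)/e;
    return $string;
}

print increment_padded("xg0000"), "\n";
print increment_padded("fx0015"), "\n";
```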

Regex: Lookaround && some syntax

I am studying regular expressions and am stuck on the lookaround concept and on a few pieces of syntax.
After some googling, I thought this would be the right forum to ask for help.
I find written explanations hard to follow, so plenty of different examples would be great.
The /e modifier and || are new to me in a regex; please help me understand what they are really for. Below is my Perl script.
$INPUT1="WHAT TO SAY";
$INPUT2="SAY HI";
$INPUT3="NOW SAY![BYE]";
$INPUT4="SAYO NARA![BYE]";
$INPUT1=~s/SAY/"XYZ"/e; # /e What is this modifier is for
$INPUT2=~s/HI/"XYZ"/;
$INPUT3=~s/(?<=\[)(\w+)(?=])/ "123"|| $1 /e; #What is '||' is use for and what its name
$INPUT4=~s/BYE/"123"/e;
print "\n\nINPUT1 = $INPUT1 \n \n ";
print "\n\nINPUT2 = $INPUT2 \n \n ";
print "\n\nINPUT3 = $INPUT3 \n \n ";
print "\n\nINPUT4 = $INPUT4 \n \n ";
Have a read of perlrequick and perlretut.
The /e modifier of the s/// substitution operator treats the replacement as Perl code rather than as a string. For example:
$x = "5 10";
$x =~ s/(\d+) (\d+)/$1 + $2/e;
# $x is now 15
Instead of replacing $x with the string "$1 + $2", it evaluates the Perl code $1 + $2 - where $1 is 5 and $2 is 10 - and puts the result into $x.
The || is not a regex operator, it's a normal Perl operator. It is the logical-or operator: if the left-hand side is a true value (anything other than undef, 0, '0' or ''), it returns the left side, otherwise it returns the right side. You can look up Perl operators in perlop.
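A short sketch of how || picks between its two sides. The function name with_default is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value, or a fallback when the value is false.
sub with_default {
    my ($value, $fallback) = @_;
    # || returns its left operand if that is true, otherwise its right operand.
    return $value || $fallback;
}

print with_default("BYE", "123"), "\n";   # left side is true, so it is returned
print with_default("",    "123"), "\n";   # empty string is false, fallback is used
print with_default(0,     "123"), "\n";   # 0 is false as well
```

Because 0 counts as false, || is a poor fit when 0 is a legitimate value; Perl 5.10 and later offer the defined-or operator // for that case.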
A standard substitution operator looks like this:
s/PATTERN/REPLACEMENT/
Where the PATTERN is matched, it is replaced with REPLACEMENT. REPLACEMENT is treated as a double-quoted string so that you can put variables in there and it will just work.
s/PATTERN/$var1/
You can use this to include pieces of the matched text in your replacement.
s/PA(TT)ERN/$1/
Sometimes, however, this isn't enough. Perhaps you want to process the text and run a subroutine to work out what the replacement is. Here's a really contrived example. Suppose you have text that contains floating point numbers and you want to replace them with integers. A first approach might look like this:
#!/usr/bin/perl
use strict;
use warnings;
$_ = '12.34 5.678';
s/(\d+\.\d+)/int($1)/g;
print "$_\n";
That doesn't work, of course. You end up with "int(12.34) int(5.678)". But that string is a piece of code which you want to run in order to get the correct answer. That's what the /e option does. It treats the replacement string as code, runs it and uses the output as the replacement.
Changing the line in the example above to
s/(\d+\.\d+)/int($1)/ge;
gives us the required result.
Now that you understand /e I hope that you don't need an explanation of ||. It's just the standard or operator that you use all the time. In your example, it means "the replacement string is either '123' or the contents of $1". Of course, that doesn't make much sense as '123' is always going to be true, so $1 will never be used. Perhaps you wanted it the other way round: $1 or '123'.
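The lookarounds in the third substitution from the question can be illustrated with a small sketch. The function name lowercase_bracketed and the choice of lc as the replacement code are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase any word that sits between square brackets.
sub lowercase_bracketed {
    my ($text) = @_;
    # (?<=\[) requires a [ just before the match and (?=\]) requires a ]
    # just after it, but neither bracket is part of what gets replaced.
    # With /e the replacement lc $1 is run as Perl code for each match.
    $text =~ s/(?<=\[)(\w+)(?=\])/lc $1/ge;
    return $text;
}

print lowercase_bracketed("NOW SAY![BYE]"), "\n";
```

Since the brackets are only asserted and not consumed, they survive the substitution without having to be written back into the replacement.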